JP4510237B2

JP4510237B2 - Pattern detection apparatus and method, image processing apparatus and method

Info

Publication number: JP4510237B2
Application number: JP2000181480A
Authority: JP
Inventors: 優和真継
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-06-16
Filing date: 2000-06-16
Publication date: 2010-07-21
Anticipated expiration: 2020-06-16
Also published as: JP2002008031A

Description

【０００１】
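【Technical Field of the Invention】
The present invention relates to a pattern detection apparatus and method, and an image processing apparatus and method, that perform pattern recognition, detection of a specific subject, and the like.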
【０００２】
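【Prior Art】
Conventionally, in the fields of image recognition and speech recognition, implementations fall broadly into two types: those that execute a recognition algorithm specialized for a specific recognition target sequentially as computer software, and those that execute it on a dedicated parallel image processor (SIMD or MIMD machines, etc.).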
【０００３】
画像認識アルゴリズムにおいては、生体系処理とのアナロジーから、注視領域を選択的にシフト等しながら認識を行うことにより、認識処理に要する演算コスト（負荷）を軽減することが重要と考えられている。
【０００４】
例えば、特公平６―３４２３６号公報に係る階層型情報処理方法では、特徴抽出素子層と同一特徴に応じた特徴抽出素子からの出力に応じた出力を行う特徴統合層とを有する複数の階層間に、下位層から上位層に向かう複数の上向性信号経路に対応して上位層から下位層へ向かう下向性信号経路を設け、最上位層からの下向性信号に応じて上向性信号の伝達を制御することにより、自己想起型の連想能力と特定パターンの選択的抽出によるセグメンテーションを行うことにより、処理領域（認識のための注視領域）を設定している。
【０００５】
ＵＳＰ４８７６７３１では、上記のような上向性信号経路と下向性信号経路は出力層（最上位層）からの文脈的情報(ルールデータベース、確率的重み付け処理)に基づき制御されている。
【０００６】
特許２８５６７０２号公報では、認識を修正することを注視といい、正確な認識を行うためにパターンが特定できないときに各部分領域に対する注視度を選択する注視度判断部と、その選択された注視度を加味した認識をさせる注視度制御部とをパターン認識装置に具備させている。
【０００７】
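The method proposed by Koch and Ullman (1985) for controlling the setting of the attention region by selective routing (Human Neurobiology, vol. 4, pp. 219-227) performs attention control with three mechanisms: a saliency map extraction mechanism based on feature extraction and selective mapping; a mechanism for selecting the attention position using a so-called winner-take-all (WTA) neural network (see, e.g., Japanese Patent Laid-Open No. 5-242069, US5049758, US5059814, and US5146106); and a mechanism that inhibits the neural elements at the selected position.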
【０００８】
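In the method that obtains an object-centered, scale- and position-invariant representation by means of a dynamic routing network (Anderson et al., 1995, Routing Networks in Visual Cortex, in Handbook of Brain Theory and Neural Networks (M. Arbib, Ed.), MIT Press, pp. 823-826; Olshausen et al., 1995, A Multiscale Dynamic Routing Circuit for Forming Size- and Position-Invariant Object Representations, J. Computational Neuroscience, vol. 2, pp. 45-62), information is routed through control neurons that can set connection weights dynamically. This controls the attention region and converts the feature representation of the recognition target from an observer-centered one to an object-centered one.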
【０００９】
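In the selective tuning method (Culhane & Tsotsos, 1992, An Attentional Prototype for Early Vision, Proceedings of Second European Conference on Computer Vision (G. Sandini, Ed.), Springer-Verlag, pp. 551-560), a search proceeds from the WTA circuit in the top layer down to the WTA circuits in the lower layers by a mechanism that activates only the winners, so that the position of the global winner in the top layer is determined in the bottom layer (the layer that directly receives the input data). The selection of position and the selection of features in attention control are realized, respectively, by inhibiting connections unrelated to the position of the target and by inhibiting elements that detect features not involved in the target.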
【００１０】
【Problems to Be Solved by the Invention】
The selective attention methods described above have the following problems.
【００１１】
まず、特公平６―３４２３６号公報に係る方法では、階層的神経回路構成において、上向性信号経路と対をなす下向性信号経路が存在することが前提となっているため、上向性信号経路に相当する神経回路と同程度の大きさの神経回路が下向性信号経路を形成する回路として必要となり、回路規模が非常に大きくなるという問題があった。
【００１２】
また、この方法では、注視位置を順次変更制御する機構が設けられていないため、ノイズその他の影響により、注視領域設定及び変更の際の動作の安定性に欠けるという問題があった。即ち、上向性経路の中間層の素子と下向性経路の中間層の素子間に、全ての階層にわたって相互作用が存在し、注視位置は、それら相互作用の全体を通じて最終的に決まるので、複数の同一カテゴリに属する対象が存在する場合に、注視点位置がそれら対象間を安定的に順次遷移するように制御されず、注視位置が特定の対象間(又はその近傍)だけで振動したりする問題があった。
【００１３】
更に、検出(認識)対象と同一カテゴリの対象が複数入力データ中に存在する場合に、同時に複数の対象に対する処理（実質的に注視になっていない処理）が発生したり、適切な注視位置の更新を行うためには、そのたびにネットワークパラメータの微妙な調整を行うことが必要となるという問題があった。
【００１４】
また、選択的ルーティングによる方式では、注視位置の制御においては、選択された領域を抑制する機構のみを有している為、注視点位置の効率的制御が容易でなく、注視点位置の制御が、特定対象或いは特定部位に偏ったりすることがあった。
【００１５】
また、ダイナミックルーティング回路網による方式では、シナプス結合荷重を動的に変更可能な多くの制御ニューロンを介して層間結合の再構成を行うため、回路構成が複雑となり、また、制御ニューロンの作用が下位層の特徴顕著度に基づくボトムアップな処理過程であったため、複数の同一カテゴリ対象が存在している場合の効率的な注視位置の制御が困難であった。
【００１６】
また、選択的チューニングによる方式では、選択された対象に関与しない結合を単に削ぎ落とすいわゆるpruningに似た選択を階層的にかつ動的に行うので、複数の対象が存在する場合には注視点位置の効率的制御が容易でないという問題があった。
【００１７】
更に、上記した従来方式に共通して以下の問題点があった。
【００１８】
まず、認識対象サイズの違いに対処できる機構を有していないので、異なるサイズの対象が同時に複数位置に存在する場合や、認識対象のサイズが異なる複数のシーンについて、そのシーン毎にネットワークパラメータのチューニングを行う必要があった。
【００１９】
また、複数の同一カテゴリに属する対象が入力データ中の複数の位置に存在するとき、それらの間を万遍なくかつ効率的に注視位置を遷移(更新)することができなかった。
【００２１】
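【Means for Solving the Problems】
To solve the above problems, according to the present invention a pattern detection apparatus comprises: input means for inputting a pattern; detection processing means that detects a predetermined pattern; and attention region setting control means. The detection processing means includes a plurality of feature detection elements, each detecting a plurality of features at a corresponding one of the points obtained by sampling the input pattern by a predetermined method; saliency detection elements that detect feature saliency; and coupling means that couples the elements and transmits signals between them. These elements form a plurality of element layers for features ranging from low order to high order. The coupling means includes feedback coupling means that transmits signals from an element layer for higher-order features to an element layer for lower-order features. The attention region setting control means controls the setting of an attention region on the low-order feature data or the input data on the basis of the feature saliency and the amount of signal transmitted through the feedback coupling means.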
【００３２】
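【Embodiments of the Invention】
<First Embodiment>
An embodiment of the present invention is described in detail below with reference to the drawings.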
【００３３】
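Overview of the overall configuration
FIG. 1 shows the overall configuration of the pattern detection and recognition apparatus of this embodiment. Pattern information is processed by a What pathway (upper part of the figure) and a Where pathway (lower part). The What pathway mainly handles information involved in recognizing (detecting) objects or geometric features, while the Where pathway mainly handles information on the position (arrangement) of objects or features.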
【００３４】
Ｗｈａｔ経路はいわゆるConvolutionalネットワーク構造(LeCun, Y. and Bengio, Y., 1995, "Convolutional Networks for Images Speech, and Time Series" in Handbook of Brain Theory and Neural Networks (M. Arbib, Ed.), MIT Press, pp.255-258)を有している。但し、同経路内の層間結合は局所的に相互結合をなし得る点(後述)が、従来と異なる。Ｗｈａｔ経路の最終出力は認識結果、即ち認識された対象のカテゴリに相当する。また、Ｗｈｅｒｅ経路の最終出力は、認識結果に対応する場所を表す。
【００３５】
データ入力層101は、画像の検出認識などを行う場合は、ＣＭＯＳセンサー或いはＣＣＤ素子等の光電変換素子であり、音声の検出認識などを行う場合には音声入力センサーである。また、所定データ解析部の解析結果（例えば、主成分分析、ベクトル量子化など）から得られる高次元のデータを入力するものであってもよい。データ入力層101は、上記２経路に共通のデータ入力を行う。
【００３６】
なお、本実施形態において音声認識への適用の詳細は説明はしないが、以下の説明でいう選択的注視は、例えば特定帯域や音素のみ抽出処理すること、或いは独立成分分析などの手法により抽出される「特徴」のみを選択的に処理すること、或いは文脈に依存した特定話者の音声(特徴)を抽出して処理することなどに相当する。
【００３７】
以下、画像を入力する場合について説明する。Ｗｈａｔ経路には、特徴検出層102（(１,０）、(１,１)、…、(１,N)）と特徴統合層103（(２,０)、(２,１)、…、(２,N)）と、注視領域設定制御層108とがある。
【００３８】
最初の特徴検出層(１,０)は、Gabor wavelet変換その他による多重解像度処理により、画像パターンの局所的な低次の特徴（幾何学的特徴のほか色成分特徴を含んでもよい）を全画面の各位置(或いは、全画面にわたる所定のサンプリング点の各点)において同一箇所で複数のスケールレベル又は解像度で複数の特徴カテゴリの数だけ検出し、特徴量の種類（例えば、幾何学的特徴として所定方向の線分を抽出する場合にはその幾何学的構造である線分の傾き）に応じた受容野構造を有し、その程度に応じたパルス列を発生するニューロン素子から構成される。
【００３９】
注視領域設定制御層108は、後述する特徴統合層(2,0)からの特徴顕著度マップと上位層、例えば特徴位置検出層(3,k)からのフィードバック結合を後述するWhere経路を通じて受け、そのフィードバック信号に基づき注視領域の位置・サイズの設定及びそれらの更新等の制御を行う。その動作機構の詳細については後述する。
【００４０】
特徴検出層(1,k)は全体として、複数の解像度（又はスケールレベル）での処理チャネルを形成する。即ち、Gabor wavelet変換を特徴検出層（１,０）で行う場合を例にとると、図１２に示すように、スケールレベルが同一で方向選択性の異なるGaborフィルタカーネルを受容野構造に持つ特徴検出細胞のセットは、特徴検出層（１,０）において同一の処理チャネルを形成し、後続の層(1,１) においても、それら特徴検出細胞からの出力を受ける特徴検出細胞（より高次の特徴を検出する）は、当該処理チャネルと同一のチャネルに属する。更に後続の層（１,ｋ）(但しｋ＞１)においても、同様に（2,ｋ―１）層において同一チャネルを形成する複数の特徴統合細胞からの出力を受ける特徴検出細胞は、当該チャネルに属するように構成される。各処理チャネルは、同一スケールレベル（又は解像度）での処理が進行していくものであり、階層的並列処理により低次特徴から高次特徴までの検出及び認識を行う。
【００４１】
Ｗｈａｔ経路上の特徴統合層(２,０)は、所定の受容野構造(以下、受容野とは直前の層の出力素子との結合範囲を、受容野構造とはその結合荷重の分布を意味する)を有し、パルス列を発生するニューロン素子からなり、特徴検出層(１,０)からの同一受容野内の複数のニューロン素子出力の統合（局所平均化等によるサブサンプリング、及び異なるスケールレベルでの処理結果の結合処理などの演算）を行う。
【００４２】
また、特徴統合層内のニューロンの各受容野は同一層内のニューロン間で共通の構造を有している。各特徴検出層（１,１）、(１,２)、…、(１,N)）及び各特徴統合層（(２,１)、(２,２)、…、(２,N)）は、それぞれ学習により獲得した所定の受容野構造を持ち、上述した各層と同様に前者（(１,１)、…）は、各特徴検出モジュールにおいて複数の異なる特徴の検出を行い、後者（(２,１)、…）は、前段の特徴検出層からの複数特徴に関する検出結果の統合を行う。
【００４３】
但し、前者の特徴検出層は、同一チャネルに属する前段の特徴統合層の細胞素子出力を受けるように結合（配線）されている。特徴統合層は２種類の処理を行う。その第一であるサブサンプリングは、同一特徴カテゴリかつ同一スケールレベルの特徴検出細胞集団からの局所的な領域（当該特徴統合層ニューロンの局所受容野）からの出力についての平均化などを行うものであり、第二の処理である異なるスケールレベルでの処理結果の結合処理とは、同一特徴カテゴリかつ異なる複数のスケールレベルにわたる複数の特徴検出細胞集団の出力の線形結合（又は非線形結合）を行う。
【００４４】
また、Where経路には、特徴位置検出層（(３,０)、…、(３,ｋ)）があり、What経路上の所定の（全てである必要はない）特徴検出層からの入力を受け、低次、中次、高次特徴の位置の出力に関与するとともに、Where経路は上位の特徴統合層又は特徴検出層からのフィードバック結合(図１では(1,N)→(3,k)→注視領域設定制御層108)を形成する。
【００４５】
即ち、このフィードバック結合は、予め学習により、上位層で検出される高次特徴パターンを構成する特定の低次特徴に関する注視領域設定制御層108の注視制御ニューロン1801(図１８) への結合として形成されているものとする。この学習は、例えば上位層の特徴検出ニューロンと下位層の特徴検出ニューロンの所定時間幅内での同時発火により、結合が増強または形成され(他の結合は抑制または消滅され)るような自己組織化過程を伴うものである。
【００４６】
各特徴位置検出層107の出力は、対応する特徴検出層102出力の比較的高い解像度成分を保持する（図１において特徴検出層から特徴位置検出層への矢印は、いずれもこのことを模式的に示している）か、或いは特徴統合層出力を一時的に保持することにより、各特徴カテゴリについての顕著度マップを表す。ここにある層での顕著度マップとは、同程度の複雑さを有する複数の特徴カテゴリ（特徴要素）の検出レベル（又は入力データ上での存在確率）の空間的分布を表すものとする。なお、Where経路の特徴位置検出層での処理については後でも述べる。
【００４７】
各層間のニューロン間を結合する機構は、図２の（Ａ）に示すように、神経細胞の軸索または樹状突起に相当する信号伝達部203（配線または遅延線）、及びシナプス結合回路Ｓ202である。
【００４８】
図２の（Ａ）では、ある特徴検出（統合）細胞に対する受容野を形成する特徴統合(検出)細胞のニューロン群（n_i）からの出力（当該細胞から見ると入力）に関与する結合機構の構成を示している。信号伝達部として太線で示している部分は共通バスラインを構成し、この信号伝達ライン上に複数のニューロンからのパルス信号が時系列に並んで伝達される。出力先の細胞からの入力を受ける場合も同様の構成がとられる。この場合には、全く同じ構成において時間軸上で入力信号と出力信号とを分割して処理してもよいし、或いは入力用(樹状突起側)と出力用（軸索側）の２系統で図２(A)と同様の構成を与えて処理してもよい。
【００４９】
シナプス回路Ｓ202としては、層間結合（特徴検出層102上のニューロンと特徴統合層103上のニューロン間の結合であって、各層ごとにその後続の層及び前段の層への結合が存在しうる）に関与するものと、同一層内ニューロン間結合に関与するものとがある。後者は必要に応じて、主に、後述するペースメーカーニューロンと特徴検出または特徴統合ニューロンとの結合に用いられる。
【００５０】
いわゆる、興奮性結合はシナプス回路Ｓ202において、パルス信号の増幅を行い、抑制性結合は逆に減衰を与えるものである。パルス信号により情報の伝達を行う場合、増幅及び減衰は、パルス信号の振幅変調、パルス幅変調、位相変調、周波数変調のいずれによっても実現することができる。
【００５１】
本実施形態においては、シナプス結合回路Ｓ202は、主にパルスの位相変調素子として用い、信号の増幅は、特徴に固有な量としてのパルス到着時間の実質的な進み、減衰は実質的な遅れとして変換される。即ち、シナプス結合は後述するように出力先のニューロンでの特徴に固有な時間軸上の到着位置(位相)を与え、定性的には興奮性結合はある基準位相に対しての到着パルスの位相の進みを、抑制性結合では同様に遅れを与えるものである。
【００５２】
図２の（Ａ）において、各ニューロン素子n_jは、パルス信号（スパイクトレイン）を出力し、後述する様ないわゆるintegrate-and-fire型のニューロン素子を用いている。なお、図２の（C）に示すように、シナプス結合回路とニューロン素子とを、それぞれまとめて回路ブロックを構成してもよい。
【００５３】
図１のＷｈｅｒｅ経路内の各特徴位置検出層は、Ｗｈａｔ経路の特徴検出層等の出力を受けて、データ入力層上の位置関係を保持し粗くサンプリングされた格子点上の各点で、Ｗｈａｔ経路上の特徴抽出結果のうち認識に有用な成分（認識カテゴリのパターンから予め登録してあるもの）に対応するニューロンのみがフィルタリングなどにより応答する。
【００５４】
例えば、Ｗｈｅｒｅ経路内の最上位層では、認識対象のカテゴリに対応するニューロンが格子上に配列し、どの位置に該当する対象が存在するかを表現する。また、Ｗｈｅｒｅ経路内の中間層内のニューロンは、（Ｗｈｅｒｅ経路内の）上位層からのトップダウンの入力（又はＷｈａｔ経路上位層→Ｗｈｅｒｅ経路上位層→Ｗｈｅｒｅ経路中間層のルートによる入力）を受けて対応する認識対象の存在位置を中心として配置しうる特徴が検出された場合にのみに応答するように感度調整等が行われるようにすることができる。
【００５５】
検出された特徴間の位置関係（または位置情報）が保持される階層的特徴検出をＷｈｅｒｅ経路で行う際には、受容野構造が局所的（例えば、楕円形状）であってサイズが上位層ほど徐々に大きくなる（または、中間層から上位層にかけてはセンサー面上の１画素より大きいサイズであって一定である）ように構成すれば、特徴要素（図形要素、図形パターン）間の位置関係はセンサー面上での位置関係をある程度保存しつつ、各層において各特徴要素（図形要素）が検出されるようにすることができる。
【００５６】
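As another form of the Where pathway, a neural network may be used in which the receptive field size grows hierarchically toward the upper layers and in which, in the top layer, only the neuron producing the maximum output among the neurons corresponding to the category of the detected object fires. In such a system, information on the spatial arrangement (spatial phase) in the data input layer is preserved to some extent in the top layer (and in each intermediate layer) as well.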
【００５７】
また、他のネットワーク構成として、図１７に示すネットワークの構成(図１とWhat経路については受容野サイズを除いて同じ)では、逆に特徴統合層は特徴カテゴリの空間配置関係を高次特徴まで保存するため、上位層でも受容野のサイズは一定レベル以下に保つようになっている。このため、特徴位置検出層を設定せずに特徴統合層より注視領域設定制御層108へのフィードバック結合を行っている。
【００５８】
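Neuron elements
Next, the neurons making up each layer are described. Each neuron element is an extended model based on the so-called integrate-and-fire neuron. It is the same as an integrate-and-fire neuron in that it fires and outputs a pulse-like signal when the result of linearly summing its input signals (pulse trains corresponding to action potentials) over space and time exceeds a threshold.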
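The threshold-and-fire behavior described above can be illustrated with a minimal discrete-time sketch. It is not the CMOS circuit of the embodiment: the leak term, the refractory step count, and every name and numeric value (`simulate_if_neuron`, `threshold`, `leak`, `refractory_steps`) are assumptions chosen only for illustration.

```python
def simulate_if_neuron(input_current, threshold=1.0, leak=0.05, refractory_steps=2):
    """Discrete-time leaky integrate-and-fire neuron (illustrative assumption).

    Returns the list of time steps at which the neuron fires.
    """
    potential = 0.0
    refractory = 0
    spikes = []
    for step, current in enumerate(input_current):
        if refractory > 0:
            # During the refractory period the neuron ignores its input.
            refractory -= 1
            continue
        # Integrate the input with a small leak.
        potential = potential * (1.0 - leak) + current
        if potential >= threshold:
            spikes.append(step)
            potential = 0.0
            refractory = refractory_steps
    return spikes


weak = simulate_if_neuron([0.2] * 40)
strong = simulate_if_neuron([0.6] * 40)
print(len(weak), len(strong))
```

In this sketch a stronger excitatory input produces a higher firing rate, and the refractory period caps that rate, which matches the qualitative behavior the description attributes to the pulse generation circuit.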
【００５９】
図２の（Ｂ）は、ニューロン素子としてのパルス発生回路（ＣＭＯＳ回路）の動作原理を表す基本構成の一例を示し、公知の回路(IEEE Trans. on Neural Networks Vol. 10, pp.540)を拡張したものである。ここでは、入力として興奮性と抑制性の入力を受けるものとして構成されている。
【００６０】
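The operating principle of the circuit is as follows. The time constant of the circuit formed by capacitor C₁ and resistor R₁ on the excitatory input side is smaller than that of the circuit formed by capacitor C₂ and resistor R₂, and in the steady state transistors T₁, T₂, and T₃ are cut off. The resistors are actually implemented as transistors acting as active loads. When the potential of C₁ rises and exceeds that of C₂ by the threshold of transistor T₁, T₁ becomes active and in turn activates transistors T₂ and T₃. Transistors T₂ and T₃ form a current mirror circuit, and the output of the circuit in FIG. 2(B) is taken from the C₁ side by an output circuit (not shown). When the charge stored on capacitor C₂ reaches its maximum, transistor T₁ is cut off, so that transistors T₂ and T₃ are also cut off and the positive feedback falls to zero.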
【００６１】
いわゆる不応期にはキャパシタＣ₂は放電し、Ｃ₁の電位がＣ₂の電位よりＴ₁の閾値分より大とならない限り、ニューロンは応答しない。キャパシタC₁,C₂の交互充放電の繰り返しにより周期的なパルスが出力され、その周波数は一般的には興奮性入力のレベルに対応してきまる。但し、不応期が存在することにより、最大値で制限されるようにすることもできるし、一定周波数を出力するようにもできる。
【００６２】
キャパシタの電位、従って電荷蓄積量は基準電圧制御回路(時間窓重み関数発生回路)により、時間的に制御される。この制御特性を反映するのが、入力パルスに対する後述の時間窓内での重み付き加算である（図７参照）。この基準電圧制御回路は、後述するペースメーカニューロンからの入力タイミング（又は、後続層のニューロンとの相互結合入力）に基づき、基準電圧信号（図７の（Ｂ）の重み関数に相当）を発生する。
【００６３】
抑制性の入力は本実施形態においては必ずしも要しない場合があるが、後述するペースメーカニューロンから特徴検出層ニューロンへの入力を抑制性とすることにより、出力の発散（飽和）を防ぐことができる。
【００６４】
一般的に、入力信号の上記総和と出力レベル（パルス位相、パルス周波数、パルス幅など）の関係は、そのニューロンの感度特性によって変化し、また、その感度特性は上位層からのトップダウンの入力により変化させることができる。以下では、説明の便宜上、入力信号総和値に応じたパルス出力の周波数は急峻に立ち上がるように回路パラメータが設定されているものとし（従って周波数ドメインでは殆ど２値）、パルス位相変調により、出力レベル（位相変調を加えたタイミングなど）が変動するものとする。
【００６５】
また、パルス位相の変調手段としては、後述する図５に示すような回路を付加して用いてもよい。これにより、時間窓内の重み関数で上記基準電圧が制御される結果、このニューロンからのパルス出力の位相が変化し、この位相をニューロンの出力レベルとして用いることができる。
【００６６】
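The weight function shown in FIG. 7(B) gives the temporal integration characteristic (reception sensitivity) for pulses that have been phase-modulated at the synaptic connections. The time τ_w1 at which this weight function reaches its maximum is generally set earlier than τ_s1, the expected arrival time of the feature-specific pulse determined by the synaptic connection. As a result, a pulse that arrives earlier than its expected arrival time by an amount within a certain range is temporally integrated by the receiving neuron as a pulse signal with a high output level (in the example of FIG. 7(B), a pulse that arrives too early is attenuated). The weight function need not be symmetric like a Gaussian; it may be asymmetric. It follows that the center of each weight function in FIG. 7(B) is not the expected pulse arrival time.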
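A small numerical sketch may make the effect of placing the weight-function peak before the expected arrival time concrete. It assumes a Gaussian weight function, which the text allows but does not require; the function names and the values of `peak_time`, `expected_time`, and `width` are hypothetical.

```python
import math


def window_weight(arrival_time, peak_time, width=1.0):
    """Gaussian weight applied to a pulse arriving at arrival_time (assumed shape)."""
    return math.exp(-0.5 * ((arrival_time - peak_time) / width) ** 2)


def integrate_window(arrival_times, peak_times, width=1.0):
    """Weighted sum of pulses, one sub-window (peak) per expected pulse."""
    return sum(window_weight(t, p, width) for t, p in zip(arrival_times, peak_times))


expected_time = 5.0   # tau_s1: expected arrival time (assumed value)
peak_time = 4.0       # tau_w1: weight peak, set earlier than tau_s1

on_time = window_weight(expected_time, peak_time)   # arrives exactly when expected
advanced = window_weight(4.0, peak_time)            # arrives somewhat early
too_early = window_weight(1.0, peak_time)           # arrives far too early
print(on_time, advanced, too_early)
```

With these assumed values the moderately advanced pulse receives the largest weight, the on-time pulse a smaller one, and the far-too-early pulse is strongly attenuated.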
【００６７】
また、ニューロン出力（シナプス前）の位相は、後述するように時間窓の始期を基準とし、その基準時からの遅れ（位相）は基準パルス（ペースメーカ出力その他による）を受けた時の電荷蓄積量により決まるような出力特性を有する。このような出力特性を与える回路構成の詳細については、本発明の主眼とする所ではないので省略する。シナプス後のパルス位相は当該シナプスにより与えられる固有の位相変調量にシナプス前の位相を加算したものとなる。
【００６８】
なお、窓関数などを用いることにより得られる入力の総和値が閾値を越えたときに、所定タイミング遅れて発振出力を出すような公知の回路構成を用いてもよい。
【００６９】
ニューロン素子の構成としては、特徴検出層１０２または特徴統合層１０３に属するニューロンであって、後述するペースメーカニューロンの出力タイミングに基づき発火パターンが制御される場合には、ペースメーカニューロンからのパルス出力を受けた後、当該ニューロンが、前段の層の受容野から受ける入力レベル（上記の入力の単純または重み付き総和値）に応じた位相遅れをもって、パルス出力するような回路構成であればよい。この場合、ペースメーカニューロンからのパルス信号が入力される前では、入力レベルに応じて各ニューロンは互いにランダムな位相でパルス出力する過渡的な遷移状態が存在する。
【００７０】
また、後述するようにペースメーカニューロンを用いない場合には、ニューロン間（特徴検出層と特徴統合層の間）の相互結合とネットワークダイナミックスによりもたらされる同期発火信号を基準とし、上述したような入力レベルに応じた特徴検出ニューロンの出力パルスの発火タイミングの制御がなされるような回路構成であってもよい。
【００７１】
特徴検出層102のニューロンは、前述したように特徴カテゴリに応じた受容野構造を有し、前段の層（入力層101または特徴統合層103）のニューロンからの入力パルス信号（電流値または電位）の時間窓関数による荷重総和値（後述）が閾値以上となったとき、その総和値に応じて、例えばシグモイド関数等の一定レベルに漸近的に飽和するような非減少かつ非線形な関数、即ちいわゆるsquashing関数値をとるような出力レベル（ここでは位相変化で与えるが、周波数、振幅、パルス幅基準での変化となる構成でもよい）でパルス出力を行う。
【００７２】
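Processing in feature detection layer (1,0) (low-order feature extraction by Gabor wavelet transform, etc.)
Suppose feature detection layer (1,0) contains a neuron N1 that detects a pattern structure (low-order feature) having a predetermined spatial frequency within a local region of a certain size and a vertical orientation component. If such a structure is present within the receptive field of N1 on the data input layer 101, N1 outputs pulses at a phase corresponding to its contrast. This function can be realized by a Gabor filter. The feature detection filter function performed by each neuron of feature detection layer (1,0) is described below.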
【００７３】
特徴検出層(１,０)では、多重スケール、多重方向成分のフィルタセットで表されるGaborウエーブレット変換を行うものとし、層内の各ニューロン（または複数ニューロンからなる各グループ）は、所定の Gaborフィルタ機能を有する。
【００７４】
特徴検出層102では、スケールレベル（解像度）が一定で方向選択性の異なる複数のGabor関数の畳み込み演算カーネルに対応する受容野構造を有するニューロンからなる複数のニューロン集団を一まとめにして一つのチャネルを形成する。その際、図１３に示すように、同一チャネルを形成するニューロン群は方向選択性が異なり、サイズ選択性が同一のニューロン群どうしを互いに近接した位置に配置してもよいし、図１２のように同一の特徴カテゴリに属し、異なる処理チャネルに属するニューロン群どうしが互いに近接配置されるようにしてもよい。
【００７５】
これは、集団的符号化における後述する結合処理の都合上、上記各図に示すような配置構成にした方が、回路構成上実現しやすいことによる。図１２、１３の回路構成の詳細についても後で説明する。
【００７６】
なお、Gabor wavelet変換を神経回路網で行う方法の詳細については、Daugman (1988)による文献（IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.36, pp.1169-1179）を参照されたい。
【００７７】
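A Gabor wavelet, as given by Equation (1) below, has the shape of a sinusoid with a fixed orientation component and spatial frequency modulated by a Gaussian function, and is specified by a scaling-level index m and an orientation index n. As wavelets, the filters in this set have mutually similar function shapes and differ from one another in principal orientation and size. These wavelets are known to be localized in both the spatial frequency domain and the real-space domain, to minimize the joint uncertainty in position and spatial frequency, and thus to be the most localized functions in both real space and frequency space (J. G. Daugman (1985), Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, Journal of Optical Society of America A, vol. 2, pp. 1160-1169).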
【００７８】
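$$g_{mn}(x,y) = \frac{a^{-m}}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x'^2}{\sigma_x^2}+\frac{y'^2}{\sigma_y^2}\right) + 2\pi i W x'\right], \quad x' = a^{-m}(x\cos\theta_n + y\sin\theta_n), \quad y' = a^{-m}(-x\sin\theta_n + y\cos\theta_n) \quad (1)$$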

【００７９】
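Here (x, y) is the position in the image, a is the scaling factor, θ_n is the orientation component of the filter, W is the fundamental spatial frequency, and σ_x and σ_y are parameters giving the spread of the filter function in the x and y directions. In this embodiment θ_n takes six orientations, 0°, 30°, 60°, 90°, 120°, and 150°; a is 2; and m is an integer from 1 to 3.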
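As an illustration of these parameters, the following sketch builds a bank of 3 scales × 6 orientations. It assumes the widely used form of a 2-D Gabor wavelet (a complex sinusoid along the rotated, rescaled x axis under a Gaussian envelope); the values of `W`, `sigma_x`, `sigma_y`, the function name `gabor_kernel`, and the assignment of kernel sizes to scale indices in `KERNEL_SIZE` are assumptions, not values taken from the specification.

```python
import cmath
import math


def gabor_kernel(m, n, size, a=2.0, n_orient=6, W=0.25, sigma_x=2.0, sigma_y=2.0):
    """Complex Gabor kernel g_mn sampled on a size x size grid centred on zero."""
    theta = math.pi * (n - 1) / n_orient      # n = 1..6 -> 0, 30, ..., 150 degrees
    scale = a ** (-m)
    half = (size - 1) / 2.0
    norm = scale / (2.0 * math.pi * sigma_x * sigma_y)
    kernel = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            # Rotate by theta_n and rescale by a^-m.
            xr = scale * (x * math.cos(theta) + y * math.sin(theta))
            yr = scale * (-x * math.sin(theta) + y * math.cos(theta))
            envelope = math.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
            values.append(norm * envelope * cmath.exp(2j * math.pi * W * xr))
        kernel.append(values)
    return kernel


# Assumed mapping from scale index m to kernel size (larger m = coarser scale).
KERNEL_SIZE = {1: 7, 2: 15, 3: 30}
bank = {(m, n): gabor_kernel(m, n, KERNEL_SIZE[m])
        for m in (1, 2, 3) for n in range(1, 7)}
print(len(bank))
```

Each entry of `bank` corresponds to one receptive field structure; kernels sharing the same m would form one processing channel in the sense used in the text.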
【００８０】
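The parameters σ_x, σ_y, and a, which determine the filter characteristics, are preferably set so that the filters overlap one another appropriately and uniformly in the Fourier domain, leaving no bias (in sensitivity) toward any particular spatial frequency or orientation. For example, designing the filters so that the half-maximum levels of their Fourier-transformed amplitudes touch one another in the Fourier domain yields Equations (2) and (3):
【００８１】
【外２】

Here U_H and U_L are the maximum and minimum of the spatial frequency band covered by the wavelet transform, and M is the number of scaling levels in that range.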
【００８２】
また、式（１）で与えられる特徴検出細胞の受容野の構造は、σ_x, σ_yで決まる所定の幅のスケール選択性及び方向選択性を有する。即ち、式（１）のフーリエ変換はガウシアン関数形状となるので、特定の空間周波数及び方向にピークチューニング（感度）特性を与える。Gaborフィルタカーネルのサイズ(広がり)はスケールインデックスｍに応じて変わるので、異なるスケールインデックスを有する Gaborフィルタは、異なるサイズ選択性を有する。後述する集団的符号化においては、主にサイズ選択性に関して感度特性が互いに重なり合う複数の特徴検出細胞からの出力を統合する。
【００８３】
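The Gabor wavelet transform is performed by two-dimensional convolution of each filter g_mn(x,y) with the input grayscale image, that is,
【００８４】
$$W_{mn}(x,y) = \iint I(x_1,y_1)\, g_{mn}^{*}(x-x_1,\, y-y_1)\, dx_1\, dy_1 \quad (4)$$

【００８５】
where I is the input image and W_mn is the Gabor wavelet transform coefficient. The set of W_mn (m = 1, 2, 3; n = 1, ..., 6) is obtained at each point as a feature vector. The superscript '*' denotes the complex conjugate.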
【００８６】
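Each neuron of feature detection layer (1,0) has a receptive field structure corresponding to g_mn. Filters g_mn with the same scale index m have receptive fields of the same size, and computationally the corresponding kernel g_mn is also given a size that depends on the scale index. Here the kernel sizes on the input image are 30×30, 15×15, and 7×7, in order from the coarsest scale.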
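The sum-of-products that each such neuron performs over its receptive field can be sketched as a discrete convolution with the complex-conjugated kernel, evaluated at a single point. This is a minimal sketch: the function name `gabor_coefficient`, the zero padding outside the image, the restriction to odd-sized kernels, and the 3×3 toy kernel are all assumptions made for illustration.

```python
def gabor_coefficient(image, kernel, cx, cy):
    """W(cx, cy) = sum over (x1, y1) of I(x1, y1) * conj(g(cx - x1, cy - y1)).

    The kernel is indexed so that kernel[half][half] is g(0, 0); it must have
    odd size. Pixels outside the kernel support contribute nothing.
    """
    half = len(kernel) // 2
    total = 0j
    for y1, row in enumerate(image):
        for x1, intensity in enumerate(row):
            dx, dy = cx - x1, cy - y1
            if abs(dx) <= half and abs(dy) <= half:
                total += intensity * kernel[dy + half][dx + half].conjugate()
    return total


# Toy complex kernel standing in for one g_mn (hypothetical values).
toy_kernel = [[0j, 1j, 0j],
              [1 + 0j, 2 + 0j, -1 + 0j],
              [0j, -1j, 0j]]

# A 5 x 5 image containing a single bright pixel at (x, y) = (2, 2).
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0

print(gabor_coefficient(image, toy_kernel, 2, 2))
```

For an impulse image the coefficient at each point is simply the conjugated kernel value at the corresponding offset, which makes the indexing convention easy to check.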
【００８７】
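Each neuron outputs pulses at an output level (here expressed as phase, although a configuration based on frequency, amplitude, or pulse width is also possible) that is a nonlinear squashing function of the wavelet transform coefficient obtained by a sum-of-products of the distributed weight coefficients and the image data. As a result, the output of layer (1,0) as a whole amounts to performing the Gabor wavelet transform of Equation (4).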
【００８８】
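Processing in the feature detection layers (middle-order and high-order feature extraction)
In contrast to feature detection layer (1,0), each neuron in the subsequent feature detection layers ((1,1), (1,2), ..., (1,N)) forms, by the so-called Hebbian learning rule or the like, a receptive field structure that detects features specific to the pattern to be recognized. In later layers the size of the local region in which feature detection is performed approaches, step by step, the size of the whole recognition target, and geometrically these layers detect middle-order or high-order features.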
【００８９】
例えば、顔の検出認識を行う場合には中次（または高次）の特徴とは顔を構成する目、鼻、口等の図形要素のレベルでの特徴を表す。異なる処理チャネル間では、同じ階層レベル(検出される特徴の複雑さが同レベル)であれば、検出される特徴の違いは、同一カテゴリであるが、互いに異なるスケールで検出されたものであることにある。例えば、中次の特徴としての「目」は異なる処理チャネルでは、サイズの異なる「目」として検出を行う。即ち、画像中の与えられたサイズの「目」に対してスケールレベル選択性の異なる複数の処理チャネルにおいて検出が試みられる。
【００９０】
なお、特徴検出層ニューロンは一般的に(低次、高次特徴抽出に依らず)、出力の安定化のために抑制性(分流型抑制：shunting inhibition)の結合を前段の層出力に基づいて受けるような機構を有してもよい。
【００９１】
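Selective attention processing
The configuration and operation of the attention region setting control layer 108 are described below. In the initial state, the size of the attention region set by this layer is the whole screen, for the processing channel with the largest size among the preset scale levels described above.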
【００９２】
具体的には、図１８に示すように、注視領域制御層には入力データの所定の複数位置（特徴検出が行われる位置）に対応する注視制御ニューロン1801が配列される。図１８において楕円で示してある領域は、設定された注視点、及び対応する各層上の領域(説明の便宜上、各層に付き一つだけ図示しているが、一般的には、複数の特徴カテゴリで検出されるので各層で複数の対応領域が存在しうる；当該領域のサイズは各層の受容野サイズと同程度)を模式的に表す。
【００９３】
図１８において、特徴検出層から特徴位置検出層へ向かう太い矢印は、特徴位置検出層の各ニューロンが、該当する特徴検出層ニューロンの出力を受けることを模式的に示す。各特徴位置検出層ニューロンは、一般的に特徴統合層ニューロンよりもサイズの小さい受容野を有し、特徴統合層と同様にサブサンプリングを行うことにより、特徴統合よりも各特徴カテゴリ間の空間的配置情報をより良く保持することができる。
【００９４】
各注視制御ニューロン1801には、上位の特徴位置検出層(3,2)または(3,1)からのWhere経路からのフィードバック結合、または図１７のような場合、特徴統合層(2,2)における当該注視制御ニューロンに対応する位置についてのニューロンからのフィードバック結合を介して入力される。また、注視制御ニューロンは並行して特徴統合層(2,0)からの出力を受けてもよい。これら２〜３種類の入力の所定の重み付けによる線形加算(または非線形加算）、或いは、何れか一方からの入力に基づく注視領域の選択制御過程（後述）の結果、選択された特定の注視制御ニューロンが活性化し、注視領域が決定される。
【００９５】
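The position of the attention region is set mainly by feeding back the position information of the object detected at the maximum level in the top feature integration layer (2,N) or the top feature position detection layer (3,M). The former case (the configuration of FIG. 17) applies only when position information is retained in the feature integration layers. There are three main ways of selecting the attention position:
(1) The attention control neuron for which the linear sum (or nonlinear sum) of the feedback amount from an upper layer (or the total feedback amount from several upper layers) and the feature detection layer output (saliency of low-order features) is largest gives the attention position.
(2) The attention control neuron that receives the maximum (or a local maximum) of the low-order feature saliency map gives the attention position.
(3) The attention control neuron for which the feedback amount from an upper layer (the total, if a given attention control neuron has several feedback paths) is a maximum (or a local maximum) gives the attention position.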
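The three selection rules can be compared side by side in a short sketch. It treats the attention control neurons as a flat list of positions and uses a plain weighted sum for rule (1); the function name `select_attention`, the weights `eta1` and `eta2`, and the sample values are assumptions for illustration only.

```python
def select_attention(saliency, feedback, method, eta1=1.0, eta2=1.0):
    """Return the index of the attention control neuron chosen by rule 1, 2, or 3."""
    if method == 1:
        # Rule (1): linear sum of low-order saliency and top-down feedback.
        scores = [eta1 * s + eta2 * fb for s, fb in zip(saliency, feedback)]
    elif method == 2:
        # Rule (2): low-order saliency only.
        scores = list(saliency)
    elif method == 3:
        # Rule (3): top-down feedback only.
        scores = list(feedback)
    else:
        raise ValueError("method must be 1, 2, or 3")
    return max(range(len(scores)), key=lambda k: scores[k])


saliency = [0.9, 0.2, 0.4]   # low-order saliency per position (assumed values)
feedback = [0.1, 0.3, 0.8]   # feedback amount per position (assumed values)

choices = {m: select_attention(saliency, feedback, m) for m in (1, 2, 3)}
print(choices)
```

In this toy example the saliency-only rule picks position 0, whereas the two rules that use feedback pick position 2, where the upper layers report the strongest response.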
【００９６】
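Here the feedback amount means the output level (or a quantity proportional to it) of a neuron in an upper feature position detection layer (or feature integration layer) for a specific processing channel and a specific feature category. The present invention corresponds to case (1) or (3). Method (2) is a known method and has the drawbacks described earlier. Because the feedback amount from the upper layers thus contributes in some way to determining the attention position, object recognition starts only once the attention region has been set (an attention control neuron has been activated) through the cooperation of upper-layer and lower-layer neurons, which arises from the bottom-up process of signal processing from lower to upper layers and the top-down process from upper to lower layers.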
【００９７】
このため、初期の過程（初めにボトムアップ過程で上位層ニューロンが信号を出力するまでの過程）では、いわゆる「認識・検出」された状態に達せず（この段階では、上位層では高次特徴に関するいわゆる顕著度マップが出力された潜在化状態）、トップダウン過程が関与することにより注視制御信号が発生した結果に基づいて再びボトムアップ過程を経て上位層の高次特徴の検出・統合ニューロンが出力した時点（この段階では、上位層において出力レベルの高いニューロンが局在化し、まばらに存在する顕在化状態）で一通りの認識・検出に関する動作が完結する。なお、上位層とは必ずしも図１または図１７の(2,N)層、或いは図１の(3,k)層に限定されず、それらより下位の中間層または不図示の更に上位の層でもよい。
【００９８】
注視制御ニューロン1801の数は、必ずしも特徴統合層(2,0)上のニューロン数（即ち、特徴検出を行う位置の数N_F0）と一致しなくてよく、例えば、最上位の特徴位置検出層のニューロン数（N_Fp）と同じであってもよい。前提条件として、N_Fp ＜ N_F0の場合、注視制御を行うためのサンプリング点数は、低次の特徴検出位置の数より少なくなるが、検出対象のサイズの下限値がνＳ₀（ここに、０＜ν＜１、Ｓ₀は画面の全面積）とすると、N_Fp<<νN_F0でなければ、実用上は殆ど問題とならない。なお、注視制御ニューロンが受ける低次の特徴顕著度マップは、特徴検出層(例えば、(1,0)層)からの出力でもよい。
【００９９】
一方、注視位置の更新により設定される注視領域のサイズは、当該注視制御ニューロン1801に対して最大のフィードバック量を与える最上位層ニューロンが属する（或いは注視制御ニューロンに対応する特徴統合層ニューロンが属するチャネルでも同じ）処理チャネル(上述した（３）の場合)、または上位層からの入力と低次特徴顕著度との線形和が最大(極大)となる注視制御ニューロンに対応する特徴統合層ニューロンの属する処理チャネルによって決まる、予め設定された大きさを有する。本実施形態では、注視領域のサイズは前述した特徴統合層(2,0)の当該処理チャネルに属するGabor フィルタのカーネルサイズとする。
【０１００】
注視領域設定制御層の具体的構成としては、注視制御ニューロン1801に加えて、前述した図１２または１３に示すようなチャネル処理構造において、図１６のゲーティング回路の様な構成、及び以下に示す図１９の注視位置更新制御回路1901を導入したものが考えられる。
【０１０１】
このゲーティング回路は、低次特徴のマスキング(特定の部分領域のデータのみ上位層に伝播させること)に用い、かつ同ゲーティング回路は特徴統合層(2,0)からの出力のうち、選択された特定チャネル内で注視すべき領域を選択的に抽出して、後続のチャネルに信号伝播させる機能を有する。このようにして、特定チャネルかつ入力データ上の特定領域からの低次特徴信号のみを伝播させることにより、注視領域の設定を行うことができる。
【０１０２】
他の構成としては、ある注視制御ニューロンが（例えば、上位層からのフィードバック信号により）選択され、活性化されることにより、該当する位置の特徴統合層（２，０）ニューロンへシナプスを介して発火促進のパルスが伝達され、その結果、当該特徴統合層ニューロンがＯＮ状態（発火可能状態：ペースメーカニューロンなどの出力を受信可能な状態、即ち、過渡的遷移状態、または当該遷移状態が許容される状態）となるようなものでもよい。
【０１０３】
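Examples of signal propagation paths among the attention region setting control layer 108, feature integration layer (2,0), and feature detection layer (1,1) are shown in FIGS. 20 and 21. Both figures use gate circuits 2001, which are provided together with a common bus of a so-called digital circuit and have a switching function, and a switch signal control circuit 2002 that turns the switching function on and off at high speed. In FIG. 20, the attention control neuron 1801 is connected to a plurality of neurons of feature integration layer (2,0), namely the neurons belonging to a region 109 centered on the neuron in the feature integration layer that corresponds to the attention position (hereinafter the "attention center neuron"). The size of this region is specific to the processing channel to which the attention center neuron belongs.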
【０１０４】
図２０の構成では、注視制御ニューロン1801から特徴統合層への結合を有し、注視位置はWhere経路上の特徴位置検出層からのフィードバック量のみに基づいて制御される（その変形構成として注視制御ニューロンと特徴統合層ニューロンとは相互結合をなし、注視位置は上記フィードバック量と特徴統合層出力に基づいて制御される構成でもよい）。一方、図２１の構成では、注視制御ニューロン1801は図１６のゲーティング回路に相当するゲーティング層2101への同様な結合を有し、特徴統合層(2,0)から特徴検出層(1,1)へ向かう出力はゲーティング層2101を介してなされる。
【０１０５】
図２０に示す構成では、設定された注視位置の特徴統合層ニューロンを中心とする処理チャネルに固有サイズの領域（一般的に円形または矩形の領域）が活性化して、その領域に対応する特徴統合層(2,0)の出力が特徴検出層(1,1)に伝播する。一方、図２１に示す構成では、ゲーティング層のうち設定された注視位置を中心とする処理チャネルに固有サイズの領域のゲートだけがオープンし、その領域に対応する特徴統合層(2,0)の出力が特徴検出層(1,1)に伝播する。
【０１０６】
このゲーティング層はＦＰＧＡ等により構成され、注視制御ニューロンの各位置に対応して存在する論理回路の全体が、特徴統合層(2,0)と特徴検出層(1,1)との間に存在する一般的なゲーティング回路に相当し、注視制御ニューロンが活性化することにより、ゲーティング層の対応する位置にある論理回路ユニットだけがＯＮ状態となり、信号が後続層に伝達される。
【０１０７】
設定される注視領域のサイズは、既に説明したように上述の活性化した注視制御ニューロンに対応する特徴統合層ニューロンの属する処理チャネルに応じて決まる。例えば、対応する特徴統合層ニューロンが図１２または１３の処理チャネル１に属するとすると、予め決められたサイズのうち最大のものとなる。
【０１０８】
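Next, the mechanism for updating the attention position is described. In case (3) of the attention control mechanism above, after the initial setting the attention position is basically updated in order of the magnitude of the feedback amount that each attention control neuron receives from the upper layers, within a neighborhood region of fixed extent around the most recent attention position, starting from the value next largest after the maximum feedback amount. In case (1), the position is updated in order of the evaluation value G_f given below.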
【０１０９】
$$G_f = \eta_1 S_f + \eta_2 S_{FB,f} \quad (5)$$
【０１１０】
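Here G_f is the evaluation value for feature f, S_f is the saliency of feature f from the lower feature integration layer, S_FB,f is the feedback amount to the attention control neuron 1801 for feature f detected in the top layer or an intermediate feature detection layer, and η₁ and η₂ are positive constants. η₁ and η₂ can be changed adaptively according to the nature of the recognition or detection task, for example whether detailed feature comparison is required, as in discriminating between similar objects, or average (coarse) feature comparison, as in detecting objects of a particular category such as human faces.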
【０１１１】
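Specifically, the ratio of η₂ to η₁ is made small for the former kind of task and large for the latter. η₁ and η₂ are controlled by changing the corresponding synaptic connections to the attention control neuron.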
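The effect of the η₂/η₁ ratio on the update order can be seen in a small sketch that ranks positions by G_f = η₁S_f + η₂S_FB,f. The two parameter sets labelled "discrimination" and "detection", the function name `update_order`, and the sample values are hypothetical; the specification gives no numeric values.

```python
def update_order(saliency, feedback, eta1, eta2):
    """Positions sorted by decreasing evaluation value G = eta1*S + eta2*S_FB."""
    scores = [eta1 * s + eta2 * fb for s, fb in zip(saliency, feedback)]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


saliency = [0.9, 0.2, 0.4]   # S_f per position (assumed values)
feedback = [0.1, 0.3, 0.8]   # S_FB,f per position (assumed values)

# Detailed discrimination: small eta2/eta1, so low-order saliency dominates.
discrimination = update_order(saliency, feedback, eta1=1.0, eta2=0.2)
# Category detection: large eta2/eta1, so top-down feedback dominates.
detection = update_order(saliency, feedback, eta1=0.2, eta2=1.0)
print(discrimination, detection)
```

The same inputs give different visiting orders under the two settings, which is the kind of task-dependent behavior the input distribution control is meant to provide.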
【０１１２】
図２２は、注視制御ニューロンのシナプスに付随するη₁、η₂の制御を行う場合の注視制御ニューロン1801と注視制御ニューロンへの入力配分制御回路2202を中心とした構成例を示す。
【０１１３】
ここに、入力配分制御回路2202は、下位層からの低次特徴の顕著度マップ信号の入力の検知レベルの変調に関与するシナプス2201と上位層からのフィードバック信号の検知レベルの変調に関与するシナプス2201の双方に結合し、外部からの制御信号(例えば、図１には図示していない更に上位の層から課題に関する信号として対象の検出モードか詳細識別モードかなどを示す信号)に応じてη₁、η₂値に相当するシナプスでの位相遅延量制御などを行う。
【０１１４】
近傍領域のサイズは、入力データ上の注視領域サイズと同程度（例えば、その１０倍を越えない値）とする。近傍領域での探索に限定したのは、処理の高速性を保つためであり、処理速度が著しく劣化しなければ、画面全体を探索範囲としてもよい。
【０１１５】
次に、注視位置の逐次更新制御を行う注視位置更新制御回路1901について説明する。この回路は、図１９に示すように、最新の注視位置に対応する注視制御ニューロン1801へのフィードバック量の一次記憶部1902、近傍領域内の各フィードバック量入力部1903、近傍領域内での最新の注視位置フィードバック量に次ぐ(次候補)フィードバック量の検出を行い、当該フィードバックを受ける注視制御ニューロンを検出して活性化させる更新位置決定部1904等から構成される。
【０１１６】
更新位置決定手段1904は、上記一次記憶部1902と近傍領域内フィードバック入力部1903からの入力を受け、最新の注視制御ニューロンでのフィードバック量とその近傍領域内でのフィードバック量との比較部1907、後述する２次フィードバック用信号増幅部1906、及び次候補フィードバック量検出部1905とを有し、次候補フィードバック量検出部1905で最新のフィードバック量に準じるフィードバック量を検出すると、２次フィードバック用信号増幅部1906からそれに対応する注視制御ニューロンに対してパルス出力を行う。
【０１１７】
最新のフィードバック量に準じるとは、具体的には、近傍領域内でランダムにフィードバック量を入力し、前述した方法で２つのフィードバック量を比較した結果、その差が一定基準(閾値)範囲内、または最新フィードバック量より比較対象のフィードバック量の方が大きいこと(探索範囲が更新されればありえる)を指し、これを検出することにより次の更新位置の検出が実効的に行われる。
【０１１８】
次に、図２０、２１に模式的に示す構成において、注視制御ニューロン1801と注視位置更新制御回路1901との信号の伝達について説明する。後述するように特定の注視制御ニューロン1801と注視位置更新制御回路1901とは一時的に相互結合をなし、前述した方法により選択された更新位置に該当する注視制御ニューロンは、更新位置決定部1904からの２次フィードバック入力（上位層から注視制御ニューロンへのフィードバック量に比例した信号レベルを有するフィードバック信号）を受けて活性化する（その結果、当該ニューロンに固有の注視領域が開かれることにより、入力データ上の該当する領域のデータに相当する信号のみが後続層に伝達される）。
【０１１９】
上記の一時的な相互結合とは、ある時刻で活性化している(即ち、後続層への信号伝達制御が行われる)注視制御ニューロンに対応して存在する近傍領域上の他の特定の注視制御ニューロンとの結合（信号の伝播）を局所共通バスを通じて一時的に行うことである。具体的には、近傍領域内の他の注視制御ニューロンのうち、一定時間範囲(例えば、十数ミリ秒オーダー)内で一つのニューロン(ランダムな位置にある)だけが自発的に発火出力し、そのスパイクパルスが共通バスを通って活性化している注視制御ニューロンに到達する結果、当該時間範囲内だけ、既に活性化している注視位置更新制御回路1901とのコミュニケーションチャネルが確立するものとする。
【０１２０】
このために、例えば近傍領域内にある最新の注視制御ニューロン以外の他のニューロンが発火状態になったときだけ共通バスへのアクセスを得るようにする。具体的には、その発火状態にあるニューロンからのパルス信号が共通バスと結合する配線上を伝播することにより、一時的にその配線の抵抗が低くなるような可変抵抗アレイとその制御回路を使用する(不図示)か、或いは各注視制御ニューロンと共通バスとの間に図２０、２１に模式的に示すゲート回路2001(スイッチ回路)とスイッチ信号制御回路2002とを設定し、注視制御ニューロンからの発火パルス信号をトリガーとして、スイッチ信号制御回路2002がゲート回路の一つ（図２０、２１の2001のｃ）にスイッチＯＮ信号を送り、一時的にチャネルオープン(共通バスへの接続をＯＮ状態)にするなどの構成が取られる。
【０１２１】
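After the next candidate attention position has been selected, the secondary feedback signal amplification unit 1906 applies a predetermined amplification (in the case of phase modulation, by advancing the pulse phase, for example) to the feedback signal from the upper layer to the attention control neuron, and the result is returned to that attention control neuron as the secondary feedback signal.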
【０１２２】
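The number of attention control neurons (and hence attention regions) active at the same time is limited to one per feature category. To this end, when the active state moves from one attention control neuron to another, the activity of the previously active neuron is suppressed for a fixed time. This is realized by forming inhibitory connections between (nearest-neighbor) attention control neurons (the dotted lines leaving attention control neuron 1801 in FIGS. 20 and 21 represent inhibitory connections).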
【０１２３】
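The range of feedback amounts used for attention control has a preset lower limit; after control has moved to the attention position corresponding to that lower limit, the attention position is set back to the first one.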
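The update behavior described in this part (search within a neighborhood of the current position, temporary suppression of the previously active neuron, and a lower limit on the feedback amount) can be approximated by a deterministic sketch. It deliberately simplifies the mechanism: the random sampling of neighbors and the threshold comparison performed by the update control circuit are replaced by a direct search for the largest remaining feedback, suppression is modelled as a permanent "visited" set, and the reset to the first position is left to the caller. All names and values are assumptions.

```python
def attention_sequence(feedback, start, radius, lower_limit, max_steps=10):
    """Sequence of attention positions on a 1-D array of attention control neurons."""
    current = start
    visited = {start}          # stands in for the temporary inhibition
    path = [start]
    for _ in range(max_steps):
        candidates = [
            k for k in range(len(feedback))
            if k not in visited
            and abs(k - current) <= radius       # neighborhood region
            and feedback[k] >= lower_limit       # preset lower limit
        ]
        if not candidates:
            break              # the caller may now return to the first position
        current = max(candidates, key=lambda k: feedback[k])
        visited.add(current)
        path.append(current)
    return path


feedback = [0.2, 0.9, 0.1, 0.7, 0.05, 0.6]   # assumed feedback amounts
path = attention_sequence(feedback, start=1, radius=2, lower_limit=0.1)
print(path)
```

Because each step only examines the neighborhood of the current position, the search cost per update stays small, and the visited set prevents the oscillation between two positions that the text identifies as a weakness of saliency-only control.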
【０１２４】
上位層からのフィードバック信号を用いて注視領域を注視制御層により直接更新制御する上述したような機構により、次のような従来にない効果がもたらされる。
【０１２５】
低次の特徴に関する顕著度マップのみを用いて注視領域制御を行う場合と比べて、効率的(従って高速)な探索を行うことができ、また、注視領域が２つの特徴パターン間を振動しつづけるような無意味な視覚探索を回避することができる。
【０１２６】
注視すべき対象において、その中の特に注視すべき特徴的な部分やパターン(例えば、顔などの場合目など)だけを選択的にかつ効率的に探索することができる。この場合には、必ずしも最上位層からのフィードバックを用いるのではなく対象中の特徴的な部位の検出に関与するような中間層からのフィードバックを用いてもよい。
【０１２７】
最新の注視位置の近傍領域で次の更新位置を探索することにより、高速に注視位置更新制御を行う事ができる。
【０１２８】
複数の同一カテゴリに属する対象が存在する場合でも、探索範囲とする近傍領域のサイズを適切にとることにより、注視点位置の遷移の範囲が特定対象に固定されることなく、それら対象間を安定的に順次遷移するような制御が可能である。
【０１２９】
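Processing in the feature integration layers
The neurons of the feature integration layers ((2,0), (2,1), ...) are described next. As shown in FIG. 1, the connection from a feature detection layer (e.g., (1,0)) to a feature integration layer (e.g., (2,0)) is configured so that each feature integration neuron receives excitatory input from the neurons of the same feature element (type) in the preceding feature detection layer that lie within its receptive field. As noted above, integration-layer neurons are of two kinds: subsampling neurons, which perform subsampling by local averaging for each feature category (computing the mean, a representative value, the maximum, etc., of the inputs from the neurons forming the receptive field), and population coding neurons, which combine outputs for features of the same category across different scales (processing channels).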
【０１３０】
前者によれば、複数の同一種類の特徴のパルスを入力し、それらを局所的な領域（受容野）で統合して平均化する（或いは、受容野内での最大値等の代表値を算出する）ことにより、その特徴の位置のゆらぎ、変形に対しても確実に検出することができる。このため、特徴統合層ニューロンの受容野構造は、特徴カテゴリによらず一様（例えば、いずれも所定サイズの矩形領域であって、かつ感度または重み係数がその中で一様分布するなど）となるように構成してよい。
【０１３１】
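Population coding with respect to scale level
The mechanism of the latter, population coding, is now described in detail. A population coding neuron integrates, by a normalized linear combination, the outputs of several subsampling neurons that are at the same hierarchical level (similar complexity of geometric feature) and in the same feature integration layer but belong to different processing channels. For example, in feature integration layer (2,0), which receives the output of feature detection layer (1,0) performing the Gabor wavelet transform, the outputs corresponding to the set of Gabor filters {g_mn} (n fixed, m = 1, 2, ...) that belong to different processing channels and have the same orientation selectivity are integrated by linear combination or the like. Specifically, let p_ij(t) be the output of the subsampling neuron with orientation selectivity i and scale selectivity j, and q_ij(t) the population code with the same selectivity. They are related by Equation (6), which expresses the linear combination of the normalized subsampling neuron outputs, and Equation (7), which expresses the normalization. For convenience, Equations (6) and (7) express the output state transitions of the subsampling neurons and population coding neurons as discrete-time transitions.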
【０１３２】
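$$q_{ij}(t+1) = \sum_{b} w_{ij,ib}\, p_{ib}(t) \quad (6)$$
$$p_{ij}(t) = \frac{q_{ij}(t)^{\beta}}{C + \lambda \sum_{b} q_{ib}(t)^{\beta}} \quad (7)$$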

【０１３３】
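Here w_ij,ab is a coupling coefficient representing the contribution from neurons (or neuron populations) with different selectivities (sensitivity characteristics): specifically, the contribution from the output of the subsampling neuron with orientation selectivity index a and scale-level selectivity index b to the population coding neuron with orientation selectivity index i and scale-level selectivity index j.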
【０１３４】
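w_ij,ab acts as a filter (selectivity) centered on orientation index i and scale-level index j, and is typically a function of |i-a| and |j-b|, i.e., w_ij,ab = f(|i-a|, |j-b|). As described later, the purpose of population coding by linear combination through w_ij,ab is for q_ij to give the probability of presence with respect to feature category (orientation component) and scale level, taking into account the detection levels of neurons with other selectivities.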
【０１３５】
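C is a normalization constant, and λ and β are constants (β is typically 1 or 2). C prevents p_ij from diverging even when the sum of the population codes for a feature category is nearly zero. In the initial state at system start-up, q_ij(0) = p_ij(0). Corresponding to FIG. 12, the summation in Equations (6) and (7) is taken over the scale-level selectivity index only. As a result, each population coding neuron outputs (a quantity proportional to) the probability of presence of each feature belonging to the same feature category and a different scale level (processing channel).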
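The iteration between linear combination and normalization can be sketched for a single orientation and three scale levels. The sketch assumes a divisive normalization of the form p = q^β / (C + λ Σ q^β) and coupling weights that decay with |j-b|; these forms, the function names, and all numeric values are assumptions consistent with the definitions above rather than values from the specification.

```python
def normalize(q, beta=2.0, C=1e-3, lam=1.0):
    """Divisive normalization across scale levels (assumed form)."""
    denom = C + lam * sum(v ** beta for v in q)
    return [v ** beta / denom for v in q]


def combine(p, weights):
    """Linear combination: q_j = sum over b of w[j][b] * p[b]."""
    return [sum(w * pb for w, pb in zip(row, p)) for row in weights]


def population_code(p0, weights, steps=5):
    """Alternate combination and normalization for a fixed number of steps."""
    p = list(p0)
    q = list(p0)               # initial state: q(0) = p(0)
    for _ in range(steps):
        q = combine(p, weights)
        p = normalize(q)
    return q


# Coupling that depends only on |j - b| (assumed: 0.5 ** |j - b|).
weights = [[0.5 ** abs(j - b) for b in range(3)] for j in range(3)]
subsampled = [0.2, 0.9, 0.4]   # subsampling neuron outputs per scale (assumed)

q = population_code(subsampled, weights)
print(q)
```

With these values the middle scale level keeps the largest population code through the iteration, while the neighboring levels retain graded, non-zero values that carry information about intermediate sizes.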
【０１３６】
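In general, as in FIG. 13, the summation can additionally be taken over the orientation selectivity index, which yields a system that also performs population coding for levels intermediate between the preset orientation components. In this case, by setting the parameters (β and w_ij,ab in Equations (8) and (9)) appropriately, each population coding neuron in the configuration of FIG. 13 can output (a quantity proportional to) the probability of presence of the feature for each scale level and each feature category.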
【０１３７】
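As Equation (6) shows, the population code q_ij(t) is obtained as a normalized linear combination of the outputs of neurons with different sensitivity characteristics. If q_ij(t) at steady state is suitably normalized (for example, by the sum of q_ij) so that its values lie between 0 and 1, q_ij gives the probability that the feature has orientation component i and scale level j. Therefore, to obtain explicitly the scale level corresponding to the size of an object in the input data, one can fit a curve to q_ij, estimate its maximum, and take the corresponding scale level. The scale level obtained in this way generally lies between the preset scale levels.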
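One simple way to realize the curve-fitting step is to fit a parabola through the largest population code and its two neighbors and take the vertex. The specification does not say which curve is fitted, so the quadratic fit, the function name `estimate_scale`, and the sample values are assumptions.

```python
def estimate_scale(q):
    """Estimate a continuous scale level from population codes q[0..], levels 1..len(q)."""
    k = max(range(len(q)), key=lambda j: q[j])
    if k == 0 or k == len(q) - 1:
        return float(k + 1)            # peak at the boundary: no interpolation
    left, mid, right = q[k - 1], q[k], q[k + 1]
    curvature = left - 2.0 * mid + right
    if curvature == 0.0:
        return float(k + 1)
    # Vertex of the parabola through the three points around the maximum.
    offset = 0.5 * (left - right) / curvature
    return k + 1 + offset


skewed = estimate_scale([0.2, 0.5, 0.3])       # leans toward level 3
symmetric = estimate_scale([0.25, 0.5, 0.25])  # exactly level 2
print(skewed, symmetric)
```

The skewed example returns a value slightly above level 2, illustrating how an intermediate scale level is obtained from a small number of preset channels.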
【０１３８】
図２３はスケールレベルの集団的符号化の例を示し、横軸はスケールレベル、縦軸は細胞出力を表す。出力とは、パルス位相に相当し、特定のスケールにピーク感度を有するニューロンは、そのスケールからずれたサイズを有する特徴に対しては、特定スケールに対応するサイズの特徴と比べて出力レベルの低下、即ち、位相遅れが生じることになる。同図には、各特徴検出細胞のスケール選択性に関する感度曲線（いわゆるチューニング曲線）と各細胞出力、及びそれらを統合して得られる集団的符号統合出力(各細胞出力のスケールレベルに関するモーメント、即ち線形和)を示す。集団的符号の統合出力の横軸上の位置は、認識対象に関するスケール(サイズ)の推定値を反映している。
【０１３９】
本実施形態では、実際にはスケールレベルを明示的には求めず、特徴統合層から特徴検出層への出力はq_ijとする(正規化したq_ijでもよい)。即ち、図１２、１３のいずれも、特徴統合層から特徴検出層への出力は、サブサンプリングニューロンからの出力ではなく、集団的符号化ニューロンの出力とすることにより、最終的には、上記した正規化後のq_ijのように、複数スケールレベル（解像度）にまたがった特定対象の検出確率として集団的に表される。
【０１４０】
図１２に示す特徴統合層の回路構成では、先ず、サブサンプリングニューロン回路1201で前段の特徴検出層ニューロン出力のうち、各特徴カテゴリとサイズ選択性が同一のニューロン出力を、当該サブサンプリングニューロンの局所受容野で受け、局所的な平均化を行う。各サブサンプリングニューロン出力は、結合処理回路1203に送られる。このとき、後述するように各ニューロンからのパルス信号は、不図示のシナプス回路により、所定位相量（例えば、式（７）のβが２のとき、特徴検出ニューロンの出力レベル相当の２乗に比例する量）だけ遅延を受け、局所的な共通バスを介して伝播される。ただし、ニューロン間の配線には、共通バスを用いず物理的に独立配線してもよい。
【０１４１】
結合処理回路1203では、式（６）、（７）に相当する処理を行い、特徴カテゴリが同じだが、サイズ選択性の異なる（複数処理チャネルにまたがる）情報の集団的符号化を行う。
【０１４２】
また、図１２では特徴カテゴリ（方向成分選択性）が同一のサブサンプリングニューロン出力について集団的符号化を行ったのに対し、図１３に示す回路構成では、特徴カテゴリおよびサイズ選択性の全体にわたって行う結合処理回路で、式（８）、（９）に示すような処理を行う。
【０１４３】
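$$q_{ij}(t+1) = \sum_{a,b} w_{ij,ab}\, p_{ab}(t) \quad (8)$$
$$p_{ij}(t) = \frac{q_{ij}(t)^{\beta}}{C + \lambda \sum_{a,b} q_{ab}(t)^{\beta}} \quad (9)$$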

【０１４４】
一方、（2,0）層内の各チャネルごとのP_aに応じた信号増幅/減衰（パルス位相の前進/遅延）を、各集団的符号化ニューロンからの出力に対して行うような、チャネル活性度制御回路を設定することができる。図１５に、この様なチャネル活性度制御回路1502を設定した構成の模式図を示す。このチャネル活性度制御回路は、図１２、１３の集団的符号化ニューロン1202と次層である特徴検出層との間に設定される。
【０１４５】
最終層では、複数チャネルにわたって、高次特徴としての認識対象の存在確率が、ニューロンの活動レベル(即ち、発火周波数や発火スパイクの位相など)として表現される。Where処理経路(或いは最終層で検出・認識対象の位置情報も検出される場合)では、最終層で入力データ中の位置（場所）に応じた対象の存在確率（閾値処理すれば、対象の有無）が、各ニューロンの活動レベルとして検出される。集団的符号化は、正規化を行わない線形結合によって求めてもよいが、ノイズの影響を受けやすくなる可能性があり、正規化することが望ましい。
【０１４６】
式（７）または（９）に示す正規化は、神経回路網レベルでは、いわゆる分流型抑制(shunting inhibition)により、また、式（６）または（８）に示すような線形結合は、層内の結合（lateral connection）により実現することができる。
【０１４７】
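FIG. 14 shows an example of the normalization circuit for β = 2. The circuit consists of a sum-of-squares calculation circuit 1403, which takes the sum of squares of the outputs of feature detection cells n_ij belonging to different processing channels; a shunting inhibition circuit 1404, which mainly performs the normalization of Equation (7); and a linear sum circuit 1405, which computes and outputs the linear sum of Equation (6).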
【０１４８】
２乗和算出回路1403においては、各特徴検出細胞の２乗値を保持(pooling)する介在ニューロン(inter-neuron)素子1406が存在し、当該介在ニューロン1406への結合を与える各シナプス結合素子1402が、特徴検出細胞1401出力の２乗値に相当するパルス位相遅れ(或いはパルス幅変調、パルス周波数変調)を与える。
【０１４９】
分流型抑制回路1404は、例えば、介在ニューロン1406の出力に所定の係数(λ/C)を乗算した値の逆数に比例するような可変抵抗素子とコンデンサ及び特徴検出細胞1401の出力の２乗を与えるパルス位相変調回路(或いはパルス幅変調回路、パルス周波数変調回路)とから構成される。
【０１５０】
次に、チャネル処理の変形例について説明する。以上の様な処理チャネル毎に集団的符号化がなされ、各処理チャネル出力が後続層に伝達されるようにする構成（即ち、図１２又は１３の構成がカスケード的に後続層まで保持される構成）のほかに、処理効率を上げるとともに消費電力を抑えるために、特徴統合層（2,0）内の最大応答レベルを与える処理チャネルと同一のチャネルに属する（次の層の）特徴検出細胞のみに当該集団的符号化ニューロンの出力が伝播するようにしてもよい。
【０１５１】
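In this case, in addition to the configuration of FIGS. 12 and 13, a maximum-input detection circuit, the so-called winner-take-all circuit (hereinafter WTA circuit), is placed between the output of feature integration layer (2,0) and the next feature detection layer (1,1) as a processing channel selection circuit that receives the outputs of the population coding neuron circuits and selects the channel giving the maximum response level. This processing channel selection circuit may be provided for each position of the feature integration layer, or as a single circuit for the layer that computes the maximum response level per processing channel over the whole input data regardless of location.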
【０１５２】
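Known configurations such as those described in Japanese Patent Laid-Open No. 08-321747, USP5059814, USP5146106, and elsewhere can be used as the WTA circuit. FIG. 16(A) schematically shows a configuration in which a WTA circuit propagates only the output of the processing channel showing the maximum response in the feature integration layer to the next layer, a feature detection layer. This configuration replaces the channel activity control circuit 1502 of FIG. 15 with a gating circuit 1602.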
【０１５３】
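As shown in FIG. 16(B), the gating circuit 1602 has a WTA circuit 1603, which receives the average output level of each processing channel, and a channel selection circuit 1604, which propagates the output of each neuron of the processing channel with the largest average output level to the same channel in the next layer.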
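In software terms, the gating circuit amounts to an argmax over per-channel averages followed by masking. The sketch below models the outputs of each processing channel as a list of numbers; the function name `gate_channels` and the sample values are hypothetical, and blocked channels are represented by zeros as a simplification.

```python
def gate_channels(channel_outputs):
    """Pass only the channel with the largest average output; zero the others."""
    averages = [sum(outputs) / len(outputs) for outputs in channel_outputs]
    # Winner-take-all over the per-channel average output levels.
    winner = max(range(len(averages)), key=lambda k: averages[k])
    gated = [
        list(outputs) if k == winner else [0.0] * len(outputs)
        for k, outputs in enumerate(channel_outputs)
    ]
    return winner, gated


channels = [[0.1, 0.2], [0.6, 0.8], [0.3, 0.1]]   # assumed outputs per channel
winner, gated = gate_channels(channels)
print(winner, gated)
```

Only the winning channel's neuron outputs reach the next layer, which is how this variant reduces the processing load and power consumption relative to propagating every channel.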
【０１５４】
また、後続の特徴統合層(2,k)（ｋは１以上）では、このような処理チャネル選択回路は必ずしも要しないが、例えば、高次特徴検出後の特徴統合層の出力を処理チャネル選択回路経由でフィードバックして低次又は中次特徴の統合層での処理チャネル選択を行うようにしてもよい。以上でチャネル処理の変形例についての説明を終わる。なお、図１２、１３に示すようなサブサンプリング、結合処理、集団的符号化の流れを特徴統合層内で行う構成に限定されず、例えば結合処理、集団的符号化の為の層を別に設けるなどしてもよいことは言うまでもない。
【０１５５】
なお、サイズのほぼ等しい対象が近接して存在し、或いは部分的に重なり合って存在しているときでも、局所的な受容野構造とサブサンプリング構造等による部分的な複数種類の特徴を統合して検出するメカニズムにより、対象の認識、検出性能が保持されることは、言うまでもない。
【０１５６】
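Pulse signal processing in the feature integration layers
The combination circuit described above outputs, to each population coding neuron, pulses corresponding to the population coding level obtained from Equation (6) or (8). Each population coding neuron, acting as a feature integration cell (n₁, n₂, n₃) of layer (2,k), receives pulse input from the pacemaker neuron of layer (1,k+1). When input from the preceding feature detection layer or sensor input layer (layer (1,k)) brings the combination circuit output to a sufficient level (for example, the average number of input pulses in a certain time range or time window exceeds a threshold, or the pulse phase is advanced), the neuron produces its output with the falling edge of the pacemaker pulse as the time reference.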
【０１５７】
また、前述したサブサンプリングニューロンは、いずれのペースメーカニューロンからの制御も受けず、前段の(1,k)層の特徴検出細胞からの平均的な（各サブサンプリングニューロンごとに独立した位相をもった時間窓内）出力レベルに基づき、サブサンプリング処理を行う。また、サブサンプリングニューロンから結合処理ニューロンへのパルス出力タイミング制御も、ペースメーカニューロンを介さずに行われ、結合処理回路から集団的符号化ニューロンへのパルス出力も同様である。
【０１５８】
このように本実施形態では、特徴統合細胞（サブサンプリングニューロン、集団的符号化ニューロンなど）は、その前の層番号(1,k)の特徴検出層上のペースメーカニューロンからのタイミング制御を受けるようには、構成していない。なぜならば、特徴統合細胞においては、入力パルスの到着時間パターンではなく、むしろ一定の時間範囲での入力レベル（入力パルスの時間的総和値など）によって決まる位相（周波数、パルス幅、振幅のいずれが依存してもよいが、本実施形態では位相とした）でのパルス出力をするため、時間窓の発生タイミングは余り重要ではないからである。なお、このことは、特徴統合細胞が前段の層の特徴検出層のペースメーカニューロンからのタイミング制御を受ける構成を排除する趣旨ではなく、そのような構成も可能であることはいうまでもない。
【０１５９】
特徴位置検出層での処理
特徴位置検出層は、図１に示すように、What経路と分離したWhere経路をなし、同一階層レベルの特徴検出層(1,k)との結合、及び注視領域設定制御層108へのフィードバック結合をなす。特徴位置検出層107のニューロンの配列及び機能は、特徴統合層103における集団的符号化を行わないこと、及び特徴配置関係の情報が失われないように上位層と下位層間で受容野サイズが大きく変わらないことを除いて、特徴統合層103のサブサンプリングニューロン1201と同様に、特徴カテゴリごとに各ニューロンが配列し、また特徴検出層ニューロンと結合することにより、サブサンプリングを行っている。その結果、最上位の特徴位置検出層では、認識・検出対象に関する大まかな空間分布(配置)を表すニューロンの発火状態の分布が得られる。
【０１６０】
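Pulse signal processing in the feature position detection layers
This is the same as for the subsampling neurons 1201 of the feature integration layer 103. That is, the neurons of layer (3,k) are not controlled by any pacemaker neuron and perform subsampling on the basis of the average output level of the feature detection cells of the preceding layer (1,k) (within a time window whose phase is independent for each subsampling neuron).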
【０１６１】
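Operating principle of pattern detection
Next, the pulse coding and detection of two-dimensional graphic patterns are described. FIG. 3 schematically shows how pulse signals propagate from the feature integration layer 103 to the feature detection layer 102 (for example, from layer (2,0) to layer (1,1) in FIG. 1).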
【０１６２】
特徴統合層103側の各ニューロンn_i（n₁〜n₄）は、それぞれ異なる特徴量（或いは特徴要素）に対応しており、特徴検出層102側のニューロンn'_jは、同一受容野内の各特徴を組み合わせて得られる、より高次の特徴（図形要素）の検出に関与する。
【０１６３】
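Each connection between neurons introduces a specific (feature-specific) delay due to the pulse propagation time and the time delay at the synaptic connection (S_ij) from neuron n_i to neuron n'_j. As a result, as long as the neurons of the feature integration layer 103 output pulses, the pulse train P_i arriving at neuron n'_j via the common bus line 301 has a predetermined order (and spacing) determined by the delays at the synaptic connections, which are fixed by learning (FIG. 3(A) shows the pulses arriving in the order P₄, P₃, P₂, P₁).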
【０１６４】
図３の（Ｂ）は、ペースメーカニューロンからのタイミング信号を用いて時間窓の同期制御を行う場合において、層番号（2,k）上の特徴統合細胞n₁、n₂、n₃（それぞれ異なる種類の特徴を表す）から、層番号（1,k+1）上のある特徴検出細胞(n'_j)（より上位の特徴検出を行う）へのパルス伝播のタイミング等を示している。
【０１６５】
図６は、特徴検出層ニューロンにペースメーカニューロンからの入力がある場合のネットワーク構成を示す図である。図６において、ペースメーカニューロン603（n_p)は、同一の受容野を形成し、かつ異なる種類の特徴を検出する特徴検出ニューロン602（n_j,n_k等）に付随し、それらと同一の受容野を形成して、特徴統合層（または入力層）上のニューロン601からの興奮性結合を受ける。そして、その入力の総和値（或いは受容野全体の活動度レベル平均値など、受容野全体に固有の活動特性を表す状態に依存するように制御するため）によって決まる所定のタイミング（または周波数）でパルス出力を特徴検出ニューロン602及び特徴統合ニューロンに対して行う。
【０１６６】
また、各特徴検出ニューロン602では、その入力をトリガー信号として互いに時間窓が位相ロックする様に構成されているが、前述したようにペースメーカニューロン入力がある前は、位相ロックされず、各ニューロンはランダムな位相でパルス出力する。また、特徴検出ニューロン602では、ペースメーカニューロン603からの入力がある前は後述する時間窓積分は行われず、ペースメーカニューロン603からのパルス入力をトリガーとして、同積分が行われる。
【０１６７】
ここに、時間窓は特徴検出細胞(n'_i)ごとに定められ、当該細胞に関して同一受容野を形成する特徴統合層内の各ニューロンおよび、ペースメーカニューロン603に対して共通であり、時間窓積分の時間範囲を与える。
【０１６８】
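The pacemaker neuron 603 in layer (1,k) (k a natural number) outputs pulses to each feature integration cell of layer (2,k-1) and to the feature detection cells (layer (1,k)) to which it belongs, thereby providing the timing signal for generating the time window in which the feature detection cells sum their inputs over time. The start of this time window is the reference time for measuring the arrival times of the pulses output by the feature integration cells. That is, the pacemaker neuron 603 provides the reference pulse for the pulse output times of the feature integration cells and for time-window integration in the feature detection cells.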
【０１６９】
各パルスは、シナプス回路を通過すると所定量の位相遅延が与えられ、更に共通バスなどの信号伝達線を通って特徴検出細胞に到着する。この時のパルスの時間軸上の並びを、特徴検出細胞の時間軸上において点線で表したパルス（Ｐ₁，Ｐ₂，Ｐ₃）により示す。
【０１７０】
特徴検出細胞において各パルス（Ｐ₁，Ｐ₂,Ｐ₃）の時間窓積分(通常、一回の積分とする；但し、多数回に渡る時間窓積分による電荷蓄積、または多数回に渡る時間窓積分の平均化処理を行ってもよい)の結果、閾値より大となった場合には、時間窓の終了時刻を基準としてパルス出力（Ｐ_d）がなされる。なお、図３の（B）に示した学習時の時間窓とは、後で説明する学習則を実行する際に参照されるものである。
【０１７１】
シナプス回路等
図４は、シナプス回路S_iの構成を示す図である。図４の（Ａ）は、シナプス回路202(S_i)において、ニューロンn_iの結合先である各ニューロンn'_jへのシナプス結合強度（位相遅延）を与える各小回路401が、マトリクス的に配置されていることを示している。このようにすると、シナプス回路から結合先ニューロンへの配線を各受容野に対応する同一ライン（局所的な共通バス301）上で行う事ができ（ニューロン間の配線を仮想的に行うことができ）、従来から問題となっていた配線問題の軽減（除去）が図られる。
【０１７２】
また、結合先のニューロンでは、同一受容野からの複数パルス入力を受けた際に、それぞれがどのニューロンから発せられたものかを時間窓基準でのパルスの到着時間（特徴検出細胞が検出する特徴に対応し、それを構成する低次特徴に固有の位相遅延）により、時間軸上で識別することができる。
【０１７３】
図４の（Ｂ）に示すように、各シナプス結合小回路401は、学習回路402と位相遅延回路403とからなる。学習回路402は、位相遅延回路403の特性を変化させることにより、上記遅延量を調整し、また、その特性値（或いはその制御値）を浮遊ゲート素子、或いは浮遊ゲート素子と結合したキャパシタ上に記憶するものである。
【０１７４】
図５は、シナプス結合小回路の詳細構成を示す図である。位相遅延回路４０３はパルス位相変調回路であり、例えば、図５の（Ａ）に示すように、単安定マルチバイブレータ５０６、５０７、抵抗５０１、５０４、キャパシタ５０３、５０５、トランジスタ５０２を用いて構成できる。図５の（Ｂ）は、単安定マルチバイブレータ５０６へ入力された方形波Ｐ１（図５の（Ｂ）の［１］）、単安定マルチバイブレータ５０６から出力される方形波Ｐ２（同［２］）、単安定マルチバイブレータ５０７から出力される方形波Ｐ３（同［３］）の各タイミングを表している。
【０１７５】
位相遅延回路403の動作機構の詳細については説明を省略するが、Ｐ１のパルス幅は、充電電流によるキャパシタ503の電圧が予め定められた閾値に達するまでの時間で決まり、Ｐ２の幅は抵抗504とキャパシタ505による時定数で決まる。Ｐ２のパルス幅が（図５の（Ｂ）の点線方形波のように）広がって、その立ち下がり時点が後にずれるとＰ３の立ち上がり時点も同じ量ずれるが、Ｐ３のパルス幅は変わらないので、結果的に入力パルスの位相だけが変調されて出力されたことになる。
【０１７６】
制御電圧Ecを基準電圧のリフレッシュ回路509と結合荷重を与えるキャパシタ508への電荷蓄積量制御を行う学習回路402で変化させることにより、パルス位相（遅延量）を制御することができる。この結合荷重の長期保持のためには、学習動作後に図５の（Ａ）の回路の外側に付加される浮遊ゲート素子（図示せず）のチャージとして、或いはデジタルメモリへの書き込み等を行って結合荷重を格納してもよい。その他回路規模を小さくなるように工夫した構成（例えば、特開平5-37317号公報、特開平10-327054号公報参照）など周知の回路構成を用いることができる。
【０１７７】
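When the network is configured with shared connection weights (in particular, when a single weight coefficient represents multiple synaptic connections identically), the delay amount at each synapse (P_ij in Equation (10) below) can, unlike the case of FIG. 3, be made uniform within the same receptive field. In particular, the connections from a feature detection layer to a feature integration layer can be configured in this way regardless of the detection target (that is, regardless of the task), because the feature integration layer performs subsampling, by local averaging or the like, of the outputs of the feature detection layer that precedes it.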
【０１７８】
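In this case, the small circuits in FIG. 4(A) reduce to a single circuit S_k,i connected by the local common bus line 301, as shown in FIG. 4(B), which gives a particularly economical circuit configuration. On the other hand, when the connections from the feature integration layer 103 (or the sensor input layer 101) to the feature detection layer 102 are configured in this way, what a feature detection neuron detects is the event of simultaneous (or nearly simultaneous) arrival of pulses representing a plurality of different feature elements.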
【０１７９】
なお、結合が対称性を有する場合には、同一荷重（位相遅延）量を与える結合を同一のシナプス結合用小回路で代表させることにより、相当数のシナプス結合が少数の回路で代表されるように構成することができる。特に幾何学的特徴量の検出においては、受容野内での結合荷重の分布が対称性を有する場合が多いので、シナプス結合回路を減少させ回路規模を大幅に縮小にすることが可能である。
【０１８０】
パルスの同時到着、或いは所定の位相変調量を実現するシナプスでの学習回路の例としては、図５の（Ｃ）に示すような回路要素を有するものを用いればよい。即ち、学習回路402をパルス伝播時間計測回路510（ここで、伝播時間とは、ある層のニューロンの前シナプスでのパルス出力時間と次の層上にある出力先ニューロンでの当該パルスの到着時間との時間差をさし、図３の（Ｂ）では、シナプス遅延と伝播に要した時間との和になる）、時間窓発生回路511、及び伝播時間が一定値となるようにシナプス部でのパルス位相変調量を調整するパルス位相変調量調整回路512から構成できる。
【０１８１】
伝播時間計測回路としては、後述するような同一局所受容野を形成するペースメーカニューロンからのクロックパルスを入力し、所定の時間幅（時間窓：図３の（Ｂ）参照）において、そのクロックパルスのカウンター回路からの出力に基づき伝播時間を求めるような構成などが用いられる。なお、時間窓は出力先ニューロンの発火時点を基準として設定することにより、以下に示すような拡張されたＨｅｂｂの学習則が適用される。
【０１８２】
学習則
また、学習回路402は、同じカテゴリの物体が提示される頻度が大きくなるほど上記時間窓の幅が狭くなるようにしてもよい。このようにすることにより、見慣れた（すなわち呈示回数、学習回数の多い）カテゴリのパターンであるほど、複数パルスの同時到着の検出(coincidence detection)モードに近づく様な動作をすることになる。このようにすることにより、特徴検出に要する時間を短縮できる(瞬時検出の動作が可能となる)が、特徴要素の空間配置の細かな比較分析や、類似するパターン間の識別等を行うことには適さなくなる。
【０１８３】
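The learning process for the delay amount can be described, for example, by extending to the complex domain. The complex connection weight C_ij between neuron n_i of the feature detection layer and neuron n_j of the feature integration layer is given by
C_ij = S_ij exp(i P_ij)   (10)
Here, S_ij is the connection strength and P_ij is the phase; the i preceding P_ij is the imaginary unit. P_ij is the phase corresponding to the time delay of the pulse signal output at a predetermined frequency from neuron j to neuron i. S_ij reflects the receptive field structure of neuron i and generally has a different structure depending on the target to be recognized or detected. It is either formed separately by learning (supervised learning or self-organization) or formed as a predetermined structure.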
【０１８４】
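On the other hand, the learning rule for self-organization of the delay amount is given by Equation (11):
【０１８５】
[Equation (11): image 【外６】 is not reproduced in this text]
【０１８６】
where Ċ_ij (the symbol shown as image 【外７】) is the time derivative of C_ij, τ_ij is the time delay mentioned above (a preset amount), and β (≈ 1) is a constant.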
【０１８７】
上式を解くと、Ｃ_ijはβexp(-2πiτ_ij)に収束し、従って、P_ijは−τ_ijに収束する。学習則適用の例を図３の（Ｂ）に示した学習時の時間窓を参照して説明すると、シナプス結合の前側ニューロン（n1,n2,n3）と後側ニューロン(特徴検出細胞)とが、その学習時間窓の時間範囲において、ともに発火しているときにだけ、式（１１）に従って結合荷重が更新される。なお、図３の（Ｂ）において、特徴検出細胞は時間窓の経過後に発火しているが、同図の時間窓経過前に発火してもよい。
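The convergence stated above can be checked numerically with a minimal sketch. Because the image of Equation (11) is not reproduced in this text, the code assumes the simplest linear relaxation that has the stated fixed point, dC/dt = β·exp(−2πiτ_ij) − C_ij; this form, the function name `relax_weight`, and the step size are illustrative assumptions, not the patent's circuit.

```python
import cmath

def relax_weight(c0, tau, beta=1.0, dt=0.01, steps=2000):
    """Euler-integrate the ASSUMED rule dC/dt = beta*exp(-2*pi*i*tau) - C."""
    target = beta * cmath.exp(-2j * cmath.pi * tau)  # stated convergence value
    c = c0
    for _ in range(steps):
        c += dt * (target - c)  # relax toward the fixed point
    return c, target

# Start from an arbitrary complex weight; tau is given in units of one period.
c, target = relax_weight(c0=0.2 + 0.5j, tau=0.125)
phase_in_cycles = cmath.phase(c) / (2 * cmath.pi)  # phase of C in periods
```

Under this assumption the weight approaches β·exp(−2πiτ_ij), and its phase, measured in periods, approaches −τ_ij, which matches the statement that P_ij converges to −τ_ij.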
【０１８８】
特徴検出層処理
以下、特徴検出層で主に行われる処理（学習時、認識時）について説明する。
【０１８９】
各特徴検出層102においては、前述したように、各スケールレベルごとに設定される処理チャネル内において、同一受容野からの複数の異なる特徴に関するパルス信号を入力し、時空間的重み付き総和（荷重和）演算と閾値処理を行う。各特徴量に対応するパルスは、予め学習により定められた遅延量(位相) により、所定の時間間隔で到着する。
【０１９０】
このパルス到着時間パターンの学習制御は、本願の主眼ではないので詳しくは説明しないが、例えば、ある図形パターンを構成する特徴要素がその図形の検出に最も寄与する特徴であるほど先に到着し、そのままでは、パルス到着時間がほぼ等しくなる特徴要素間では、互いに一定量だけ時間的に離れて到着するような競争学習を導入する。或いは、予め決められた特徴要素(認識対象を構成する特徴要素であって、特に重要と考えられるもの：例えば、平均曲率の大きい特徴、直線性の高い特徴など)間で異なる時間間隔で到着する様に設計してもよい。
【０１９１】
本実施形態では、前段の層である特徴統合層上の同一受容野内の各低次特徴要素に相当するニューロンは、それぞれ所定の位相で同期発火（パルス出力）することになる。
【０１９２】
一般的に、特徴統合層のニューロンであって、位置が異なるが同一の高次の特徴を検出する特徴検出ニューロンへの結合が存在する（この場合、受容野は異なるが、高次の同じ特徴を構成する結合を有する）。この時、これら特徴検出ニューロンとの間で同期発火することはいうまでもない。但し、その出力レベル（ここでは位相基準とするが、周波数、振幅、パルス幅基準となる構成でもよい）は、特徴検出ニューロンの受容野ごとに与えられる複数ペースメーカニューロンからの寄与の総和（或いは平均など）によって決まる。また、特徴検出層102上の各ニューロンにおいては、入力パルスの時空間的重み付き総和（荷重和）の演算は、ニューロンに到着したパルス列について、所定幅の時間窓においてのみ行われる。時間窓内の重み付き加算を実現する機構は、図２に示したニューロン素子回路に限らず、他の方法で実現してもよいことは言うまでもない。
【０１９３】
この時間窓は、実際のニューロンの不応期(refractory period)以外の時間帯にある程度対応している。即ち、不応期(時間窓以外の時間範囲)にはどのような入力を受けてもニューロンからの出力はないが、その時間範囲以外の時間窓では入力レベルに応じた発火を行うという点が実際のニューロンと類似している。
【０１９４】
図３の（Ｂ）に示す不応期は、特徴検出細胞の発火直後から次の時間窓開始時刻までの時間帯である。不応期の長さと時間窓の幅は任意に設定可能であることはいうまでもなく、同図に示したように時間窓に比べて不応期を短くとらなくてもよい。ペースメーカニューロンを使わなくても時間窓の開始時刻は、特徴検出層と特徴統合層のニューロン間でニューロン間の弱相互結合と所定の結合条件などにより同期発火するメカニズム（E.M.Izhikevich, 1999 'Weakly Pulse-Coupled Oscillation, FM Interactions, Synchronization, and Oscillatory Associative Memory' IEEE Trans. on Neural Networks, vol.10. pp.508-526.）を導入することにより、これらニューロン間で同一となる。この同期発火は、一般的にニューロン間での相互結合と引き込み現象によりもたらされることが知られている。
【０１９５】
従って、本実施形態においてもニューロン間の弱相互結合と所定のシナプス結合条件を満たすように構成することによりペースメーカニューロンなしで、このような効果をもたらすことができる。
【０１９６】
本実施形態では、図６に模式的に示すように、既に説明したメカニズムとして、例えば各特徴検出層ニューロンごとに、その同一受容野からの入力を受けるようなペースメーカニューロン（固定周波数でパルス出力）によるタイミング情報（クロックパルス）の入力により、上述した開始時期の共通化をもたらすようにしてもよい。
【０１９７】
このように構成した場合には、時間窓の同期制御は（仮に必要であったとしても）ネットワーク全体にわたって行う必要が無く、また、上記したようなクロックパルスの揺らぎ、変動があっても、局所的な同一受容野からの出力に対して一様にその影響を受ける（窓関数の時間軸上での位置の揺らぎは同一受容野を形成するニューロン間で同一となる）ので、特徴検出の信頼性は劣化することはない。このような局所的な回路制御により信頼度の高い同期動作を可能にするため、回路素子パラメータに関するばらつきの許容度も高くなる。
【０１９８】
以下、簡単のために、三角形を特徴として検出する特徴検出ニューロンについて説明する。その前段の特徴統合層103は、図７の（Ｃ）に示すような各種向きを持ったL字パターン(f₁₁, f₁₂, …, )、Ｌ字パターンとの連続性（連結性）を有する線分の組み合わせパターン(f₂₁, f₂₂,…)、三角形を構成する２辺の一部の組み合わせ(f₃₁,…)、などのような図形的特徴（特徴要素）に反応するものとする。
【０１９９】
また、同図のf₄₁,f₄₂,f₄₃は向きの異なる三角形を構成する特徴であって、f₁₁,f₁₂,f₁₃に対応する特徴を示している。学習により層間結合をなすニューロン間に固有の遅延量が設定された結果、三角形の特徴検出ニューロンにおいては、時間窓を分割して得られる各サブ時間窓(タイムスロット）(w₁,w₂,…)において、三角形を構成する主要かつ異なる特徴に対応するパルスが到着するように予め設定がなされる。
【０２００】
例えば、時間窓をｎ分割した後のw₁, w₂, …、wnには、図７の(A)に示す如く、全体として三角形を構成するような特徴のセットの組み合わせに対応するパルスが初めに到着する。ここに、Ｌ字パターン(f₁₁, f₁₂, f₁₃)は、それぞれw₁,w₂,w₃内に到着し、特徴要素(f₂₁,f₂₂,f₂₃)に対応するパルスは、それぞれw₁, w₂, w₃内に到着するように学習により遅延量が設定されている。
【０２０１】
特徴要素(f₃₁,f₃₂,f₃₃)に対応するパルスも同様の順序で到着する。図７の(A)の場合、一つのサブ時間窓(タイムスロット）にそれぞれ一つの特徴要素に対応するパルスが到着する。サブ時間窓に分割する意味は、各サブ時間窓で時間軸上に展開表現された異なる特徴要素に対応するパルスの検出（特徴要素の検出）を個別にかつ確実に行うことにより、それらの特徴を統合する際の統合の仕方、例えば、すべての特徴要素の検出を条件とするか、或いは一定割合の特徴検出を条件とするか等の処理モードの変更可能性や適応性を高めることにある。
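The role of the sub time windows (time slots) described above can be illustrated with a small sketch: each arriving pulse is assigned to a slot, and the integration mode is either "all feature elements detected" or "at least a fixed fraction detected". The function name `detect_by_slots`, the equal slot width, and the `min_fraction` parameter are assumptions made for illustration only.

```python
def detect_by_slots(arrivals, slot_width, n_slots, min_fraction=1.0):
    """Mark which sub time windows received a pulse, then apply the mode.

    arrivals: pulse arrival times measured from the start of the time window.
    min_fraction=1.0 requires every slot to be hit; a smaller value accepts
    detection of a fixed fraction of the feature elements.
    """
    hit = [False] * n_slots
    for t in arrivals:
        k = int(t // slot_width)      # index of the slot containing t
        if 0 <= k < n_slots:          # ignore pulses outside the time window
            hit[k] = True
    detected = sum(hit) / n_slots >= min_fraction
    return detected, hit

# All three feature pulses arrive, one per slot.
full, _ = detect_by_slots([0.4, 1.5, 2.2], slot_width=1.0, n_slots=3)
# The second feature is missing: strict mode fails, fractional mode passes.
strict, hits = detect_by_slots([0.4, 2.2], slot_width=1.0, n_slots=3)
relaxed, _ = detect_by_slots([0.4, 2.2], slot_width=1.0, n_slots=3, min_fraction=0.6)
```

Keeping the per-slot result separate from the final decision is what allows the integration mode to be switched without changing how individual feature pulses are detected.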
【０２０２】
例えば、認識（検出）対象が顔であり、それを構成するパーツである目の探索（検出）が重要であるような状況（目のパターン検出の優先度を視覚探索において高く設定したい場合）においては、高次の特徴検出層からのフィードバック結合を導入することにより、選択的に目を構成する特徴要素パターンに対応する反応選択性（特定の特徴の検出感度）を高めたりすることができる。このようにすることにより、高次の特徴要素（パターン）を構成する低次の特徴要素により高い重要度を与えて検出することができる。
【０２０３】
また、重要な特徴ほど早いサブ時間窓にパルスが到着するように予め設定されているとすると、当該サブ時間窓での重み関数値が他のサブ時間窓での値より大きくすることにより、重要度の高い特徴ほど検出されやすくすることができる。この重要度（特徴間の検出優先度）は学習により獲得されるか、予め定義しておくこともできる。
【０２０４】
従って、一定割合の特徴要素の検出という事象さえ起きればよいのであれば、サブ時間窓への分割は殆ど意味が無くなり、一つの時間窓において行えばよい。
【０２０５】
なお、複数（３つ）の異なる特徴要素に対応するパルスがそれぞれ到着して加算されるようにしてもよい(図７の（Ｄ）参照)。即ち、一つのサブ時間窓(タイムスロット）に複数の特徴要素(図７の（Ｄ）)、或いは任意の数の特徴要素に対応するパルスが入力されることを前提としてもよい。この場合、図７の（Ｄ）では、初めのサブ時間窓では、三角形の頂角部分ｆ₁₁の検出を支持する他の特徴要素ｆ₂₁、ｆ₂₃に対応するパルスが到着し、同様に２番目のサブ時間窓には頂角部分ｆ₁₂の検出を支持するような他の特徴要素ｆ₂₂、ｆ₃₁のパルスが到着している。
【０２０６】
なお、サブ時間窓(タイムスロット）への分割数、各サブ時間窓(タイムスロット）の幅および特徴のクラスおよび特徴に対応するパルスの時間間隔の割り当てなどは上述した説明に限らず、変更可能であることはいうまでもない。例えば、上述した特徴要素の他に'Ｘ'，'＋'等の特徴要素に対応するサブ時間窓を設定してもよい。三角形の図形検出にはこのような特徴要素は冗長(又は不要)ともいえるが、逆に、これらが存在しないことを検出することにより、三角形という図形パターンの検出確度を高めることができる。
【０２０７】
また、これら特徴要素の組み合わせでは表されないような変形を加えた場合（例えば、一定範囲内の回転を与えた場合）に対しても、上記特徴要素を表す特徴統合層のニューロンの出力パルスは、理想的なパターンからのずれの程度に応じた連続的な位相遅れ(遅延量：但し、予め定めたサブ時間窓(タイムスロット）にパルスが到着する範囲)をもって反応する（いわゆるgraceful degradation）ため、検出される図形特徴の変形に対する許容範囲が一定レベル以上になるよう出力の安定化が図られている。例えば、図７の（Ｃ）に示す特徴ｆ₁₁、ｆ₁₂、ｆ₁₃に対応する特徴により形成される三角形（Ｑ１）と、ｆ₄₁、ｆ₄₂、ｆ₄₃に対応する特徴により形成される三角形（Ｑ２）とでは、少なくとも向きが互いに異なっている筈である。
【０２０８】
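In this case, provided that detection (integration) cells corresponding to each feature exist, for a triangle (Q3) whose orientation is intermediate between the two triangles, the detection (integration) cells corresponding to f₁₁, f₁₂, f₁₃ and the detection (integration) cells corresponding to f₄₁, f₄₂, f₄₃ all respond below their maximum output. More directly, each output level follows the value of the convolution with the filter kernel that constitutes the receptive field structure determined by the feature type. By integrating the vector of outputs from all of these cells as something specific to the intermediate figure, a figure in a state intermediate between the two triangles (that is, a rotated one) can be detected.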
【０２０９】
例えば、定性的には、回転角度が小さく、Ｑ１に近いほどｆ₁₁、ｆ₁₂、ｆ₁₃に対応する細胞からの出力が相対的に大きく、逆にＱ２に近いほどｆ₄₁、ｆ₄₂、ｆ₄₃に対応する細胞からの出力が大きくなる。
【０２１０】
パルス出力の時空間的統合及びネットワーク特性
次に入力パルスの時空間的重み付き総和（荷重和）の演算について説明する。図７の（B）に示す如く、各ニューロンでは、上記サブ時間窓(タイムスロット）毎に所定の重み関数（例えばGaussian）で入力パルスの荷重和がとられ、各荷重和の総和が閾値と比較される。τ_jはサブ時間窓ｊの重み関数の中心位置を表し、時間窓の開始時刻基準（開始時間からの経過時間）で表す。重み関数は一般に所定の中心位置（検出予定の特徴が検出された場合のパルス到着時間を表す）からの距離(時間軸上でのずれ)の関数になる。
【０２１１】
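Therefore, if the peak position τ of the weight function of each sub time window (time slot) of a neuron is taken to be the learned time delay between neurons, the neural network that computes the spatio-temporal weighted sum of the input pulses can be regarded as a kind of radial basis function network (hereinafter abbreviated RBF) in the time domain. With a Gaussian weight function, the time window F_Ti of neuron n_i, where σ is the spread of each sub time window and b_ij is the coefficient factor (corresponding to the synaptic connection weight), is expressed as
【０２１２】
[Equation: image 【外８】 is not reproduced in this text]
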
【０２１３】
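Note that the weight function may take negative values. For example, suppose a neuron in a certain feature detection layer is meant to ultimately detect a triangle, and a feature (F_false) that is clearly not a component of that figure pattern (for example, the 'X' or '+' mentioned above) is detected. A weight function giving a negative contribution, together with a connection from the corresponding feature detection (integration) cell, can then be assigned to the pulse for that feature (F_false) in the input summation, so that the triangle detection output is ultimately not produced even if the contributions from the other feature elements are large.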
【０２１４】
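The spatio-temporal sum X_i(t) of the input signals to neuron n_i of the feature detection layer can be expressed as
【０２１５】
[Equation (13): image 【外９】 is not reproduced in this text]

Here, ε_j is the initial phase of the output pulse from neuron n_j. It converges to 0 through synchronous firing with neuron n_i; alternatively, when the phase of the time window is forcibly synchronized to 0 by the timing pulse input from the pacemaker neuron, ε_j may always be taken as 0. Executing the weighted sum of the pulse inputs of FIG. 7(A) with the weight functions of FIG. 7(B) yields the time course of the weighted-sum value shown in FIG. 7(E). The feature detection neuron outputs a pulse when this weighted-sum value reaches the threshold (Vt).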
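The weighted summation and threshold test described here can be sketched as follows. Since the equation images are not reproduced in this text, the Gaussian form b_j·exp(−(t_j − τ_j)²/σ²), a single shared σ, and ε_j = 0 are assumptions; the function name `fire_time` is also hypothetical. Negative b_j values model the negative weight function mentioned for features such as 'X' or '+'.

```python
import math

def fire_time(arrivals, centers, b, sigma, v_t):
    """Accumulate Gaussian-weighted pulse contributions in arrival order.

    arrivals[j]: arrival time of the pulse for feature j (None if absent),
    centers[j]:  expected arrival time tau_j (peak of the weight function),
    b[j]:        coefficient of sub time window j (may be negative).
    Returns (time at which the running sum first reaches v_t, or None, sum).
    """
    events = sorted(
        (t, tau, w) for t, tau, w in zip(arrivals, centers, b) if t is not None
    )
    total = 0.0
    for t, tau, w in events:
        total += w * math.exp(-((t - tau) ** 2) / sigma ** 2)
        if total >= v_t:
            return t, total
    return None, total

centers, b = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
t_exact, _ = fire_time([1.0, 2.0, 3.0], centers, b, sigma=0.5, v_t=2.5)
t_jitter, _ = fire_time([1.1, 2.0, 2.9], centers, b, sigma=0.5, v_t=2.5)
t_missing, _ = fire_time([1.0, None, 3.0], centers, b, sigma=0.5, v_t=2.5)
# A fourth, inhibitory feature arriving at its expected time vetoes detection.
t_veto, _ = fire_time([1.0, 2.0, 3.0, 0.5], centers + [0.5], b + [-2.0],
                      sigma=0.5, v_t=2.5)
```

Small timing deviations only reduce each contribution slightly, so the jittered pattern still fires, while a missing element or an inhibitory feature keeps the sum below Vt.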
【０２１６】
ニューロンn_iからの出力パルス信号は、前述したように、入力信号の時空間和（いわゆる総入力和）のsquashing非線形関数となる出力レベルと学習により与えられた時間遅れ（位相）をもって上位層のニューロンに出力される（パルス出力は固定周波数(２値)とし、学習によって決まる固定遅延量に相当する位相に入力信号の時空間和についてのsquashing非線形関数となる位相変調量を加えて出力される）。
【０２１７】
処理フロー概要
図８は、上述した各層の処理手順を示すフローチャートである。低次特徴検出から高次特徴検出までの処理の流れをまとめて示すと、同図のようになる。先ず、ステップS801で、低次特徴検出（例えば、各位置でのGabor wavelet変換係数の算出など）を行なう。次に、ステップS802で、それらの特徴の局所平均化等を行う低次特徴の統合処理を行う。更に、ステップS803〜804で中次特徴の検出と統合、ステップS805〜806で高次特徴の検出と統合を行う。そして、ステップS807では、最終層の出力として、認識(検出)対象の有無またはその検出位置出力が行われる。ステップS803〜804とS805〜806に割り当てる層数は、課題（認識対象など）に応じて任意に設定又は変更することができる。
【０２１８】
図９は、各特徴検出ニューロン602の処理の手順を示すフローチャートである。まず、ステップS901で、複数の特徴カテゴリに応じたパルスを、前層である入力層101または特徴統合層103において同一受容野105を形成するニューロン601から入力を受け、ステップS902で、ペースメーカニューロン603から入力される（又は前層ニューロンとの相互作用により得られる）局所同期信号に基づき、時間窓及び重み関数を発生させ、ステップS903で、それぞれについての所定の時間的重み関数による荷重和をとり、ステップS904で、閾値に達したか否かの判定を行い、閾値に達した場合には、ステップS905で、パルス出力を行う。なお、ステップS902と903は時系列的に示したが、実際にはほぼ同時に行われる。
【０２１９】
また、各特徴統合ニューロンの処理の手順は、図１０のフローチャートに示す通りである。すなわち、ステップS1001において、同一カテゴリをなす特徴検出の処理モジュール104であって、当該ニューロンに固有の局所受容野をなす特徴検出ニューロンからのパルス入力を受け、ステップS1002で、所定の時間幅（不応期以外の時間範囲）において入力パルスの加算を行う。ステップS1003で、入力パルスの総和値（例えば、電位基準で測る）が閾値に達したか否かの判定を行ない、閾値に達した場合、ステップS1004で、その総和値に応じた位相でパルス出力をする。
【０２２０】
注視位置の設定制御に関する主な処理の流れの概要を整理すると、図２７のようになる。先ず、ステップS2701で、注視位置を画面中央に設定し、注視領域サイズを画面全体とする。次に、ステップS2702で、最上位層までのWhat経路による処理過程を実行する。この段階では、前述したごとく、いわゆる対象を知覚した状態、即ち認識状態には至っていない。
【０２２１】
その後、ステップS2703で、最上位層相当の特徴位置検出層(3,M)出力からのWhere経路等を介したフィードバック量等に基づき、最大フィードバック量または所定の最大評価値(式（５)）を受ける注視制御ニューロンに対応する特徴統合層上のニューロンを新たな注視位置とし、その特徴統合層ニューロンが属する処理チャネルに固有の注視領域のサイズが設定され、ステップS2704で、What経路での認識処理が行われる。次に、ステップS2705で、注視位置更新判定を行い、更新する場合には、注視領域の更新制御として、最新の注視位置の近傍領域内で次候補注視制御ニューロンの探索処理（ステップS2706）を行う。
【０２２２】
ここで、注視領域の更新判定とは、例えば、近傍領域内に他のフィードバック量が十分なレベルにある注視制御ニューロン(「次候補ニューロン」という)が存在するか否かの判定、近傍領域には他に次候補ニューロンが存在せず、かつ、画面内に未設定の注視位置が存在するか否かの判定（この場合には、不図示の未設定注視位置の有無に関する判定回路を要する）、或いは外部からの制御信号の入力有無の判定等である。
【０２２３】
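In the first case, one of the next-candidate neurons is selected by the method described above, and the next gaze position and related settings are made. In the second case, the next gaze position is set at an arbitrary unset position outside the latest neighborhood (or, for example, at an arbitrary unset position adjacent to the neighborhood). The third case arises, for example, when no shutter-button press by the user is detected as a control signal, or when the user gives an instruction prompting an update and the signal of that update instruction is detected as the control signal.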
【０２２４】
ステップS2707で、更新の結果、前述した準最大フィードバック量を受ける注視制御ニューロンを活性化して、当該ニューロンから特徴統合層へ（図２０の場合）、またはゲーティング層へ（図２１の場合）次の注視領域に関する設定信号が出力される。ステップS2708で、特定処理チャネルの一部がオープンされることにより、更新された注視領域に該当する特徴統合層ニューロンから特徴検出層(1,1)への信号の伝播が行われる。一方、注視更新判定の結果、更新を行わない場合(上記判定の３つのケースを参照)には、注視領域の更新制御動作を終了する。
【０２２５】
ネットワークその他構成上の変形例
入力パルスは空間ドメインの各位置での特徴（或いは、特徴要素の空間的配置関係）に対応するものであるから、時空間的ＲＢＦを構成することも可能である。
【０２２６】
具体的には、各ニューロン出力値に対して更に重み付けを行って加算を行うことにより、十分な数の予め定められた特徴要素のセット（特徴検出細胞）および十分な数のサブ時間窓(タイムスロット）での重み付き総和（荷重和）の演算とから任意の図形パターンに対応するパルスパターンの時空間関数を表現することができる。認識対象のカテゴリ及びその形状の変化がある程度限られていれば、必要な特徴検出細胞やサブ時間窓(タイムスロット）の数を少なくすることができる。
【０２２７】
本実施形態では、共通バスは同一受容野に対して一つ割り当てられるような局所的なバスラインとしたが、これに限らず、ある層から次の層への層間結合は同一バスラインで行うように、時間軸上でパルス位相遅延量を分割設定してもよい。また、重なり割合が比較的大きい隣接受容野間では、共通のバスラインを用いるように構成しても良い。
【０２２８】
なお、上述した時空間的ＲＢＦによらずに、各サブ時間窓(タイムスロット）内での重み付き積和演算の結果が非線形なsquashing関数値となるように処理（或いは、閾値処理）して、それらの積をとってもよい。例えば、不図示の回路構成により、閾値処理結果(２値)を各サブ時間窓ごとに得て、一時記憶手段に格納するとともに、順次求まる閾値処理結果の論理積を時系列的に求めるようにすればよい。
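The product-of-thresholds variant described above can be contrasted with the summed variant in a short sketch. The per-slot weighted sums are taken as given inputs; the function names and the list used as a stand-in for the temporary storage means are illustrative assumptions.

```python
def and_detection(slot_sums, slot_thresholds):
    """Threshold each sub time window and AND the binary results in sequence."""
    bits = []          # stands in for the temporary storage means
    detected = True
    for s, th in zip(slot_sums, slot_thresholds):
        bit = s >= th  # binary threshold result for this sub time window
        bits.append(bit)
        detected = detected and bit
    return detected, bits

def sum_detection(slot_sums, total_threshold):
    """RBF-style alternative: threshold the total of all sub-window sums."""
    return sum(slot_sums) >= total_threshold

# The second feature is weak (for example, low contrast or partly missing).
slot_sums = [0.9, 0.2, 0.9]
by_and, bits = and_detection(slot_sums, [0.5, 0.5, 0.5])
by_sum = sum_detection(slot_sums, 1.8)
```

With one weak sub time window, the logical product rejects the pattern while the summed form still accepts it, which is the reduced tolerance to missing features and low contrast noted for the thresholded product.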
【０２２９】
閾値処理して積をとる場合には、パターンの欠損や低コントラスト条件下での特徴検出の許容度が小さくなることは言うまでもない。
【０２３０】
また、上述した処理（時空間的ＲＢＦによる図形パターンの検出）は、連想記憶の想起過程に類似する動作として実現することもできる。即ち、ある局所領域（または全体領域）で検出されるべき低次（または中次）の特徴要素の欠損が生じても、他の幾つかの特徴要素が検出され、上記総和値（式（１３））が閾値を上回れば、時空間ＲＢＦネットワーク全体としては、中次（または高次）の特徴要素の検出（該当するニューロンの発火）が行われる様にすることができる。
【０２３１】
ネットワークの構成としては、図１に示したものに限定される必要はなく、所定の幾何学的特徴要素を検出する層を含む構成であればＭＬＰその他のものであってもよいことはいうまでもない。
【０２３２】
本実施形態では、低次特徴抽出のためにGabor wavelet変換を用いたが、他の多重スケール特徴(例えば、スケールに比例するサイズで求めた局所自己相関係数など)を用いてもよいことは言うまでもない。
【０２３３】
なお、図１に示すようなネットワーク構成のもとで、パルス幅(アナログ値)変調動作を行うシナプス素子とintegrate-and-fireニューロンで構成されるネットワークにより、図形パターン等の認識を行ってもよい。この場合、シナプスによる変調はシナプス前信号のパルス幅とシナプス後のパルス幅をそれぞれ、W_b,W_aとするとW_a = S_ijW_bで与えられる。ここに、S_ijは結合強度（式（10））と同じ意味である。変調のダイナミックレンジを大きくとる為には、パルス信号の基本パルス幅を周期（基本パルス間隔）と比べて十分に小さくとる必要がある。
【０２３４】
ニューロンの発火（パルス出力）は、所定の特徴要素を表す複数のパルス電流の流入に伴う電荷の蓄積により、電位が所定の閾値を越したときに生じる。パルス幅変調等の場合においては、サブ時間窓ごとの到着パルスの重み付き加算は特に要しないが、所定の幅の時間窓での積分は実行される。この場合、検出されるべき特徴要素（図形パターン）は、特徴検出層ニューロンに入力される信号の時間的総和（パルス電流値の総和）のみに依存する。また、入力パルスの幅は重み関数の値に相当するものである。
【０２３５】
なお、スケール不変な特徴表現を先に得る構成により、複数の処理チャネルを中次、高次まで有する構成を用いずにスケール不変な認識性能を保持しながら、選択的注視処理の効率が向上し、回路構成の簡素化、規模の小型化、更には低消費電力化がもたらす構成として、以下のようなものでもよい。
【０２３６】
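That is, in the basic configuration described above, gaze area setting control is performed after population coding of the low-order features, and population coding is performed at every layer level. Instead, the feature representations at different scale levels and the population coding described above may be restricted to the low-order features. A scale-invariant feature representation is then obtained, for example by phase modulation of the pulses for each feature as described later (scale-invariant pulse information conversion), after which the gaze area setting control described above is performed, and the detection of medium-order and high-order features is carried out in this scale-invariant feature representation domain. Gaze area setting control may also be performed before population coding, but processing efficiency is lower than when it is performed afterwards.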
【０２３７】
上述したスケール不変なパルス情報への変換を行うには、同一特徴カテゴリであってスケールレベル(サイズ)が異なる特徴は同じパルス間隔により表現されるような変換として、例えば、ある図形特徴を検出する特徴検出ニューロンへの複数パルスの到着パターンの位相オフセット量が、どの処理チャネルからのパルスであっても一定値となるような変換を行えばよい。パルス幅変調により情報表現を行う場合にも、パルス幅の伸縮又はオフセット量に関して同様の処理を行えばよい。
【０２３８】
更に、互いに異なるスケールレベル(処理チャネル)に属する特徴検出ニューロンにおいては、同一カテゴリの図形的特徴(例えば、Ｌ字パターン)に対応するパルスの到着時間の間隔（またはパルス到着の時間パターン）が、スケールレベルにより異なり時分割されるように学習規則を定めてもよい。
【０２３９】
集団的符号化処理は、時分割されたパルス信号全体にわたる重み付き加算による線形結合等により行う。選択的注視処理は、集団的符号化の前に特定部分領域に対応する特徴統合層(2,0)からの出力データを注視領域として選択的に抽出し、集団的符号化処理は選択された出力データについて時間軸上で行われる。特徴検出層(1,1)以降の層においては、処理チャネルごとに異なる回路を設けることなく、同一回路で多重スケール処理を行うことができ、経済的な回路構成となる。即ち、(1,1)層以降では処理チャネルの違いは回路構成上は物理的に区別を無くすることができる。
【０２４０】
特徴検出層(1,1)から特徴統合層(2,1)層への出力（及び、それ以降の層につき同様とする）は、各処理チャネル出力ごと（スケールレベルごと）に時分割で行われる。
【０２４１】
撮像装置に搭載した応用例
次に、本実施形態の構成に係るパターン検出（認識）装置を撮像装置に搭載させることにより、特定被写体へのフォーカシングや特定被写体の色補正、露出制御を行う場合について、図１１を参照して説明する。図１１は、実施形態に係るパターン検出（認識）装置を撮像装置に用いた例の構成を示す図である。
【０２４２】
図１１の撮像装置1101は、撮影レンズおよびズーム撮影用駆動制御機構を含む結像光学系1102、CCD又はＣＭＯＳイメージセンサー1103、撮像パラメータの計測部1104、映像信号処理回路1105、記憶部1106、撮像動作の制御、撮像条件の制御などの制御用信号を発生する制御信号発生部1107、EVFなどファインダーを兼ねた表示ディスプレイ1108、ストロボ発光部1109、記録媒体1110などを具備し、更に上述したパターン検出装置を、選択的注視機構を有する被写体検出（認識）装置1111として備える。
【０２４３】
この撮像装置1101は、例えば撮影前のファインダー映像中から、予め登録された人物の顔画像の検出(存在位置、サイズの検出)又は認識を被写体検出(認識)装置1111により行う。そして、その人物の位置、サイズ情報が被写体検出(認識)装置1111から制御信号発生部1107に入力されると、同制御信号発生部1107は、撮像パラメータ計測部1104からの出力に基づき、その人物に対するピント制御（ＡＦ）、露出条件制御（ＡＥ）、ホワイトバランス制御（ＡＷ）などを最適に行う制御信号を発生する。
【０２４４】
図２８に、撮像装置において選択的注視を行いながら、撮影を行う場合の処理フローの概要を示す。
【０２４５】
先ず、ステップＳ２８０１で、検出・認識の対象を予め特定するために、撮影対象（例えば、人物の撮影を行う場合には対象人物の顔など）についてのモデルデータを記憶部１１０６又は撮像装置に内蔵される通信部（不図示）を介して一時記憶部（不図示）にローディングする。次に、ステップＳ２８０２で、撮影待機状態（シャッター半押し状態など）において注視制御処理の初期化（例えば、注視領域を画面中央位置、画面全体サイズにするなど）を行う。
【０２４６】
続けて、ステップS2803で、前述したような注視領域更新処理を起動して、所定の基準(例えば、最上位層出力に基づき撮影対象の「検出」を判定した場合)に合致する注視領域を設定し、ステップS2804で注視領域更新処理を一旦停止し、ステップS2805で、選択された注視領域を中心とする撮影条件の最適化制御（ＡＦ、ＡＥ，ＡＷ及びズーミングなど）を行う。このとき、更に、ユーザが撮影対象の確認を行えるように、ステップS2806で、撮影対象位置を十字マーク等のマーカー表示をファインダーディスプレイ上で行う。
【０２４７】
次に誤検出判定処理（ステップS2807）として、例えば、一定時間範囲内でシャッタボタンが押されたか否か、或いはユーザが他の対象の探索指示を出したか否か(例えば、シャッタ半押し状態解除の状態にした場合など)を判定する処理を行う。そこで誤検出と判定された場合には、ステップS2803に戻って、再び注視領域更新処理を起動する。また、誤検出なしと判定された場合には、その時の撮影条件で撮影がなされる。
【０２４８】
上述した実施形態に係るパターン検出(認識)装置を、このように撮像装置に用いた結果、複数の被写体が入力画像中に存在する場合、予め被写体の存在位置や画面内サイズが分からない場合でも、当該被写体を確実かつ効率的に検出(認識)することができ、そのような機能を低消費電力かつ高速（リアルタイム）に実現して、人物等の検出とそれに基づく撮影の最適制御（ＡＦ、ＡＥなど）を行うことができる。
【０２４９】
＜第２の実施形態＞
本実施形態では、予め選択的注視を行う場所(画面内位置)等に関する優先順位(優先度)を求めることにより、選択的注視の更新処理そのものを高速に行う。本実施形態に特有の効果として、後で定義する優先度の設定部2401により、予め(検出動作などを開始する前に)優先順位を設定しておくことにより、処理効率（注視点更新のスピード）が大きく向上する。また、更新可能な位置を制限するために優先度の許容範囲を設け、その範囲にある優先度の注視制御ニューロンのみに注視位置の更新を行うことにより処理の更なる高速化がもたらされる。
【０２５０】
本実施形態の注視領域設定制御層108を中心とした要部構成を図２４に示す。低次特徴の顕著度マップ(特徴統合層(2,0)出力)と上位層からのフィードバック量に基づき、注視制御ニューロンを選択する優先順位の設定を行う優先度設定部2401を設けてある。これは注視位置更新制御回路1901とともに注視領域設定制御層108に近接して設定する(注視領域設定制御層108上に設定してもよい)のが良い。なお、優先度設定部2401と注視位置更新制御回路1901とは、本実施形態ではデジタル信号処理回路（または論理ゲートアレイ）である。前実施形態と同様に注視領域設定制御層108には、入力データ上で低次特徴検出を行う各位置に対応する注視制御ニューロン（処理チャネルの数ｘ特徴カテゴリの数だけ図８の楕円で示された同一位置に存在する）が存在する。
【０２５１】
ここで優先度を示す量は、式（５)に示すような特徴統合層出力（顕著度の値）と上位層からのフィードバック量の線形和（又はその順位）に相当する。この優先度の高い順に対応する注視制御ニューロンを活性化する制御信号が注視位置更新制御回路1901から特定の(更新位置の)注視制御ニューロンに対して送出される。注視位置更新制御回路1901は、優先度の高い注視制御ニューロンの位置(アドレス)を求めるために、後述する所定の方法で各注視制御ニューロンを逐次アクセスして優先度を算出し、かつソーティングを行って一次記憶部(不図示)に優先度の高い順にニューロンのアドレス情報を格納する。但し、予め設定してある優先度の許容範囲にある優先度の注視制御ニューロンのアドレスのみを格納する。
【０２５２】
優先度算出のための注視制御ニューロンへのアクセス方法としては、例えば、初期注視位置(前実施形態と同様に画面中央とする)から螺旋状にサンプリングして行う。また、撮影条件（被写体距離、倍率など）に基づいて探索すべき処理チャネルを決定し（被写体サイズが予め分かっているとすると撮影条件に応じた画面内の対象のサイズも分かり、その結果スケールレベルも絞られることによる）、対応する注視制御ニューロン群のみを探索してもよい。
【０２５３】
また注視位置更新制御回路1901は、登録された注視制御ニューロンアドレスの全てを順次選択するのではなく、予め設定された優先度範囲にある注視制御ニューロンを選択する。例えば、視覚探索（選択的注視制御）を行う初めの段階では、優先度の許容範囲を高めに設定しておき、その範囲で一巡するたびに（これを注視位置探索更新の回数にカウントし、その回数に応じて）優先度の許容範囲を変化させてもよい。例えば、その回数が増すたびに優先度の許容範囲を低くするか、或いは許容範囲を拡大すればよい。
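The priority-ordered update with a tolerance range can be sketched as below. Equation (5) is not part of this section, so the priority is assumed to be a plain weighted sum of saliency and feedback with hypothetical weights `w_s` and `w_f`; the addresses, the `is_target` callback, and the rule of lowering the bound by a fixed `step` per round are likewise illustrative assumptions.

```python
def build_priority_list(saliency, feedback, lower, w_s=1.0, w_f=1.0):
    """Return neuron addresses with priority >= lower, highest priority first."""
    scored = [
        (w_s * saliency[a] + w_f * feedback.get(a, 0.0), a) for a in saliency
    ]
    kept = [(p, a) for p, a in scored if p >= lower]
    kept.sort(key=lambda pa: -pa[0])  # descending priority
    return [a for _, a in kept]

def search(saliency, feedback, is_target, lower=0.8, step=0.2, max_rounds=5):
    """Visit gaze positions in priority order, widening the range each round."""
    visited = set()
    for round_no in range(max_rounds):
        for addr in build_priority_list(saliency, feedback, lower):
            if addr in visited:
                continue
            visited.add(addr)
            if is_target(addr):
                return addr, round_no
        lower -= step  # lower the tolerance bound after each full pass
    return None, max_rounds

saliency = {(0, 0): 0.5, (1, 0): 0.3, (0, 1): 0.1}
feedback = {(0, 0): 0.5, (1, 0): 0.4}
found, rounds = search(saliency, feedback, is_target=lambda a: a == (1, 0))
```

In this example the first pass only visits the highest-priority address; the target is reached in the second pass, after the tolerance bound has been lowered.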
【０２５４】
＜第３の実施形態＞
本実施形態の注視領域設定制御層108を中心とした構成を、図２０、２１に示す。本実施形態では注視制御ニューロン1801は、認識対象カテゴリに属する対象の位置と存在確率に関する情報を出力する上位層(例えば、特徴位置検出層(3,2)または特徴統合層(2,2))からのフィードバック結合と、当該認識カテゴリの対象が有する中次特徴の位置と存在確率(集団的符号化処理の説明を参照)または検出レベルに関する情報を出力する中間層(例えば、特徴位置検出層(3,1)または特徴統合層(2,1))からのフィードバック結合を受け、外部(例えば、図１で示される最上位層より更に上位の層)からの制御信号などにより、当該対象の探索時(検出モード)には上位層からのフィードバック入力を、当該対象の認識時(認識モード)には中間層からのフィードバック入力を優先する。
【０２５５】
ここで、検出モードでの対象の探索とは、単に認識対象と同一カテゴリの対象の検出(例えば、顔の検出)を行うことを意味し、認識とは、注視位置の設定制御をより詳細な特徴に基づいて行うことにより、当該対象が認識対象であるか否か（例えば、特定人物の顔か否か）の判定を行うことを意味する。後者の認識時は前者の探索時よりも、いわゆる注視度が高い状態である。
【０２５６】
フィードバック結合の重み付けによる優先の仕方は、図２６に示すフィードバック量変調部2601により設定される。このフィードバック量変調部2601を介した処理の流れを図２５に示す。フィードバック量変調部2601は、先ず各注視制御ニューロンに対する上位層からのフィードバック量と中間層からのフィードバック量をそれぞれ入力する（2501）。
【０２５７】
検出モードか認識モードかのモード制御信号の入力を行い（2502）、検出モードの場合には、上位層からのフィードバック量のみ、または両フィードバック量を適切な重み付けした線形和(αF₁ + βF₂：F₁は上位層から、F₂は中間層からのフィードバック量）をとり、上位層からのフィードバック量に関する寄与が大となるような(α ＞ β ≧ ０)結合フィードバック量の算出を行い、当該フィードバック量が出力となるような変調（増幅）を行う（2503）。一方、認識モードの場合には、中間層からのフィードバック量のみ、または両フィードバック量の線形和による中間層からのフィードバック量寄与が大となるような(０ ≦ α ＜ β)結合フィードバック量を算出し、同様な変調を与える（2504）。
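The mode-dependent combination αF₁ + βF₂ can be written as a minimal sketch. The concrete coefficient values are assumptions chosen only to satisfy α > β ≥ 0 in detection mode and 0 ≤ α < β in recognition mode; the function name and mode strings are hypothetical.

```python
def combined_feedback(f_upper, f_middle, mode):
    """Weighted linear sum alpha*F1 + beta*F2 selected by the mode signal."""
    if mode == "detection":
        alpha, beta = 0.8, 0.2    # alpha > beta >= 0: upper layer dominates
    elif mode == "recognition":
        alpha, beta = 0.2, 0.8    # 0 <= alpha < beta: intermediate layer dominates
    else:
        raise ValueError("mode must be 'detection' or 'recognition'")
    return alpha * f_upper + beta * f_middle

# Same feedback inputs, different emphasis depending on the mode.
det = combined_feedback(1.0, 0.5, "detection")
rec = combined_feedback(1.0, 0.5, "recognition")
```

Setting β = 0 in detection mode, or α = 0 in recognition mode, reproduces the case where only one of the two feedback amounts is used.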
【０２５８】
本実施形態では、どのフィードバック結合を優先するかは別として、注視位置の設定制御を実施形態１と同様に行うことができるが、実施形態１に係る方法に加えて、注視点位置に関するランダムな時間的変動を与えながら行ってもよい。このように注視位置にゆらぎを与えることにより、探索対象の検出をより効率良く行うことができる場合がある。即ち、実施形態１では、注視位置更新の際、常に近傍領域内での次候補探索処理を要していた（最新の注視位置の近傍領域において次候補となる注視制御ニューロンが存在しないとき、当該近傍領域の外部または外部の隣接位置に次の注視位置を設定した）が、注視位置のゆらぎ付与により、次のような特有の効果を得ることができる。
【０２５９】
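(1) Even if the neighborhood to be searched is made smaller than in the case without fluctuation, the processing load of the search within the neighborhood can be reduced while the search time is kept equivalent. This is because random fluctuation of the gaze position acts equivalently to searching for a gaze position without comparing the feedback amounts and the like to the gaze control neurons within the neighborhood.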
【０２６０】
（２）ゆらぎの幅が小さい場合には、所定の注視点を中心とする特徴検出層出力の時間的平均化による認識（検出）率を向上させることができる。
【０２６１】
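When the gaze position is made to fluctuate, the fluctuation width can be varied according to the gaze degree. Here, the gaze degree is the degree to which weight is placed on the feedback amount from a layer that detects specific patterns constituting the overall pattern of the recognition or detection target, or that detects the arrangement among such constituent patterns. For example, it is the detection level of that feedback amount at the gaze control neuron, measured relative to the detection level of the feedback input from the layer (upper layer) that detects the overall pattern (high-order feature). That is, it is expressed as the ratio (F₂/F₁) of the feedback amount F₂ from the intermediate layer to the feedback amount F₁ from the upper layer.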
【０２６２】
注視度が高いときには変動幅を小さくする要領で、注視位置の変動幅を制御する。注視度は上位層からのフィードバック量の総和の単調増加関数とするか、或いは複数の注視制御ニューロン間でフィードバック量の差が小さいほど注視度（当該ニューロンへのフィードバック量）を大きくするフィードバック量の増幅を行うことによって定めてもよい。後者のフィードバック量の増幅は、注視領域設定制御層108において、図２６に示すフィードバック量変調部2601により該当する各注視制御ニューロン1801に対して行われる。
【０２６３】
注視点位置に時間的変動を与える方法としては、例えば最新の注視点位置を中心とする実施形態１と同様な近傍領域において、前述した変調後のフィードバック量に基づいて、実施形態１と同様に更新後の注視点を設定し、更にその位置のランダムな変動を与えることにより、最終的な更新後の注視制御ニューロン(注視位置)を設定する。このときランダムな変動の幅を注視度に応じて前述したように制御すればよい。
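The fluctuation step can be sketched as follows, using the ratio F₂/F₁ as the gaze degree. The mapping from gaze degree to fluctuation width (`max_width / (1 + degree)`), the uniform distribution, and all function names are assumptions; the text only requires that the width shrink as the gaze degree rises.

```python
import random

def gaze_degree(f_upper, f_middle):
    """Gaze degree as the ratio F2/F1 (intermediate over upper feedback)."""
    if f_upper <= 0:
        raise ValueError("f_upper must be positive")
    return f_middle / f_upper

def fluctuation_width(degree, max_width=4.0):
    """ASSUMED mapping: the width decreases monotonically with gaze degree."""
    return max_width / (1.0 + degree)

def fluctuate(position, degree, rng, max_width=4.0):
    """Add a random offset to the updated gaze position."""
    w = fluctuation_width(degree, max_width)
    x, y = position
    return (x + rng.uniform(-w, w), y + rng.uniform(-w, w))

rng = random.Random(0)                      # seeded for repeatability
high = gaze_degree(f_upper=1.0, f_middle=3.0)
low = gaze_degree(f_upper=1.0, f_middle=0.5)
new_pos = fluctuate((10.0, 10.0), high, rng)
```

In a full system the fluctuated coordinates would then be mapped to the nearest gaze control neuron, which is outside the scope of this sketch.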
【０２６４】
＜第４の実施形態＞
本実施形態では、図２９に示すように、画像入力装置（カメラ、ビデオカメラ等）内部において、ユーザ（撮影者）が意図する画像入力（撮影）対象の位置・サイズ等の情報を検出するアシスト情報検出部2902（以下、本実施形態では視線を例に説明する）、および前述した実施形態に係る注視領域設定制御部2901を内蔵した被写体検出（認識）装置1111を設ける。
【０２６５】
アシスト情報（視線）検出部2902と注視領域設定制御部2901とが連動して注視位置の設定制御を行うことにより、画面中の特定対象に適し、かつユーザの意図を反映することのできる高速自動撮影を行う。なお、アシスト情報としては、視線の他にユーザにより明示的に設定してもよい。以下、アシスト情報検出部2902を視線検出部2902として記述する。
【０２６６】
図２９において、撮像装置1101は、撮影レンズおよびズーム撮影用駆動制御機構を含む結像光学系1102、CCD又はＣＭＯＳイメージセンサー1103、撮像パラメータの計測部1104、映像信号処理回路1105、記憶部1106、撮像動作の制御、撮像条件の制御などの制御用信号を発生する制御信号発生部1107、EVFなどファインダーを兼ねた表示ディスプレイ1108、ストロボ発光部1109、記録媒体1110などを具備する。
【０２６７】
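It further includes the line-of-sight detection unit 2902 described above, an eyepiece optical system 2903 for line-of-sight detection, and subject detection (recognition) means 1111 incorporating the gaze area setting control unit 2901. The eyepiece optical system 2903 comprises an eyepiece lens, a light splitter such as a half mirror, a condenser lens, and an illumination light source such as an LED emitting infrared light. The line-of-sight detection unit 2902 comprises a mirror, a focusing screen, a penta roof prism, a photoelectric converter, signal processing means, and the like, and outputs the line-of-sight position detection signal to the imaging parameter measurement unit 1104.
【０２６８】
As the configuration of the line-of-sight detection unit 2902, the mechanisms disclosed in Japanese Patent Nos. 2505854, 2763296, and 2941847 by the present applicant, among others, can be used, and a description is omitted here.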
【０２６９】
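As a specific gaze control procedure, the places (areas) the user pays attention to are first extracted in advance by the line-of-sight detection unit 2902, and that information is stored in the primary storage unit. Next, the gaze area setting control unit 2901 is started and preferentially searches the stored gaze candidate positions (there may be several). Here, searching preferentially means, in the same way that gaze positions were set using the priority order described in Embodiment 2, for example setting the gaze position in the order of the positions the user looked at, searching the neighborhood as in Embodiment 1 for a fixed time each time, and then alternately repeating the update to a user gaze position and the neighborhood search. The reason is given later.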
【０２７０】
一方、注視領域設定制御部2901の動作中に視線検出部2902からの信号を一定時間間隔で入力し、その時得られるユーザの注視する位置周辺を優先的に探索してもよい。具体的には、視線検出部2902からの信号により特定される画面内の注視位置に最も近い位置にある注視制御ニューロンを選択し、当該制御ニューロンを活性化する（或いは、図２１のゲーティング層2101の一定範囲のゲートに対してゲートのオープン信号を送る）ことにより、ユーザの注視位置を反映した認識（検出）処理を行う。
【０２７１】
以上のように、ユーザの注視する（した）位置に関する情報と注視位置設定制御処理により自動的に更新される注視位置情報とを組み合わせて注視領域の設定制御を行う主な理由は以下のとおりである。
【０２７２】
１．ユーザの注視している位置が常に撮影対象を正確に表す（の検出に役立つ）とは限らない。
【０２７３】
２．実施形態１、２、及び３に示すような注視位置設定制御の処理単独よりもユーザの意図する撮影対象の位置情報を補助的に用いることにより探索範囲を狭めることができ、より効率的な探索を行うことができる。
【０２７４】
３．注視領域設定制御部により設定された注視位置の誤り検出が容易となる。
【０２７５】
以上のように、ユーザの注視位置検出結果を援用することにより、確実かつ高速に被写体の検出と認識を行うことができた。
【０２７６】
以上説明した実施形態によれば、低次特徴（例えば、当該対象の検出には意味のないエッジなど）による擾乱を受けずに、注視すべき領域の設定制御を小規模の回路で高効率に行うことができる。特に、多重解像処理と集団的符号化処理メカニズムを、上位層からのフィードバック量が関与する注視制御に組み込んだことにより、複数の同一カテゴリに属する対象が、異なるサイズで任意の位置に存在している場合でも、高次特徴の検出レベルを用いてフィードバック制御する機構を設けたことにより、複数の対象間を効率良く探索することができる。
【０２７７】
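Also, by thresholding the weighted sum, within a time window, of the pulse signals that serve as feature detection signals, a desired pattern can be detected reliably and efficiently against complex and varied backgrounds. This holds even when there are multiple targets to be detected (recognized) whose arrangement is not known in advance, and even when they are deformed (including position changes, rotation, and so on) or feature detections are missing because of changes in size in particular, or because of illumination, noise, and similar effects. This effect can be realized independently of any specific network structure.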
【０２７８】
更に、注視領域の設定更新をユーザからのアシスト情報（視線方向など）と注視領域設定制御処理とを用いて行うことにより、高速かつ確実に行う事ができた。
【０２７９】
また、高次特徴の検出レベルに応じた注視領域の探索を高効率に行い、所定のカテゴリの対象の検出及び認識を高速かつコンパクトな構成で行うことができる。
【０２８０】
また、注視領域の設定を低次特徴または入力データについてのみ行うことにより、従来技術（選択的チューニング法、及び特公平6-34236号公報）で行われていたような低次から高次までの設定制御を不要とし、処理効率の向上と高速化がもたらされる。
【０２８１】
また、低次等の特徴顕著度とフィードバック信号の双方を混合的に用いることにより、詳細なパターンの分析を行う様な認識モードでは、認識対象を構成する要素となるパターンや局所的な特徴配置に関する情報を優先的に活用し、単にある特徴カテゴリに属するパターンを検出する様な検出モードでは全体的な特徴（高次特徴、或いは上位層からのフィードバック信号）を優先的に扱うなど適応的な処理ができる。
【０２８２】
また、注視位置に関する優先度が予め求められ、その結果に基づいて注視位置が制御されるので注視領域の探索制御を極めて高速に行うことができる。
【０２８３】
また、注視位置探索回数に応じて注視位置に関する優先度の許容範囲を変化させることにより、認識対象の探索過程が巡回してしまう（ほぼ同じ探索位置を繰り返し探索する）ような場合には許容優先度を変化させて、実質的な探索範囲を変更することができる。
【０２８４】
また、優先度の分布から優先度の高い順に注視位置を設定し、優先度に基づいて選択された特徴の属する処理チャネルに基づき注視領域のサイズを制御することにより、認識検出対象のサイズが予め分からない場合でもその対象のサイズに応じた注視領域の設定を自動的に行うことが可能になる。
【０２８５】
また、注視領域を低次特徴検出層に属する特徴検出素子のアクティブな受容野として設定制御することにより、注視領域設定のための特別なゲーティング素子を設定しなくても注視領域の設定が可能になる。
【０２８６】
また、複数の解像度又はスケールレベルごとに複数の特徴を抽出することにより、任意サイズの対象を高効率に探索することができる。
【０２８７】
また、対象の探索時には上位層からのフィードバック入力を、当該対象の認識時には中間層からのフィードバック入力を優先することにより、所定のカテゴリに属する対象の検出のみを行うときは、高次の特徴（パターンの全体的な特徴）に基づいて探索を行い、類似物体間の識別や認識を行う場合には中次の特徴（全体の一部をなすパターンや特徴の配置関係に関する情報など）に基づいた詳細にわたる処理が可能となる。
【０２８８】
また、所定の注視度が大の時には注視領域の中心位置の時間的変動を小さくしたことにより、視覚探索時の注視位置のゆらぎを可変とし、注視度に応じた探索効率の向上と探索時間の短縮を実現する。
【０２８９】
また、注視領域のサイズを認識対象カテゴリに属するパターンに関する検出されたスケールレベルに基づいて設定することを特徴とする。これにより、予め対象のサイズがわからなくても処理チャネルにより予め対応づけられたサイズを推定サイズとして用いることができ、高効率な注視領域サイズの設定をすることができる。
【０２９０】
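Also, by making the gaze degree a monotonically increasing function of the magnitude of the feedback signal level from the upper layer, the gaze degree is treated as higher the higher the detection level of the high-order feature, and automatic processing according to that high-order feature can be performed.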
【０２９１】
更に、上述したパターン検出装置からの出力信号に基づき、動作を制御することにより、低消費電力かつ高速に特定被写体をターゲットとした画像入力が任意の被写体距離で可能となる。この画像入力機器として、いわゆる静止画像、動画像その他立体画像などの撮影装置、及び複写機、ファクシミリ、プリンタその他の画像入力部を含む機器に適用できる。
【０２９２】
また、所定の基準に合致する注視領域を設定し、設定された注視領域を中心とする撮影条件の最適化制御を行うことにより、撮影対象を高効率に探索し、検出された対象に応じた最適自動撮影が可能となる。
【０２９３】
更に、検出した視線などのユーザからのアシスト情報に基づき注視領域を更新又は設定することにより、注視領域の設定制御を高速かつ確実に行う事ができる。
【０２９４】
【発明の効果】
以上説明したように、本発明によれば、注視領域の設定を高効率に行ないながら、所定のパターンの検出を高速に行うことができる。
【図面の簡単な説明】
【図１】本発明に係る一実施形態のネットワーク構成を示すブロック図である。
【図２】シナプス部とニューロン素子部の構成を示す図である。
【図３】実施形態１において特徴統合層または入力層から特徴検出層ニューロンへの複数パルス伝播の様子を示す図である。
【図４】シナプス回路の構成図を示す図である。
【図５】シナプス結合小回路の構成、及び実施形態１で用いるパルス位相遅延回路の構成を示す図である。
【図６】特徴検出層ニューロンにペースメーカニューロンからの入力がある場合のネットワーク構成を示す図である。
【図７】特徴検出ニューロンに入力される異なる特徴要素に対応する複数パルスを処理する際の時間窓の構成、重み関数分布の例、特徴要素の例を示す図である。
【図８】各層の処理手順を示すフローチャートである。
【図９】各特徴検出ニューロンの処理手順を示すフローチャートである。
【図１０】各特徴統合ニューロンの処理手順を示すフローチャートである。
【図１１】実施形態に係るパターン検出（認識）装置を撮像装置に用いた例の構成を示す図である。
【図１２】特徴統合層の回路構成を示す図である。
【図１３】特徴統合層の回路構成を示す図である。
【図１４】正規化回路の構成を示す図である。
【図１５】チャネル活性度制御回路の構成を示す図である。
【図１６】ゲーティング回路の構成を示す図である。
【図１７】実施形態１のネットワーク構成を示すブロック図である。
【図１８】注視制御層内の注視制御ニューロンを中心とした結合模式図である。
【図１９】注視位置制御回路の構成を示す図である。
【図２０】注視領域設定制御層を中心とするネットワーク構成を示す図である。
【図２１】注視領域設定制御層を中心とするネットワーク構成を示す図である。
【図２２】注視制御ニューロンへの下位層入力と上位層フィードバック入力の配分制御回路を示す図である。
【図２３】スケール選択性の異なる複数処理チャネル出力を集団的符号化により統合した場合の出力例を示す図である。
【図２４】実施形態２で用いるネットワーク構成を示す図である。
【図２５】フィードバック量変調回路での処理の流れを示す図である。
【図２６】実施形態３の注視制御層を中心とした構成を示す図である。
【図２７】注視位置の設定制御に関する処理の流れを示すフローチャートである。
【図２８】選択的注視機構を撮影装置に内蔵させた場合の制御の流れを示すフローチャートである。
【図２９】被写体認識機構を備えた画像入力装置を示す図である。
[0001]
[Technical Field of the Invention]
The present invention relates to a pattern detection apparatus and method, and an image processing apparatus and method, for performing pattern recognition, detection of a specific subject, and the like.
[0002]
[Prior Art]
Conventionally, approaches in the fields of image recognition and speech recognition fall broadly into two types: those in which a recognition algorithm specialized for a specific recognition target is executed sequentially as computer software, and those in which it is executed by a dedicated parallel image processor (SIMD machine, MIMD machine, etc.).
[0003]
In image recognition algorithms, by analogy with biological processing, it is considered important to reduce the computational cost (load) of recognition by performing recognition while selectively shifting the gaze area.
[0004]
For example, in the hierarchical information processing method of Japanese Examined Patent Publication No. 6-34236, downward signal paths from upper layers to lower layers are provided, corresponding to a plurality of upward signal paths from lower layers to upper layers, among a plurality of layers each having a feature extraction element layer and a feature integration layer whose output depends on the outputs of the feature extraction elements for the same feature. The transmission of the upward signals is controlled according to the downward signals from the uppermost layer, which provides auto-associative recall and segmentation by selective extraction of a specific pattern, and thereby sets the processing area (the gaze area for recognition).
[0005]
In US Pat. No. 4,876,731, upward and downward signal paths of the above kind are controlled based on contextual information (a rule database, probabilistic weighting) from the output layer (the uppermost layer).
[0006]
In Japanese Patent No. 2856702, correcting the recognition is called gaze, and the pattern recognition apparatus is provided with a gaze degree determination unit that, when a pattern cannot be identified, selects a gaze degree for each partial area so that accurate recognition can be performed, and a gaze degree control unit that causes recognition to be performed with the selected gaze degree taken into account.
[0007]
The method proposed by Koch and Ullman (1985) for controlling gaze area setting by selective routing (Human Neurobiology, vol. 4, pp. 219-227) performs gaze control using a saliency map extraction mechanism based on feature extraction and selective mapping, a gaze position selection mechanism based on a so-called winner-take-all neural network (see Japanese Patent Application Laid-Open No. 5-242069, US5049758, US5059814, US5146106, etc.), and a mechanism that inhibits the neural elements at the selected position.
[0008]
In the method that obtains an object-centered, scale- and position-invariant representation using a dynamic routing network (Anderson et al. 1995, Routing Networks in Visual Cortex, in Handbook of Brain Theory and Neural Networks (M. Arbib, Ed.), MIT Press, pp. 823-826; Olshausen et al. 1995, A Multiscale Dynamic Routing Circuit for Forming Size- and Position-Invariant Object Representations, J. Computational Neuroscience, vol. 2, pp. 45-62), information is routed through control neurons that can dynamically set connection weights, thereby controlling the gaze area and converting the feature representation of the recognition target from an observer-centered one to an object-centered one.
[0009]
In the selective tuning method (Culhane & Tsotsos, 1992, An Attentional Prototype for Early Vision, Proceedings of Second European Conference on Computer Vision (G. Sandini, Ed.), Springer-Verlag, pp. 551-560), a search is performed by a mechanism that activates only the winners, from the WTA circuit in the uppermost layer down to the WTA circuits in the lower layers, so that the position of the overall winner in the uppermost layer is determined in the lowest layer (the layer that directly receives the input data). The selection of position and the selection of features in gaze control are realized, respectively, by inhibiting connections unrelated to the position of the target and by inhibiting elements that detect features unrelated to the target.
[0010]
[Problems to be solved by the invention]
The method of performing selective gaze as described above has the following problems.
[0011]
First, in the method according to Japanese Patent Publication No. 6-34236, it is assumed that there is a downward signal path that is paired with an upward signal path in the hierarchical neural circuit configuration. There is a problem that a neural circuit having a size similar to that of the neural circuit corresponding to the signal path is required as a circuit for forming the downward signal path, and the circuit scale becomes very large.
[0012]
Further, in this method, since a mechanism for sequentially changing the gaze position is not provided, there is a problem in that the stability of the operation at the time of setting and changing the gaze area is lacking due to noise and other influences. That is, there is an interaction across all layers between the elements in the intermediate layer of the upward path and the elements in the intermediate layer of the downward path, and the gaze position is finally determined throughout the interaction. When there are multiple targets belonging to the same category, the gaze point position is not controlled so as to stably and sequentially transition between the targets, and the gaze position vibrates only between specific targets (or the vicinity). There was a problem to do.
[0013]
In addition, when there are multiple targets in the same category as the detection (recognition) target, processing for multiple targets (processing that is not substantially watched) occurs at the same time, In order to perform the update, there has been a problem that it is necessary to finely adjust the network parameters each time.
[0014]
In addition, in the method based on selective routing, the gaze position control has only a mechanism for suppressing the selected area, so that it is not easy to efficiently control the gaze position, and the gaze position control is not easy. In some cases, it may be biased toward a specific object or a specific part.
[0015]
In addition, the dynamic routing network method reconfigures the interlayer connection via many control neurons that can dynamically change the synaptic connection weight, which complicates the circuit configuration and lowers the action of the control neuron. Since this process is a bottom-up process based on the feature saliency of the layer, it is difficult to efficiently control the gaze position when there are a plurality of objects of the same category.
[0016]
The method based on selective tuning performs hierarchical, dynamic selection similar to so-called pruning: it simply cuts the connections that do not participate in the selected target. When several targets are present, efficient control of the gaze point position is therefore difficult.
[0017]
Furthermore, the conventional methods described above share the following problems.
[0018]
First, because no mechanism is provided for handling differences in the size of the recognition target, tuning was necessary whenever targets of different sizes were present at the same time or the size of the recognition target differed from scene to scene.
[0019]
Further, when objects belonging to the same category are present at several positions in the input data, the gaze position cannot be changed (updated) evenly and efficiently among them.
[0021]
[Means for Solving the Problems]
To solve the problems described above, a pattern detection apparatus according to the present invention comprises: input means for inputting a pattern; detection processing means that has a plurality of feature detection elements for detecting a plurality of features at each point obtained by sampling the input pattern by a predetermined method, saliency detection elements for detecting feature saliency, and coupling means for coupling the elements and transmitting signals between them, and that forms a plurality of element layers relating to features from low order to high order and detects a predetermined pattern; and gaze area setting control means. The coupling means includes feedback coupling means for transmitting signals from an element layer relating to higher-order features to an element layer relating to lower-order features, and the gaze area setting control means controls the setting of a gaze area for low-order feature data or input data based on the feature saliency and the amount of signal transmitted through the feedback coupling means.
[0032]
DETAILED DESCRIPTION OF THE INVENTION
<First Embodiment>
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
[0033]
Overall configuration overview
FIG. 1 shows the overall configuration of the pattern detection/recognition apparatus according to the present embodiment. Pattern information is processed by a What path (upper part of the figure) and a Where path (lower part of the figure). The What path mainly handles information related to the recognition (detection) of an object or a geometric feature, and the Where path mainly handles information related to the position (arrangement) of the object or feature.
[0034]
The What path has a so-called convolutional network structure (LeCun, Y. and Bengio, Y., 1995, "Convolutional Networks for Images, Speech, and Time Series" in Handbook of Brain Theory and Neural Networks (M. Arbib, Ed.), MIT Press, pp. 255-258). It differs from the conventional structure, however, in that the inter-layer connections within the same path may locally form mutual connections (described later). The final output of the What path corresponds to the recognition result, that is, the category of the recognized object. The final output of the Where path represents the location corresponding to the recognition result.
[0035]
The data input layer 101 is a photoelectric conversion element such as a CMOS sensor or a CCD element when image detection and recognition are performed, and a sound input sensor when sound detection and recognition are performed. Alternatively, high-dimensional data obtained from the analysis results (for example, principal component analysis or vector quantization) of a predetermined data analysis unit may be input. The data input layer 101 supplies input data common to the two paths.
[0036]
Although application to speech recognition is not described in detail in the present embodiment, the selective gaze described below corresponds, for speech, to extracting and processing only a specific band or phoneme, to selectively processing only the "features" extracted by a technique such as independent component analysis, or to extracting and processing the speech (features) of a specific speaker depending on the context.
[0037]
Hereinafter, a case where an image is input will be described. The What path includes feature detection layers 102 ((1, 0), (1, 1), ..., (1, N)), feature integration layers 103 ((2, 0), (2, 1), ..., (2, N)), and a gaze area setting control layer 108.
[0038]
The first feature detection layer (1, 0) detects local low-order features of the image pattern (which may include color component features in addition to geometric features) by multi-resolution processing such as the Gabor wavelet transform. At each position over the entire screen (or at each of a set of predetermined sampling points), it detects as many features as there are feature categories, at a plurality of scale levels or resolutions at the same location. It has a receptive field structure matched to the type of feature (for example, when a line segment in a predetermined direction is extracted as a geometric feature, a receptive field structure corresponding to the geometry of that line segment) and is composed of neuron elements that generate pulse trains according to the degree of the feature.
[0039]
The gaze area setting control layer 108 receives a feature saliency map from the feature integration layer (2, 0) described later and a feedback connection from an upper layer, for example the feature position detection layer (3, k), through the Where path described later. Based on the feedback signal, it controls the setting and updating of the position and size of the gaze area. Details of the operation mechanism will be described later.
[0040]
The feature detection layers (1, k) as a whole form processing channels at a plurality of resolutions (or scale levels). Taking the Gabor wavelet transform in the feature detection layer (1, 0) as an example, the set of feature detection cells whose receptive field structures are Gabor filter kernels of the same scale level but different direction selectivity forms one processing channel in layer (1, 0), as shown in FIG. 12. In the subsequent layer (1, 1), the feature detection cells that receive the outputs of those cells (and detect higher-order features) belong to the same channel as that processing channel. In each later layer (1, k) (where k > 1), likewise, the feature detection cells that receive the outputs of a plurality of feature integration cells forming one channel in layer (2, k-1) are configured to belong to that channel. Processing in each channel proceeds at a single scale level (or resolution), and detection and recognition proceed from low-order features to high-order features by hierarchical parallel processing.
[0041]
The feature integration layer (2, 0) on the What path has a predetermined receptive field structure (hereinafter, "receptive field" means the range of connections to the output elements of the immediately preceding layer, and "receptive field structure" means the distribution of the connection weights). It integrates the outputs of a plurality of neuron elements within the same receptive field from the feature detection layer (1, 0), by operations such as sub-sampling through local averaging and combination of the processing results at different scale levels.
[0042]
The receptive fields of the neurons in a feature integration layer have a structure common to all neurons in the same layer. Each feature detection layer ((1, 1), (1, 2), ..., (1, N)) and each feature integration layer ((2, 1), (2, 2), ..., (2, N)) has a predetermined receptive field structure acquired by learning. The former ((1, 1), ...) detect a plurality of different features in each feature detection module, and the latter ((2, 1), ...) integrate the detection results for a plurality of features from the preceding feature detection layer.
[0043]
However, each feature detection layer is connected (wired) so as to receive the cell outputs of the preceding feature integration layer belonging to the same channel. The feature integration layer performs two types of processing. The first, sub-sampling, averages the outputs from a local region (the local receptive field of the feature integration layer neuron) of the feature detection cell population of the same feature category and the same scale level. The second, combination of processing results at different scale levels, forms a linear (or non-linear) combination of the outputs of a plurality of feature detection cell populations of the same feature category across a plurality of different scale levels.
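
The two integration operations described above can be illustrated with a minimal sketch. It assumes non-overlapping square averaging blocks and fixed combination weights; the function names `local_average_subsample` and `combine_scales`, the block size, and the weights are hypothetical and are not specified in this description.

```python
def local_average_subsample(fmap, size=2):
    """Sub-sampling: average each non-overlapping size x size block
    of a 2-D feature map (block layout is an assumption)."""
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            block = [fmap[r + dr][c + dc]
                     for dr in range(size) for dc in range(size)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def combine_scales(fmaps, weights):
    """Linear combination of same-shaped maps of one feature category
    taken from different scale levels (weights are hypothetical)."""
    rows, cols = len(fmaps[0]), len(fmaps[0][0])
    return [[sum(w * f[r][c] for w, f in zip(weights, fmaps))
             for c in range(cols)]
            for r in range(rows)]
```

A non-linear combination, which the text also permits, would replace the weighted sum in `combine_scales` with another function of the per-scale outputs.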
[0044]
The Where path has feature position detection layers ((3, 0), ..., (3, k)) and receives input from predetermined (not necessarily all) feature detection layers on the What path. The Where path also forms a feedback connection from an upper feature integration layer or feature detection layer (in FIG. 1, (1, N) → (3, k) → gaze area setting control layer 108).
[0045]
That is, this feedback connection is assumed to have been formed in advance, by learning, as a connection to the gaze control neuron 1801 (FIG. 18) of the gaze area setting control layer 108 that is associated with a specific low-order feature constituting the higher-order feature pattern detected in the upper layer. This learning involves a self-organization process in which a connection is strengthened or formed (and other connections are suppressed or eliminated) when the upper-layer feature detection neuron and the lower-layer feature detection neuron fire simultaneously within a predetermined time width.
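
As a rough illustration of the self-organization described above, the following sketch strengthens a connection when the two neurons fire within a given time width and weakens it otherwise. The update rule, the function name `update_feedback_weight`, and the parameters `learning_rate` and `decay` are assumptions made for illustration; the description does not specify the actual learning rule.

```python
def update_feedback_weight(weight, t_upper, t_lower, time_width,
                           learning_rate=0.1, decay=0.05):
    """Coincidence-based update of one feedback connection weight in [0, 1].

    t_upper, t_lower: firing times of the upper-layer and lower-layer
    feature detection neurons. All constants are hypothetical.
    """
    if abs(t_upper - t_lower) <= time_width:
        # Simultaneous firing within the time width: strengthen.
        return min(1.0, weight + learning_rate * (1.0 - weight))
    # Otherwise: suppress the connection.
    return max(0.0, weight - decay * weight)
```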
[0046]
The output of each feature position detection layer 107 retains a relatively high-resolution component of the output of the corresponding feature detection layer 102 (indicated schematically by the arrows from the feature detection layer to the feature position detection layer in the figure), or temporarily holds the feature integration layer output, and thereby represents a saliency map for each feature category. Here, the saliency map in a given layer represents the spatial distribution of the detection levels (or existence probabilities on the input data) of a plurality of feature categories (feature elements) of the same degree of complexity. The processing in the feature position detection layers of the Where path will be described later.
[0047]
As shown in FIG. 2A, the mechanism that connects neurons between layers consists of a signal transmission unit 203 (wiring or a delay line), corresponding to the axon or dendrite of a nerve cell, and a synapse circuit S202.
[0048]
FIG. 2A shows the configuration of the connection mechanism involved in the outputs (inputs as seen from the receiving cell) from the group of feature integration (detection) cell neurons (n_i) that form the receptive field of a given feature detection (integration) cell. The portion of the signal transmission unit drawn as a thick line constitutes a common bus line, on which pulse signals from a plurality of neurons are transmitted in time series. A similar configuration is adopted for receiving input from the output destination cell. In that case, the input signals and output signals may be separated on the time axis and processed in the same configuration, or the same configuration may be provided in two systems, one for input (dendrite side) and one for output (axon side).
[0049]
The synapse circuits S202 include those involved in inter-layer connections (connections between neurons of the feature detection layer 102 and neurons of the feature integration layer 103; each layer may have connections to both the subsequent layer and the preceding layer) and those involved in connections between neurons within the same layer. The latter are used, as required, mainly to connect a pacemaker neuron (described later) to feature detection or feature integration neurons.
[0050]
In the synapse circuit S202, a so-called excitatory connection amplifies the pulse signal, whereas an inhibitory connection attenuates it. When information is transmitted by pulse signals, amplification and attenuation can be realized by any of amplitude modulation, pulse width modulation, phase modulation, and frequency modulation of the pulse signal.
[0051]
In the present embodiment, the synapse circuit S202 is used mainly as a pulse phase modulation element: signal amplification is converted into a substantial advance of the pulse arrival time, as a quantity specific to the feature, and attenuation into a substantial delay. That is, as described later, the synaptic connection assigns an arrival position (phase) on the time axis that is specific to the feature at the destination neuron. Qualitatively, an excitatory connection advances the phase of the arriving pulse relative to a reference phase, and an inhibitory connection delays it.
[0052]
In FIG. 2A, each neuron element n_j outputs a pulse signal (spike train) and is a so-called integrate-and-fire neuron element, as described later. As shown in FIG. 2C, the synapse circuit and the neuron element may be combined to form a single circuit block.
[0053]
Each feature position detection layer in the Where path of FIG. 1 receives the output of a feature detection layer or the like of the What path. At each lattice point coarsely sampled while preserving the positional relationships on the data input layer, only the neurons corresponding to those components of the What-path feature extraction results that are useful for recognition (registered in advance from the patterns of the recognition categories) respond, by filtering or the like.
[0054]
For example, in the uppermost layer of the Where path, neurons corresponding to the category to be recognized are arranged on a lattice and express the position at which the corresponding target exists. Neurons in the intermediate layers of the Where path receive top-down input from the upper layer of the Where path (or input via the route What-path upper layer → Where-path upper layer → Where-path intermediate layer), so that their sensitivity can be adjusted to respond only when a feature that can be located around the position of the corresponding recognition target is detected.
[0055]
When hierarchical feature detection that preserves the positional relationships (or position information) among detected features is performed using the Where path, the receptive field structure is made local (for example, elliptical) and its size is made to increase gradually toward the upper layers (or to be constant, and larger than one pixel on the sensor surface, from the intermediate layers upward). Each feature element (graphic element or graphic pattern) can then be detected in each layer while the positional relationships among feature elements on the sensor surface are preserved to some extent.
[0056]
As another form of the Where path, a neural network may be used in which the receptive field size increases hierarchically toward the upper layers and, among the neurons corresponding to the target category detected in the uppermost layer, only the one that outputs the maximum value fires. In such a system, information on the arrangement (spatial phase) in the data input layer is preserved to some extent in the uppermost layer (and in each intermediate layer) as well.
[0057]
In yet another network configuration, shown in FIG. 17 (the same as FIG. 1 except for the receptive field sizes), the receptive field size of the feature integration layers is kept below a certain level even in the upper layers, so that the spatial arrangement of feature categories is preserved up to the higher-order features. In this configuration, feedback connections run from the feature integration layer to the gaze area setting control layer 108 without providing feature position detection layers.
[0058]
Neuron element
Next, the neurons constituting each layer will be described. Each neuron element is an extended model based on the so-called integrate-and-fire neuron. It is the same as an integrate-and-fire neuron in that it fires and outputs a pulse signal when the result of linear addition of the input signals (pulse trains corresponding to action potentials) exceeds a threshold value.
[0059]
FIG. 2B shows an example of the basic configuration illustrating the operating principle of a pulse generation circuit (CMOS circuit) serving as a neuron element; it is an extension of a known circuit (IEEE Trans. on Neural Networks, vol. 10, pp. 540). Here, the circuit is configured to receive both excitatory and inhibitory inputs.
[0060]
The operating principle of the circuit is as follows. The time constant of the circuit formed by capacitor C₁ and resistor R₁ on the excitatory input side is smaller than the time constant of the circuit formed by capacitor C₂ and resistor R₂. In the steady state, transistors T₁, T₂, and T₃ are cut off. The resistors are actually implemented as transistors acting as active loads. When the potential of C₁ rises and exceeds the potential of C₂ by the threshold of transistor T₁, T₁ becomes active and in turn activates transistors T₂ and T₃. Transistors T₂ and T₃ constitute a current mirror circuit, and the output of this circuit is taken from the capacitor C₁ side. When the charge stored in capacitor C₂ reaches its maximum, transistor T₁ is cut off; as a result, transistors T₂ and T₃ are also cut off, and the circuit is configured so that the positive feedback becomes zero.
[0061]
During the so-called refractory period, capacitor C₂ discharges, and the neuron does not respond unless the potential of C₁ is higher than the potential of C₂ and the difference exceeds the threshold of T₁. Periodic pulses are output as capacitors C₁ and C₂ alternately charge and discharge, and their frequency is generally determined by the level of the excitatory input. Because of the refractory period, however, the frequency can be limited to a maximum value, or a constant frequency can be output.
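
The behavior described above (firing when the integrated input exceeds a threshold, an output frequency that rises with the excitatory input, and a maximum frequency imposed by the refractory period) can be illustrated in software. The following is a minimal discrete-time leaky integrate-and-fire sketch, not a model of the CMOS circuit of FIG. 2B; the function name `simulate_neuron` and all parameter values are hypothetical.

```python
def simulate_neuron(input_level, steps=1000, dt=1.0, tau=20.0,
                    threshold=1.0, refractory_steps=5):
    """Return the time steps at which a leaky integrate-and-fire
    neuron fires for a constant excitatory input level."""
    potential = 0.0
    wait = 0            # remaining steps of the refractory period
    spike_times = []
    for t in range(steps):
        if wait > 0:    # no response during the refractory period
            wait -= 1
            continue
        # Leaky integration of the input.
        potential += dt * (-potential / tau + input_level)
        if potential >= threshold:
            spike_times.append(t)
            potential = 0.0
            wait = refractory_steps
    return spike_times
```

With these assumed values, a larger `input_level` yields more spikes over the same interval, while the spike count can never exceed about one per `refractory_steps + 1` steps.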
[0062]
The potential of the capacitor, and hence the amount of stored charge, is controlled over time by a reference voltage control circuit (time window weight function generation circuit). This control characteristic is reflected in the weighted addition of input pulses within a time window, described later (see FIG. 7). The reference voltage control circuit generates a reference voltage signal (corresponding to the weighting function in FIG. 7B) based on the input timing from a pacemaker neuron described later (or on a mutual connection input from a neuron in the subsequent layer).
[0063]
Inhibitory input is not always necessary in the present embodiment, but making the input from the pacemaker neuron (described later) to the feature detection layer neurons inhibitory can prevent divergence (saturation) of the output.
[0064]
In general, the relationship between the sum of the input signals and the output level (pulse phase, pulse frequency, pulse width, and so on) varies with the sensitivity characteristics of the neuron, and these sensitivity characteristics can be changed by top-down input from higher layers. In the following, for convenience of explanation, the circuit parameters are assumed to be set so that the pulse output frequency rises steeply with the summed input (and is therefore almost binary in the frequency domain), and the output level (for example, timing with phase modulation applied) is assumed to vary through pulse phase modulation.
[0065]
Further, a circuit may be added as a means for modulating the pulse phase. With it, the reference voltage is controlled in accordance with the weighting function within the time window, so that the phase of the pulse output from the neuron changes, and this phase can be used as the output level of the neuron.
[0066]
The time τ_w1 corresponding to the maximum of the weighting function shown in FIG. 7B, which gives the temporal integration characteristic (reception sensitivity characteristic) for pulses that have undergone pulse phase modulation at a synaptic connection, is generally set earlier than the expected pulse arrival time τ_s1 specific to the feature given by the synaptic connection. As a result, a pulse that arrives earlier than the expected arrival time, within a certain range, is integrated over time in the receiving neuron as a pulse signal with a high output level (in the example of FIG. 7B, a pulse that arrives too early is attenuated). The shape of the weighting function is not limited to a symmetric shape such as a Gaussian and may be asymmetric. Note that, for the reason given above, the center of each weighting function in FIG. 7B is not the expected pulse arrival time.
[0067]
The phase of the neuron output (before the synapse) is referenced to the beginning of the time window, as described later, and the output characteristic is such that the delay (phase) from the reference time is determined by the amount of charge accumulated when a reference pulse (from the pacemaker output or the like) is received. The details of the circuit configuration that provides such an output characteristic are not the main point of the present invention and are therefore omitted. The post-synaptic pulse phase is obtained by adding the specific phase modulation amount given by the synapse to the pre-synaptic phase.
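
The time-window weighting and the additive phase modulation described above can be sketched as follows. The sketch assumes a Gaussian weighting function (one of the shapes the text allows) whose peak τ_w1 is placed earlier than the expected arrival time τ_s1; the function names and the numerical values of the peak, width, and expected arrival time are hypothetical.

```python
import math


def post_synaptic_phase(pre_phase, synaptic_modulation):
    """Post-synaptic phase = pre-synaptic phase + modulation specific
    to the synapse (negative values advance, positive values delay)."""
    return pre_phase + synaptic_modulation


def window_weight(arrival_time, peak_time, width):
    """Gaussian weighting function within the time window
    (assumed shape; an asymmetric function is equally permitted)."""
    return math.exp(-((arrival_time - peak_time) ** 2) / (2.0 * width ** 2))


def weighted_sum(arrival_times, peak_times, width):
    """Weighted addition of the input pulses of one receiving neuron."""
    return sum(window_weight(t, p, width)
               for t, p in zip(arrival_times, peak_times))


# Hypothetical values: expected arrival tau_s1 = 10, weight peak tau_w1 = 8.
TAU_S1, TAU_W1, WIDTH = 10.0, 8.0, 2.0
on_time = window_weight(TAU_S1, TAU_W1, WIDTH)
slightly_early = window_weight(post_synaptic_phase(TAU_S1, -2.0), TAU_W1, WIDTH)
too_early = window_weight(2.0, TAU_W1, WIDTH)
```

With these assumed values, the slightly early pulse receives the largest weight, the on-time pulse a smaller one, and the pulse that arrives far too early is strongly attenuated.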
[0068]
Alternatively, a known circuit configuration may be used that produces an oscillation output with a predetermined timing delay when the sum of the inputs obtained using a window function or the like exceeds a threshold value.
[0069]
For a neuron belonging to the feature detection layer 102 or the feature integration layer 103 whose firing pattern is controlled by the output timing of a pacemaker neuron (described later), the neuron element need only be a circuit that, after receiving the pulse output from the pacemaker neuron, outputs a pulse with a phase delay corresponding to the input level (the simple or weighted sum of the inputs described above) received from its receptive field in the preceding layer. In this case, before the pulse signal from the pacemaker neuron is input, there is a transient state in which each neuron outputs pulses with random phases according to its input level.
[0070]
When no pacemaker neuron is used, as described later, the circuit may instead be configured so that the firing timing of the feature detection neuron's output pulse, which depends on the input level described above, is controlled with reference to a synchronous firing signal arising from the mutual connections between neurons (between the feature detection layer and the feature integration layer) and the network dynamics.
[0071]
As described above, the neurons in the feature detection layer 102 have receptive field structures corresponding to feature categories. When the weighted sum (described later), by the time window function, of the input pulse signals (current values or potentials) from the neurons of the preceding layer (the input layer 101 or the feature integration layer 103) is equal to or greater than a threshold value, the neuron outputs a pulse at an output level given by a non-decreasing, non-linear function of that sum that saturates asymptotically to a fixed level, that is, a so-called squashing function such as a sigmoid function. (Here the output level is expressed as a phase change, but it may instead be expressed through frequency, amplitude, or pulse width.)
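
The thresholded squashing behavior can be sketched as below. The mapping from output level to phase (a higher level giving an earlier pulse within the window) is an assumption chosen to be consistent with the statement that earlier pulses represent higher output levels; the function names, the window length, and the sigmoid gain are hypothetical.

```python
import math


def squash(value, gain=1.0):
    """Sigmoid squashing function: non-decreasing, saturating at 1."""
    return 1.0 / (1.0 + math.exp(-gain * value))


def output_phase(weighted_input, threshold, window_length=10.0):
    """Return the pulse phase (delay from the window start), or None
    if the weighted input sum is below the threshold (no pulse)."""
    if weighted_input < threshold:
        return None
    level = squash(weighted_input - threshold)
    # Assumed encoding: higher output level -> smaller delay.
    return window_length * (1.0 - level)
```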
[0072]
Processing in the feature detection layer (1, 0) (low-order feature extraction by Gabor wavelet transform)
Suppose that the feature detection layer (1, 0) contains a neuron N1 that detects a pattern structure (low-order feature) having a predetermined spatial frequency and a vertical direction component within a region of a certain size. If a structure corresponding to the receptive field of N1 exists on the data input layer 101, N1 outputs a pulse at a phase corresponding to its contrast. Such a function can be realized by a Gabor filter. The feature detection filter function performed by each neuron of the feature detection layer (1, 0) is described below.
[0073]
The feature detection layer (1, 0) performs a Gabor wavelet transform represented by a set of filters with multiple scales and multiple direction components, and each neuron (or each group of neurons) in the layer has a predetermined Gabor filter function.
[0074]
In the feature detection layer 102, a plurality of neuron populations, each composed of neurons whose receptive field structures correspond to the convolution kernels of Gabor functions with a constant scale level (resolution) and different direction selectivity, are grouped to form one channel. As shown in FIG. 13, the neuron groups forming the same channel, which have different direction selectivity and the same size selectivity, may be arranged close to each other. Alternatively, neuron groups that belong to the same feature category but to different processing channels may be arranged close to each other.
[0075]
These arrangements are used because they make the combination processing in the population coding described later easier to realize as a circuit. Details of the circuit configurations of FIGS. 12 and 13 will be described later.
[0076]
For details of the method of performing the Gabor wavelet transform in a neural network, see Daugman (1988) (IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 36, pp. 1169-1179).
[0077]
The Gabor wavelet has the shape of a sine wave with a constant direction component and spatial frequency, modulated by a Gaussian function, as given by the following equation (1), and is specified by a scaling level index m and a direction component index n. As wavelets, the filters in this set have similar functional shapes and differ from one another in principal direction and size. The wavelet is localized in both the spatial frequency domain and the real space domain, has minimal joint uncertainty in position and spatial frequency, and is known to be the most localized function in both real space and frequency space (J. G. Daugman (1985), Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, Journal of Optical Society of America A, vol. 2, pp. 1160-1169).
[0078]
[Outside 1]

[0079]
Here, (x, y) is the position in the image, a is the scaling factor, θ_n is the direction component of the filter, W is the fundamental spatial frequency, and σ_x and σ_y are parameters that give the spread of the filter function in the x and y directions. In this embodiment, θ_n takes six values (0, 30, 60, 90, 120, and 150 degrees), a is 2, and m is an integer from 1 to 3.
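
As an illustration of the filter set defined by these parameters, the following sketch builds a bank of 3 scales × 6 directions. Because equation (1) itself is not reproduced here, the sketch assumes a widely used form of the Gabor wavelet family (a Gaussian envelope times a complex sinusoid, evaluated in coordinates rotated by θ_n and scaled by a^(-m)); this form, the function names, and the values of W, σ_x, σ_y, and the kernel size are assumptions, not the patent's own definition.

```python
import cmath
import math

DIRECTIONS_DEG = (0, 30, 60, 90, 120, 150)   # theta_n, from the text
SCALE_INDICES = (1, 2, 3)                     # m, from the text


def gabor(x, y, m, theta_deg, a=2.0, W=0.5, sigma_x=1.0, sigma_y=1.0):
    """Assumed Gabor wavelet g_mn(x, y); W and sigma values are placeholders."""
    theta = math.radians(theta_deg)
    s = a ** (-m)
    # Rotate by theta_n and rescale by a^(-m).
    xr = s * (x * math.cos(theta) + y * math.sin(theta))
    yr = s * (-x * math.sin(theta) + y * math.cos(theta))
    envelope = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
    envelope /= 2.0 * math.pi * sigma_x * sigma_y
    return s * envelope * cmath.exp(2j * math.pi * W * xr)


def filter_bank(half_size=3):
    """Kernels indexed by (m, n), n = 1..6, sampled on a square grid."""
    coords = range(-half_size, half_size + 1)
    return {(m, n): [[gabor(x, y, m, th) for x in coords] for y in coords]
            for m in SCALE_INDICES
            for n, th in enumerate(DIRECTIONS_DEG, start=1)}
```

In practice the kernel size would grow with the scale index; a single `half_size` is used here only to keep the sketch short.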
[0080]
The parameters σ_x, σ_y, and a, which determine the filter characteristics, are preferably set so that the filters overlap one another evenly and appropriately in the Fourier domain, leaving no bias (in sensitivity) toward any specific spatial frequency or direction. For example, if the filters are designed so that the half-maximum levels of their amplitudes after Fourier transform touch one another in the Fourier domain, the following is obtained:
[0081]
[Outside 2]

Here, U_H and U_L are the maximum and minimum values of the spatial frequency band covered by the wavelet transform, and M is the number of scaling levels in that range.
[0082]
The receptive field structure of the feature detection cell given by equation (1) has scale selectivity and direction selectivity of a predetermined width determined by σ_x and σ_y. That is, because the Fourier transform of equation (1) has a Gaussian shape, the cell has a peaked tuning (sensitivity) characteristic at a specific spatial frequency and direction. Because the size (spread) of the Gabor filter kernel varies with the scale index m, Gabor filters with different scale indices have different size selectivity. In the population coding described later, the outputs of a plurality of feature detection cells whose sensitivity characteristics overlap, mainly with respect to size selectivity, are integrated.
[0083]
The Gabor wavelet transform is performed by a two-dimensional convolution of each filter g_mn(x, y) with the input grayscale image. That is,
[0084]
[Outside 3]

[0085]
Here, I is the input image and W_mn is a Gabor wavelet transform coefficient. The set of W_mn (m = 1, 2, 3; n = 1, ..., 6) is obtained at each point as a feature vector. The symbol '*' indicates that the complex conjugate is taken.
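
The computation of one coefficient W_mn at a single image point can be sketched as a discrete sum. Since the equation image is not reproduced here, the index convention (kernel evaluated at the offset between the output point and each input pixel, then conjugated) is an assumption; the function name `wavelet_coefficient` and the kernel-as-callable interface are hypothetical.

```python
def wavelet_coefficient(image, kernel, x, y, half_size):
    """Discrete sum of I(x1, y1) * conj(kernel(x - x1, y - y1)) over a
    (2 * half_size + 1)-wide neighborhood clipped to the image.

    image: 2-D list of gray levels indexed as image[row][column].
    kernel: callable (dx, dy) -> complex, e.g. a Gabor filter g_mn.
    """
    total = 0j
    for y1 in range(max(0, y - half_size), min(len(image), y + half_size + 1)):
        for x1 in range(max(0, x - half_size),
                        min(len(image[0]), x + half_size + 1)):
            total += image[y1][x1] * kernel(x - x1, y - y1).conjugate()
    return total
```

Evaluating this function for all 18 (m, n) filters at one point gives the feature vector mentioned above.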
[0086]
Each neuron in the feature detection layer (1, 0) has a receptive field structure corresponding to g_mn. Filters g_mn with the same scale index m have receptive fields of the same size, and the size of the corresponding kernel g_mn also corresponds to the scale index. Here, the sizes on the input image are 30 × 30, 15 × 15, and 7 × 7, in order from the coarsest scale.
[0087]
Each neuron outputs a pulse at an output level that is a non-linear squashing function of the wavelet transform coefficient obtained by the product-sum of the weight distribution coefficients and the image data (here the level is expressed as phase, but frequency, amplitude, or pulse width may be used instead). As a result, the Gabor wavelet transform of equation (4) is obtained as the output of the entire layer (1, 0).
[0088]
Processing in the feature detection layers (middle-order and higher-order feature extraction)
Unlike the feature detection layer (1, 0), each neuron in the subsequent feature detection layers ((1, 1), (1, 2), ..., (1, N)) forms its receptive field structure, which detects features specific to the pattern to be recognized, by the so-called Hebb learning rule or the like. In later layers, the local region over which feature detection is performed approaches the size of the entire recognition target step by step, and geometrically middle-order or higher-order features are detected.
[0089]
For example, when a face is detected and recognized, the middle-order (or higher-order) features are features at the level of the graphic elements that make up the face, such as the eyes, nose, and mouth. If different processing channels are at the same hierarchical level (that is, the complexity of the detected features is at the same level), the difference between the features they detect is that the features belong to the same category but are detected at different scales. For example, "eyes" as a middle-order feature are detected as "eyes" of different sizes in different processing channels. That is, detection of an "eye" of a given size in the image is attempted in a plurality of processing channels with different scale level selectivity.
[0090]
In general, feature detection layer neurons (whether for low-order or high-order feature extraction) may have a mechanism for receiving connections based on the output of the preceding layer in order to stabilize their output.
[0091]
Selective gaze processing
The configuration and operation of the gaze area setting control layer 108 will be described below. In the initial state, the gaze area set in this layer covers the entire screen for the processing channel of the largest of the preset scale levels.
[0092]
Specifically, as shown in FIG. 18, gaze control neurons 1801 corresponding to a predetermined plurality of positions in the input data (positions at which feature detection is performed) are arranged in the gaze area setting control layer. In FIG. 18, the areas drawn as ellipses are the set gaze point and the corresponding areas on each layer. For convenience of explanation only one is shown for each layer, although each layer may contain several corresponding regions. The size of each region is approximately the receptive field size of that layer.
[0093]
In FIG. 18, the thick arrow from the feature detection layer to the feature position detection layer schematically indicates that each neuron in the feature position detection layer receives the output of the corresponding feature detection layer neuron. Each feature position detection layer neuron performs subsampling in the same way as a feature integration layer neuron but generally has a smaller receptive field, so that information on the spatial arrangement between feature categories is retained better than in feature integration.
[0094]
Each gaze control neuron 1801 receives input through a feedback connection on the Where path from the upper feature position detection layer (3,2) or (3,1), or through a feedback connection from the neuron at the position corresponding to that gaze control neuron in the feature integration layer (2,2). The gaze control neuron may also receive the output from the feature integration layer (2,0) in parallel. A specific gaze control neuron is activated, and the gaze area is thereby determined, either as a result of linear (or non-linear) addition of these two or three kinds of input with predetermined weights, or as a result of the gaze area selection control process (described later) based on any one of these inputs.
[0095]
The position of the gaze area is set mainly by feedback input of the position information of the maximum-level target detected in the highest feature integration layer (2,N) or the highest feature position detection layer (3,M). The former case (that is, the configuration of FIG. 17) is limited to the case where position information is retained in the feature integration layer. There are mainly the following three methods for selecting the gaze position.
(1) The gaze position is given by the gaze control neuron for which the linear (or non-linear) sum of the feedback amount from the upper layer (or the total of the feedback amounts from multiple upper layers) and the feature detection layer output (the saliency of low-order features) is largest.
(2) The gaze position is given by the gaze control neuron that receives the maximum (or a local maximum) value of the low-order feature saliency map.
(3) The gaze position is given by the gaze control neuron for which the feedback amount from the upper layer (the total, when a specific gaze control neuron has a plurality of feedback paths) is the maximum (or a local maximum).
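The three selection rules can be sketched in software as follows. This is only an illustrative sketch, not the neural circuit of the embodiment: the function names, the list-based maps `saliency` and `feedback`, and the weights `w_s` and `w_fb` are assumptions introduced here, and only the global maximum (not local maxima) is considered.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def select_gaze_position(saliency, feedback, method, w_s=1.0, w_fb=1.0):
    """Return the index of the gaze control neuron that wins under rule 1, 2 or 3.

    saliency[i]: low-order feature saliency received by neuron i (assumed given)
    feedback[i]: total feedback amount from the upper layer(s) to neuron i
    w_s, w_fb:   hypothetical weights of the linear sum used by rule 1
    """
    if method == 1:    # weighted sum of feedback and low-order saliency
        scores = [w_fb * fb + w_s * s for s, fb in zip(saliency, feedback)]
    elif method == 2:  # low-order saliency map only
        scores = list(saliency)
    elif method == 3:  # feedback from the upper layer only
        scores = list(feedback)
    else:
        raise ValueError("method must be 1, 2 or 3")
    return argmax(scores)


saliency = [0.9, 0.2, 0.5, 0.1]
feedback = [0.1, 0.3, 0.8, 0.2]
for m in (1, 2, 3):
    print(m, select_gaze_position(saliency, feedback, m))
```

In this toy input, rule (2) picks the position with the strongest low-order saliency, while rules (1) and (3) pick the position supported by upper-layer feedback.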
[0096]
Here, the feedback amount refers to the output level (or a quantity proportional to it) of a neuron of the upper feature position detection layer (or feature integration layer) for a specific processing channel and a specific feature category. The present invention corresponds to case (1) or (3). Method (2) is a well-known technique and has the drawbacks described above. Thus the gaze position is determined with some contribution from the feedback amount from the upper layer. As a result, the object recognition process starts only once the gaze area has been set (a gaze control neuron has been activated) through the cooperative action between upper layer neurons and lower layer neurons, brought about by the bottom-up process of signal processing from the lower layers to the upper layers and the top-down process from the upper layers to the lower layers.
[0097]
For this reason, in the initial process (the first bottom-up pass, up to the point at which the upper layer neurons output signals), the so-called recognized or detected state is not yet reached; at this stage the upper layer outputs what may be called a saliency map of higher-order features, as a latent state. One recognition or detection operation is completed when, based on the gaze control signal generated by the top-down process, the upper-layer neurons for detecting and integrating higher-order features produce their output through a second bottom-up pass; at this stage, neurons with a high output level are localized and sparse in the upper layer. The upper layer is not necessarily limited to the (2,N) layer in FIG. 1 or FIG. 17 or the (3,k) layer in FIG. 1; it may be an intermediate layer below these layers or a higher layer not shown.
[0098]
The number of gaze control neurons 1801 need not equal the number of neurons on the feature integration layer (2,0) (that is, the number of feature detection positions, N_F0); it may, for example, equal the number of neurons in the highest feature position detection layer (N_Fp). When N_Fp < N_F0, the number of sampling points used for gaze control is smaller than the number of low-order feature detection positions. However, if the lower limit of the size of the detection target is νS₀ (where 0 < ν < 1 and S₀ is the total area of the screen), there is almost no practical problem unless N_Fp << νN_F0. Note that the low-order feature saliency map received by the gaze control neurons may be an output from the feature detection layer (for example, the (1,0) layer).
[0099]
On the other hand, the size of the gaze area set when the gaze position is updated is a preset size determined by the processing channel to which one of the following belongs: in case (3) above, the highest-layer neuron that gives the maximum feedback amount to the gaze control neuron 1801 (or the feature integration layer neuron corresponding to that gaze control neuron); or, in case (1) above, the feature integration layer neuron corresponding to the gaze control neuron for which the linear sum of the input from the upper layer and the low-order feature saliency is the maximum (or a local maximum). In this embodiment, the size of the gaze area is the kernel size of the Gabor filter belonging to that processing channel of the feature integration layer (2,0) described above.
[0100]
As a specific configuration of the gaze area setting control layer, in addition to the gaze control neurons 1801, a gating circuit (described below) and the gaze position update control circuit 1901 shown in FIG. 19 may be introduced into the channel processing structure shown in FIG. 12 or 13 described above.
[0101]
This gating circuit is used to mask low-order features (to propagate only the data of a specific partial area to the upper layer). From the outputs of the feature integration layer (2,0), it selectively extracts the region to be gazed at in the selected channel and propagates the signal to the subsequent stage. In this way, the gaze area can be set by propagating only the low-order feature signals from a specific channel and a specific region of the input data.
[0102]
As another configuration, when a gaze control neuron is selected and activated (for example, by a feedback signal from an upper layer), a firing-promotion pulse is transmitted through a synapse to the feature integration layer (2,0) neuron at the corresponding position, and as a result that feature integration layer neuron is turned on (placed in a fireable state, that is, a state in which it can receive the output of a pacemaker neuron or the like: a transient state, or a state in which such a transition is permitted).
[0103]
Examples of signal propagation paths between the gaze area setting control layer 108, the feature integration layer (2,0), and the feature detection layer (1,1) are shown in FIGS. 20 and 21. Both use a gate circuit 2001 with a switch function, provided together with a common bus of a so-called digital circuit, and a switch signal control circuit 2002 that controls ON/OFF of the switch function at high speed. In FIG. 20, the gaze control neuron 1801 is connected to a plurality of neurons in the feature integration layer (2,0), namely the neurons within a region of predetermined size centered on the neuron on the feature integration layer that corresponds to the gaze position (hereinafter referred to as the “gaze central neuron”). The region size is specific to the processing channel to which the gaze central neuron belongs.
[0104]
In the configuration of FIG. 20, the gaze control neuron 1801 has connections to the feature integration layer, and the gaze position is controlled based only on the feedback amount from the feature position detection layer on the Where path. (As a modification, the gaze control neurons and the feature integration layer neurons may be mutually connected, and the gaze position may be controlled based on both the feedback amount and the feature integration layer output.) In the configuration of FIG. 21, on the other hand, the gaze control neuron 1801 has similar connections to the gating layer 2101, which corresponds to the gating circuit of FIG. 16, and the output from the feature integration layer (2,0) to the feature detection layer (1,1) passes through the gating layer 2101.
[0105]
In the configuration shown in FIG. 20, the feature integration layer neurons within a region of a specific size (generally circular or rectangular, and specific to the processing channel) centered on the feature integration layer neuron at the set gaze position are activated, and the output of the feature integration layer (2,0) corresponding to that region propagates to the feature detection layer (1,1). In the configuration shown in FIG. 21, on the other hand, only the gates within the region of the processing-channel-specific size centered on the set gaze position are opened in the gating layer, and the output of the feature integration layer (2,0) corresponding to that region propagates to the feature detection layer (1,1).
[0106]
This gating layer is composed of an FPGA or the like and corresponds to a general gating circuit in which logic circuits, one for each gaze control neuron position, lie between the feature integration layer (2,0) and the feature detection layer (1,1). When a gaze control neuron is activated, only the logic circuit unit at the corresponding position in the gating layer is turned on, and the signal is transmitted to the subsequent layer.
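The masking effect of the gating layer can be illustrated in software as follows. This is a sketch under stated assumptions, not the FPGA logic itself: the function name `gate_region`, the square shape of the gaze area, and the use of a nested list as the feature integration layer output are all choices made here for illustration.

```python
def gate_region(feature_map, center, half_size):
    """Pass feature values inside a square gaze area; block everything else.

    feature_map: 2-D list standing in for one channel of the (2,0) layer output
    center:      (row, col) of the activated gaze control neuron
    half_size:   half the side of the gaze area (assumed channel-specific)
    """
    cy, cx = center
    return [
        [
            value if abs(y - cy) <= half_size and abs(x - cx) <= half_size else 0.0
            for x, value in enumerate(row)
        ]
        for y, row in enumerate(feature_map)
    ]


feature_map = [[1.0] * 4 for _ in range(4)]
gated = gate_region(feature_map, center=(1, 1), half_size=1)
for row in gated:
    print(row)
```

Only the 3 x 3 block around the gaze position survives; the remaining positions contribute nothing to the next feature detection layer.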
[0107]
As described above, the size of the gaze area to be set is determined by the processing channel to which the feature integration layer neuron corresponding to the activated gaze control neuron belongs. For example, if the corresponding feature integration layer neuron belongs to processing channel 1 in FIG. 12 or 13, the gaze area has the largest of the predetermined sizes.
[0108]
Next, the gaze position update control mechanism will be described. In case (3) of the gaze control mechanism described above, the gaze position after the initial setting is basically updated in descending order of the feedback amount that each gaze control neuron within a neighborhood region of a certain range around the latest gaze position receives from the upper layer (starting from the feedback amount next in magnitude to the maximum). In case (1) of the gaze control mechanism described above, the position is updated in descending order of the evaluation value G_f given below.
[0109]
$$G_f = \eta_1 S_f + \eta_2 S_{FB,f} \qquad (5)$$
[0110]
Here, G_f is the evaluation value for feature f, S_f is the saliency of feature f from the lower feature integration layer, S_{FB,f} is the feedback amount to the gaze control neuron 1801 for feature f detected in the uppermost layer or in a middle feature detection layer, and η₁ and η₂ are positive constants. η₁ and η₂ can be changed adaptively according to the nature of the recognition or detection task, for example whether detailed feature comparison is performed, as in discriminating between similar objects, or average (rough) feature comparison is performed, as in detecting objects belonging to a specific category such as human faces.
[0111]
Specifically, for the former task the ratio of η₂ to η₁ is reduced, and for the latter it is increased. This control of η₁ and η₂ is performed by changing the corresponding synaptic connections to the gaze control neuron.
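The effect of the two weights in equation (5) on the update order can be sketched as follows. The function names and the numerical values of the weights are assumptions chosen for illustration only; the description specifies just that both weights are positive and that their ratio is changed with the task.

```python
def evaluation_value(s_f, s_fb, eta1, eta2):
    """G_f = eta1 * S_f + eta2 * S_FB,f, as in equation (5)."""
    if eta1 <= 0 or eta2 <= 0:
        raise ValueError("eta1 and eta2 must be positive constants")
    return eta1 * s_f + eta2 * s_fb


def rank_positions(saliency, feedback, eta1, eta2):
    """Candidate gaze positions sorted by descending evaluation value."""
    g = [evaluation_value(s, fb, eta1, eta2) for s, fb in zip(saliency, feedback)]
    return sorted(range(len(g)), key=lambda i: g[i], reverse=True)


saliency = [0.9, 0.2, 0.5]   # S_f at three candidate positions
feedback = [0.1, 0.3, 0.8]   # S_FB,f at the same positions

# Detailed identification: small eta2/eta1 ratio (illustrative values).
print(rank_positions(saliency, feedback, eta1=1.0, eta2=0.2))
# Category detection: large eta2/eta1 ratio (illustrative values).
print(rank_positions(saliency, feedback, eta1=0.2, eta2=1.0))
```

With a small ratio the visiting order follows the low-order saliency; with a large ratio it follows the feedback from the upper layer.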
[0112]
FIG. 22 shows a configuration example, centered on the gaze control neuron 1801 and the input distribution control circuit 2202 for the gaze control neuron, for controlling η₁ and η₂ associated with the synapses of the gaze control neuron.
[0113]
Here, the input distribution control circuit 2202 is connected both to the synapse 2201 involved in modulating the detection level of the low-order feature saliency map signal input from the lower layer and to the synapse 2201 involved in modulating the detection level of the feedback signal from the upper layer. In response to an external control signal (for example, a signal from a higher layer not shown in FIG. 1 that relates to the task and indicates whether the mode is target detection or detailed identification), it controls the amount of phase delay at each synapse corresponding to the values of η₁ and η₂.
[0114]
The size of the neighborhood region is of the same order as the gaze area size on the input data (for example, not more than 10 times that size). The search is limited to the neighborhood region in order to keep the processing speed high. If the processing speed is not significantly degraded, the entire screen may be used as the search range.
[0115]
Next, the gaze position update control circuit 1901, which performs sequential update control of the gaze position, will be described. As shown in FIG. 19, this circuit includes a primary storage unit 1902 that stores the feedback amount to the gaze control neuron 1801 corresponding to the latest gaze position, an input unit 1903 for the feedback amounts in the neighborhood region, and an update position determination unit 1904 that detects, within the neighborhood region, the feedback amount next in rank to that of the latest gaze position (the next candidate) and detects and activates the gaze control neuron that receives that feedback.
[0116]
The update position determination unit 1904 includes a comparison unit 1907 that receives inputs from the primary storage unit 1902 and the neighborhood feedback input unit 1903 and compares the feedback amount at the latest gaze control neuron with the feedback amounts in the neighborhood region, a secondary feedback signal amplification unit 1906 (described later), and a next-candidate feedback amount detection unit 1905. When the next-candidate feedback amount detection unit 1905 detects a feedback amount comparable to the latest feedback amount, the secondary feedback signal amplification unit 1906 outputs a pulse to the corresponding gaze control neuron.
[0117]
Here, “comparable to the latest feedback amount” means, specifically, that when the feedback amounts in the neighborhood region are input in random order and compared with the latest feedback amount as described above, either the difference is within a certain reference (threshold) range or the compared feedback amount is larger than the latest one (which can occur when the search range has been updated). Detecting this condition effectively finds the next update position.
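The acceptance test applied by the update position determination can be sketched as follows. This is an illustrative software analogue only: the function name `find_next_candidate`, the dictionary of neighborhood feedback amounts, and the use of Python's `random` module to model the random input order are assumptions introduced here.

```python
import random


def find_next_candidate(latest_fb, neighborhood_fb, threshold, rng=None):
    """Return the first neighbor whose feedback is comparable to the latest one.

    latest_fb:       feedback amount stored for the latest gaze position
    neighborhood_fb: dict mapping neighbor position -> feedback amount
    threshold:       allowed shortfall relative to latest_fb (assumed parameter)
    Neighbors are examined in random order, as in the description.
    """
    rng = rng or random.Random()
    positions = list(neighborhood_fb)
    rng.shuffle(positions)
    for pos in positions:
        fb = neighborhood_fb[pos]
        # Accept if larger than the latest amount, or within the threshold of it.
        if fb > latest_fb or (latest_fb - fb) <= threshold:
            return pos
    return None  # no neighbor qualifies


neighbors = {"a": 0.20, "b": 0.75, "c": 0.10}
print(find_next_candidate(0.80, neighbors, threshold=0.10))
```

In this example only position "b" lies within the threshold of the latest feedback amount, so it is selected regardless of the random visiting order.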
[0118]
Next, signal transmission between the gaze control neuron 1801 and the gaze position update control circuit 1901 in the configurations schematically shown in FIGS. 20 and 21 will be described. As described below, a specific gaze control neuron 1801 and the gaze position update control circuit 1901 are temporarily connected to each other. The gaze control neuron corresponding to the update position selected by the method described above receives a secondary feedback input from the update position determination unit 1904 (a feedback signal with a signal level proportional to the feedback amount from the upper layer to that gaze control neuron) and is activated. As a result, the gaze area specific to that neuron is opened, and only the signal corresponding to the data in the corresponding region of the input data is transmitted to the subsequent layer.
[0119]
The temporary mutual connection mentioned above means that the gaze control neuron that is active at a given time (that is, the one for which signal transmission to the subsequent layer is being controlled) is temporarily connected, through a local common bus, to one other specific gaze control neuron in its neighborhood region so that signals can propagate between them. Specifically, among the other gaze control neurons in the neighborhood region, only one neuron (at a random position) spontaneously fires within a certain time range (for example, on the order of tens of milliseconds). When its spike pulse reaches the active gaze control neuron through the common bus, a communication channel with the already active gaze position update control circuit 1901 is established for that time range only.
[0120]
For this purpose, access to the common bus is granted, for example, only when a neuron in the neighborhood region other than the latest gaze control neuron is in the fired state. Specifically, either a variable resistor array and its control circuit (not shown) are used, so that propagation of the pulse signal from the firing neuron along the wiring connected to the common bus temporarily lowers the resistance of that wiring, or the gate circuit 2001 (a switch circuit) and the switch signal control circuit 2002 schematically shown in FIGS. 20 and 21 are placed between each gaze control neuron and the common bus, and, triggered by the firing pulse signal, the switch signal control circuit 2002 sends a switch ON signal to one of the gate circuits (c in 2001 in FIGS. 20 and 21) to open the channel temporarily (turn the connection to the common bus ON).
[0121]
After the next candidate gaze position is selected, the secondary feedback signal, obtained by amplifying the feedback signal from the upper layer to the gaze control neuron by a predetermined amount in the amplifying means 84 (in the case of phase modulation, for example, by advancing the pulse phase), is returned to the gaze control neuron.
[0122]
Here, the number of gaze control neurons (and thus gaze areas) active at the same time is limited to one per feature category. For this reason, when the active state moves from the gaze control neuron active at a given time to a different gaze control neuron, the activity of the previously active neuron is suppressed for a certain period. This is realized by inhibitory connections formed between (nearest-neighbor) gaze control neurons; the dotted lines from the gaze control neuron 1801 in FIGS. 20 and 21 represent these inhibitory connections.
[0123]
In the above, the range of the feedback amount used for gaze control has a preset lower limit, and after control has moved to the gaze position corresponding to the lower limit, the first gaze position is set again.
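The resulting visiting order, including the return to the first gaze position once the lower limit is reached, can be sketched as follows. This is a deliberately simplified sketch: it ignores the neighborhood restriction and the random comparison order, and the function name `gaze_sequence` and its parameters are assumptions introduced here.

```python
def gaze_sequence(feedback, lower_limit, steps):
    """Positions visited over `steps` updates, one active position at a time.

    Positions are visited in descending order of feedback amount; positions
    below `lower_limit` are never visited, and after the last admissible
    position the sequence restarts from the first gaze position.
    """
    admissible = [i for i, fb in enumerate(feedback) if fb >= lower_limit]
    order = sorted(admissible, key=lambda i: feedback[i], reverse=True)
    if not order:
        return []
    return [order[k % len(order)] for k in range(steps)]


print(gaze_sequence([0.2, 0.9, 0.05, 0.6], lower_limit=0.1, steps=5))
```

Because each step moves to a different position whenever more than one position is admissible, the previously active position is never re-selected immediately, which loosely mirrors the temporary suppression of the previously active neuron.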
[0124]
The above-described mechanism in which the gaze area is directly updated and controlled by the gaze control layer using the feedback signal from the upper layer provides the following unprecedented effects.
[0125]
Compared with gaze area control using only the saliency map of low-order features, the search can be performed more efficiently (and hence faster), and meaningless visual search, such as the gaze area continuing to oscillate between two feature patterns, can be avoided.
[0126]
Within the object being gazed at, only a particularly characteristic part or pattern (for example, the eyes in the case of a face) can be searched selectively and efficiently. In this case, feedback from an intermediate layer involved in detecting the characteristic part of the object may be used instead of feedback from the uppermost layer.
[0127]
By searching for the next update position in the region near the latest gaze position, gaze position update control can be performed at high speed.
[0128]
Even when there are multiple targets belonging to the same category, setting the size of the neighborhood region used as the search range appropriately prevents the gaze point from becoming fixed on a specific target and allows the gaze position to be controlled so that it moves stably from one target to the next in sequence.
[0129]
Processing in the feature integration layer
The neurons in the feature integration layers ((2,0), (2,1), ...) will now be described. As shown in FIG. 1, the connection from a feature detection layer (for example, (1,0)) to a feature integration layer (for example, (2,0)) is configured so that each feature integration neuron receives excitatory inputs from neurons for the same feature element (type) in the preceding feature detection layer that lie within its receptive field. As described above, the neurons in the integration layer are of two kinds: sub-sampling neurons, which perform sub-sampling by local averaging for each feature category (calculation of the average, a representative value, the maximum, or the like of the inputs from the feature detection neurons forming the receptive field), and collective coding neurons, which combine outputs for features of the same category across different scales (processing channels).
[0130]
The former receive a plurality of pulses for the same kind of feature and integrate them over a local region (receptive field) by averaging (or by calculating a representative value, such as the maximum within the receptive field), so that the feature can be detected reliably despite positional fluctuation and deformation. For this reason, the receptive field structure of the feature integration layer neurons may be made uniform regardless of feature category (for example, each may be a rectangular region of a predetermined size within which the sensitivity or weighting factor is uniformly distributed).
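The tolerance to positional fluctuation provided by sub-sampling can be illustrated with a short sketch. It assumes non-overlapping square receptive fields with uniform weights and represents the feature detection layer output as a nested list; the function name `subsample` and these layout choices are assumptions made here for illustration.

```python
def subsample(feature_map, field, mode="mean"):
    """Sub-sample a 2-D map over non-overlapping field x field receptive fields.

    mode="mean" computes the local average; mode="max" computes the local maximum.
    """
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for y in range(0, rows - field + 1, field):
        out_row = []
        for x in range(0, cols - field + 1, field):
            block = [feature_map[y + dy][x + dx]
                     for dy in range(field) for dx in range(field)]
            out_row.append(max(block) if mode == "max" else sum(block) / len(block))
        out.append(out_row)
    return out


def single_response(row, col, size=4):
    """A map that is zero except for one detected feature at (row, col)."""
    return [[1.0 if (y, x) == (row, col) else 0.0 for x in range(size)]
            for y in range(size)]


# The same feature detected at two slightly different positions.
print(subsample(single_response(0, 0), field=2, mode="max"))
print(subsample(single_response(1, 1), field=2, mode="max"))
```

Both inputs produce the same sub-sampled output, because the shift stays within one receptive field.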
[0131]
Collective coding for scale levels
The latter mechanism, collective coding (population coding), will now be explained in detail. A collective coding neuron integrates the outputs of a plurality of sub-sampling neurons that are at the same hierarchical level (with a similar complexity of graphic features) and within the same feature integration layer but belong to different processing channels, by taking a normalized linear combination of those outputs. For example, in the feature integration layer (2,0), which receives the output of the feature detection layer (1,0) performing the Gabor wavelet transform, the outputs corresponding to the set of Gabor filters {g_mn} (n fixed, m = 1, 2, ...) that belong to different processing channels and have the same direction selectivity are integrated by linear combination or the like. Specifically, let p_ij(t) be the output of the sub-sampling neuron with direction component selectivity i and scale selectivity j, and let q_ij(t) be the collective code with the same selectivity. Equation (6), which represents the linear combination of the normalized outputs of the sub-sampling neurons, and equation (7), which represents the normalization method, are then as follows. For convenience of explanation, equations (6) and (7) represent the output state transitions of the sub-sampling neurons and the collective coding neurons as discrete-time transitions.
[0132]
[Equations (6) and (7): equation image not reproduced in this text]

[0133]
Here, w_{ij,ab} is a coupling coefficient representing the contribution of a neuron (or neuron population) with a different selectivity (sensitivity characteristic) for a feature category, namely the sub-sampling neuron whose direction component selectivity index is a and whose scale level selectivity index is b, to the collective coding neuron whose direction component selectivity index is i and whose scale level selectivity index is j.
[0134]
w_{ij,ab} represents a filter function (selectivity) centered on the direction component index i and the scale level index j, and typically has the form of a function of |i-a| and |j-b| (w_{ij,ab} = f(|i-a|, |j-b|)). As will be described later, the collective coding by linear combination through w_{ij,ab} is intended to make q_ij give an existence probability with respect to the feature category (direction component) and the scale level, taking into account the detection levels of neurons with other selectivities.
[0135]
C is a normalization constant, and λ and β are constants (typically β is 1 to 2). C is a constant that prevents p_ij from diverging even when the sum of the collective codes for a feature category is almost zero. In the initial state when the system starts, q_ij(0) = p_ij(0). Corresponding to FIG. 12, in equations (6) and (7) the summation is taken only over the scale level selectivity index. As a result, each collective coding neuron outputs the existence probability (or a quantity proportional to it) of the feature at each scale level (processing channel) within the same feature category.
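Because equations (6) and (7) are not reproduced in this text, the following sketch uses an assumed form that is merely consistent with the surrounding description: the normalization is taken as p_j = q_j^β / (C + λ Σ_b q_b^β), the linear combination as q_j = Σ_b w_{jb} p_b over the scale index only, and the coupling w_{jb} = f(|j-b|) as a Gaussian. The exact equations of the embodiment may differ, and all function names and parameter values here are hypothetical.

```python
import math


def normalize(q, C=0.1, lam=1.0, beta=2.0):
    """Assumed normalization: p_j = q_j**beta / (C + lam * sum_b q_b**beta)."""
    denom = C + lam * sum(v ** beta for v in q)   # C > 0 keeps this non-zero
    return [v ** beta / denom for v in q]


def combine(p, sigma=1.0):
    """Assumed linear combination with Gaussian weights w_jb = f(|j - b|)."""
    n = len(p)
    return [
        sum(math.exp(-((j - b) ** 2) / (2.0 * sigma ** 2)) * p[b] for b in range(n))
        for j in range(n)
    ]


def collective_code(p0, steps=3, C=0.1, lam=1.0, beta=2.0, sigma=1.0):
    """Iterate normalization and combination, starting from q(0) = p(0)."""
    q = list(p0)
    for _ in range(steps):
        p = normalize(q, C, lam, beta)
        q = combine(p, sigma)
    return q


# Sub-sampling outputs of one feature category at three scale levels.
q = collective_code([0.1, 0.8, 0.3])
print(q.index(max(q)))
# An all-zero input stays finite because of the constant C.
print(collective_code([0.0, 0.0, 0.0]))
```

In this toy run the code peaks at the scale level with the strongest input while the neighboring scale levels receive graded, non-zero values.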
[0136]
On the other hand, as in the case of FIG. 13, a system can in general be built that, by additionally summing over the direction component selectivity index, also performs collective coding for intermediate levels between the predetermined number of direction components. In this case, if the parameters (β in equations (8) and (9), and w_{ij,lk}) are set appropriately, each collective coding neuron in the configuration shown in FIG. 13 can output the existence probability (or a quantity proportional to it) of the feature for each scale level and each feature category.
[0137]
As shown in equation (6), the collective code q_ij(t) is obtained as a normalized linear combination of the outputs of neurons with different sensitivity characteristics. If q_ij(t) at steady state is suitably normalized (for example, by the sum of the q_ij values) so that its value lies between 0 and 1, q_ij gives the probability that the direction component corresponds to i and the scale level to j. Therefore, to determine explicitly the scale level corresponding to the size of the target in the input data, it suffices to obtain a curve fitted to q_ij, estimate its maximum, and take the corresponding scale level. The scale level obtained in this way generally lies between the preset scale levels.
[0138]
FIG. 23 shows an example of collective coding of scale levels, where the horizontal axis represents the scale level and the vertical axis represents the cell output. The output corresponds to the pulse phase: a neuron with peak sensitivity at a specific scale has a lower output level, that is, a phase delay, for features whose size deviates from that scale than for features of the size corresponding to that scale. The figure shows the sensitivity curve (so-called tuning curve) for the scale selectivity of each feature detection cell, each cell output, and the integrated collective code output obtained by integrating them (the moment of the cell outputs with respect to scale level, that is, a linear sum). The position of the integrated collective code output on the horizontal axis reflects the estimated scale (size) of the recognition target.
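The moment-based estimate described for FIG. 23 can be sketched as an output-weighted mean of the preset scale levels. The function name `estimate_scale` and the normalization by the total output are assumptions made for this illustration.

```python
def estimate_scale(scale_levels, outputs):
    """Estimate the target scale as the output-weighted mean of the scale levels.

    scale_levels: preset scale level of each cell (horizontal axis of FIG. 23)
    outputs:      output level of each cell (vertical axis of FIG. 23)
    Returns None when no cell responds.
    """
    total = sum(outputs)
    if total <= 0:
        return None
    return sum(s * o for s, o in zip(scale_levels, outputs)) / total


# Two neighboring cells respond equally, so the estimate falls between them.
print(estimate_scale([1, 2, 3], [0.0, 1.0, 1.0]))
```

The estimate generally falls between the preset scale levels, in line with the remark that the scale level obtained from the collective code is an intermediate value.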
[0139]
In the present embodiment, the scale level is not actually obtained explicitly; the output from the feature integration layer to the feature detection layer is q_ij (which may be the normalized q_ij). That is, in both FIGS. 12 and 13, the output from the feature integration layer to the feature detection layer is the output of the collective coding neurons, not that of the sub-sampling neurons. As a result, like the normalized q_ij described above, the detection probability of a specific object is finally represented collectively across multiple scale levels (resolutions).
[0140]
In the circuit configuration of the feature integration layer shown in FIG. 12, the sub-sampling neuron circuit 1201 first receives in its local receptive field, from among the outputs of the preceding feature detection layer neurons, those with the same feature category and the same size selectivity, and performs local averaging. Each sub-sampling neuron output is sent to the combination processing circuit 1203. At this time, as will be described later, the pulse signal from each neuron is delayed by a synapse circuit (not shown) by a predetermined phase amount (for example, when β in equation (7) is 2, an amount proportional to the square of the output level of the feature detection neuron) and is propagated through a local common bus. However, the wiring between neurons may be physically independent without using a common bus.
[0141]
The combination processing circuit 1203 performs processing corresponding to equations (6) and (7), and performs collective encoding of information having the same feature category but different size selectivity (across multiple processing channels).
[0142]
In FIG. 12, collective coding is performed over sub-sampling neuron outputs that have the same feature category (direction component selectivity). In the circuit configuration that also sums over the direction component selectivity index (the case of FIG. 13 discussed above), the combination processing circuit performs the processing shown in equations (8) and (9).
[0143]
[Equations (8) and (9): equation image not reproduced in this text]

[0144]
On the other hand, a channel activity control circuit can be provided that obtains P_a for each channel in the (2,0) layer and amplifies or attenuates the signal (advances or delays the pulse phase) according to the output from each collective coding neuron. FIG. 15 shows a schematic diagram of a configuration in which such a channel activity control circuit 1502 is provided. This channel activity control circuit is placed between the collective coding neurons 1202 of FIGS. 12 and 13 and the feature detection layer that forms the next layer.
[0145]
In the final layer, the existence probability of the recognition target as a high-order feature is expressed as a neuron activity level (that is, firing frequency, firing spike phase, etc.) over a plurality of channels. In the Where processing path (or when the location information of the detection / recognition target is also detected in the last layer), the existence probability of the target according to the position (location) in the input data in the final layer (the presence or absence of the target if threshold processing is performed) ) Is detected as the activity level of each neuron. The collective coding may be obtained by linear combination without normalization, but may be easily affected by noise, and thus normalization is desirable.
[0146]
The normalization shown in equation (7) or (9) can be realized at the neural network level by so-called shunting inhibition, and the linear combination shown in equation (6) or (8) by lateral connections.
[0147]
An example of a normalization circuit for the case where β is 2 is shown in the drawings. This normalization circuit comprises a sum-of-squares calculation circuit 1403 that takes the sum of the squares of the outputs of the feature detection cells n_ij belonging to different processing channels, a shunting inhibition circuit 1404 that mainly performs the normalization of equation (7), and a linear sum circuit 1405 that obtains and outputs the linear sum of equation (6).
[0148]
The sum-of-squares calculation circuit 1403 contains an interneuron element 1406 that pools the squared values of the feature detection cells, and each synaptic connection element 1402 connecting to the interneuron 1406 applies a pulse phase delay (or pulse width modulation or pulse frequency modulation) corresponding to the square of the output of the feature detection cell 1401.
[0149]
The shunting inhibition circuit 1404 comprises, for example, a variable resistance element whose value is proportional to the reciprocal of the output of the interneuron 1406 multiplied by a predetermined coefficient (λ/C), a capacitor, and a pulse phase modulation circuit (or a pulse width modulation circuit or pulse frequency modulation circuit) that gives the square of the output of the feature detection cell 1401.
[0150]
Next, a modification of the channel processing will be described. In the configuration described above, collective coding is performed for each processing channel and the output of every processing channel is transmitted to the subsequent layer (that is, the configuration of FIG. 12 or 13 is cascaded through the subsequent layers). Instead, in order to increase processing efficiency and reduce power consumption, the output of the collective coding neurons may be propagated only to those feature detection cells (in the next layer) that belong to the same channel as the processing channel giving the maximum response level in the feature integration layer (2,0).
[0151]
In this case, in addition to the configuration shown in FIGS. 12 and 13, a maximum input detection circuit, a so-called Winner-Take-All circuit (hereinafter, WTA circuit), is placed between the output of the feature integration layer (2,0) and the next feature detection layer (1,1). It serves as a processing channel selection circuit that receives the outputs of the collective coding neuron circuits and selects the channel giving the maximum response level. This processing channel selection circuit may be provided for each position of the feature integration layer, or a single circuit may be provided for the layer that calculates the maximum response level per processing channel over the entire input data, regardless of location.
[0152]
As the WTA circuit, known configurations such as those described in JP-A-08-321747, USP5059814, and USP5146106 can be used. FIG. 16A schematically shows a configuration in which the WTA circuit in the feature integration layer propagates to the next feature detection layer only the output of the processing channel showing the maximum response in the feature integration layer. This configuration is obtained by replacing the channel activity control circuit 1502 with the gating circuit 1602.
[0153]
As shown in FIG. 16B, the gating circuit 1602 comprises a stage that receives the average output level of each processing channel (the WTA circuit described above) and a channel selection circuit 1604 that propagates the output of each neuron of the processing channel showing the maximum average output level to the same channel of the next layer.
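The channel selection performed by the gating circuit can be sketched in software as follows. This is a minimal illustration and not the circuit of FIG. 16: the function name `select_channel`, the dictionary layout, and the choice to zero the outputs of non-selected channels are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def select_channel(
    channel_outputs: Dict[int, List[float]]
) -> Tuple[int, Dict[int, List[float]]]:
    """Winner-take-all gating over processing channels (illustrative only)."""
    # Average output level of each processing channel (each list must be non-empty).
    averages = {ch: sum(v) / len(v) for ch, v in channel_outputs.items()}
    # WTA stage: the channel with the maximum average output level wins.
    winner = max(averages, key=averages.get)
    # Channel selection stage: only the winner's neuron outputs are propagated;
    # zeroing the other channels is an assumption of this sketch.
    gated = {
        ch: (list(v) if ch == winner else [0.0] * len(v))
        for ch, v in channel_outputs.items()
    }
    return winner, gated


# Hypothetical neuron outputs for three processing channels (scale levels).
outputs = {0: [0.1, 0.2, 0.1], 1: [0.6, 0.9, 0.3], 2: [0.4, 0.4, 0.4]}
winner, gated = select_channel(outputs)
print(winner, gated)
```

Applying the same function once per position would correspond to the per-position variant, and applying it once to outputs pooled over the whole layer would correspond to the single-circuit variant.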
[0154]
In the subsequent feature integration layers (2, k) (k is 1 or more), such a processing channel selection circuit is not necessarily required. For example, the output of a feature integration layer after the detection of higher-order features may be fed back through the processing channel selection circuit so that the processing channel is selected in the integration layers for low-order or middle-order features. This concludes the description of the modification of the channel processing. Note that the flow of sub-sampling, combination processing, and collective coding shown in FIGS. 12 and 13 need not be implemented within the feature integration layer; for example, separate layers may of course be provided for the combination processing and the collective coding.
[0155]
Even when objects of almost the same size are located close together or partially overlap, the detection mechanism, which integrates multiple types of partial features by means of the local receptive field structure, the sub-sampling structure, and so on, of course maintains its recognition and detection performance for the objects.
[0156]
Pulse signal processing in the feature integration layer
The combination circuit described above outputs to each collective coding neuron a pulse corresponding to the collective coding level obtained by expression (5) or (7). As described above, each feature integrated cell (n₁, n₂, n₃) of layer number (2, k) receives a pulse input from the pacemaker neuron of layer number (1, k + 1). When the output of the above combination circuit, driven by input from the feature detection layer or sensor input layer (layer number (1, k + 1)), is at a sufficient level (for example, when the average number of input pulses in a certain time range or time window exceeds a threshold, or the pulse phase is advanced), the cell produces its output with the falling edge of the pulse from the pacemaker as the reference time.
[0157]
The subsampling neurons described above are not controlled by any pacemaker neuron. They perform sub-sampling based on the average output level (within a time window whose phase is independent for each subsampling neuron) from the feature detection cells of the preceding (1, k) layer. The pulse output timing from the subsampling neurons to the combination processing neurons is likewise controlled without pacemaker neurons, as is the pulse output from the combination processing circuit to the collective coding neurons.
[0158]
Thus, in this embodiment, the feature integrated cells (subsampling neurons, collective coding neurons, etc.) are not configured to receive timing control from the pacemaker neurons on the feature detection layer of the preceding layer number (1, k). This is because a feature integrated cell outputs a pulse at a phase (alternatively a frequency, pulse width, or amplitude; phase in this embodiment) determined by the input level within a certain time range (such as the temporal sum of the input pulses) rather than by the arrival time pattern of the input pulses, so the timing at which the time window is generated is not very important. This is not intended to exclude a configuration in which the feature integrated cells receive timing control from the pacemaker neurons of the preceding feature detection layer; such a configuration is of course also possible.
[0159]
Processing in the feature position detection layer
As shown in FIG. 1, the feature position detection layer forms a Where route separated from the What route. It is connected to the feature detection layer (1, k) at the same hierarchical level and provides feedback connections to the gaze area setting control layer 108. The arrangement and function of the neurons in the feature position detection layer 107 are the same as those of the sub-sampling neurons 1201 of the feature integration layer 103, with two exceptions: the collective coding of the feature integration layer 103 is not performed, and the receptive field size does not change greatly between upper and lower layers, so that information on the arrangement of features is not lost. Neurons are arranged for each feature category and perform sub-sampling through their connections to the feature detection layer neurons. As a result, the uppermost feature position detection layer yields a distribution of neuron firing states that represents the rough spatial distribution (arrangement) of the recognition or detection target.
[0160]
Pulse signal processing in the feature position detection layer
This is the same as for the sub-sampling neurons 1201 of the feature integration layer 103. That is, the neurons in the (3, k) layer are not controlled by any pacemaker neuron, and perform sub-sampling based on the average output level (within a time window whose phase is independent for each subsampling neuron) from the feature detection cells in the preceding (1, k) layer.
[0161]
Principle of pattern detection
Next, a pulse encoding and detection method for a two-dimensional figure pattern will be described. FIG. 3 schematically shows how pulse signals propagate from the feature integration layer 103 to the feature detection layer 102 (for example, from layer (2,0) to layer (1,1) in FIG. 1).
[0162]
The neurons n_i (n₁ to n₄) on the feature integration layer 103 side correspond to mutually different feature quantities (or feature elements), and the neuron n'_j on the feature detection layer 102 side is involved in detecting a higher-order feature (graphic element) obtained by combining the features in the same receptive field.
[0163]
Each connection between neurons has its own time delay, arising from the pulse propagation time and from the delay at the synaptic connection (S_ij) from neuron n_i to neuron n'_j. As a result, as long as every neuron of the feature integration layer 103 outputs a pulse, the pulse train P_i arriving at neuron n'_j via the common bus line 301 has a predetermined order (and spacing) set by the delay amounts at the synaptic connections determined by learning (FIG. 3A shows the pulses arriving in the order P₄, P₃, P₂, P₁).
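How learned delays fix the arrival order on the common bus can be illustrated with a short sketch. The delay values, the shared firing time of 0, and the function name `arrival_order` are assumptions chosen so that the result reproduces the order described for FIG. 3A; they are not values from the embodiment.

```python
from typing import Dict, List, Tuple


def arrival_order(
    fire_times: Dict[str, float], delays: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (pulse label, arrival time) pairs sorted by arrival time."""
    # Arrival time = firing time of the source neuron + learned synaptic delay
    # (propagation time on the bus is folded into the delay in this sketch).
    arrivals = {name: fire_times[name] + delays[name] for name in fire_times}
    return sorted(arrivals.items(), key=lambda item: item[1])


# Hypothetical values: all four neurons fire together, delays differ per synapse.
fire_times = {"P1": 0.0, "P2": 0.0, "P3": 0.0, "P4": 0.0}
delays = {"P1": 4.0, "P2": 3.0, "P3": 2.0, "P4": 1.0}
order = [name for name, _ in arrival_order(fire_times, delays)]
print(order)
```

Because the order depends only on the delays when the sources fire together, the same feature combination always produces the same temporal pattern at the receiving neuron.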
[0164]
FIG. 3B shows the timing of pulse propagation from the feature integrated cells n₁, n₂, n₃ (each representing a different type of feature) on layer number (2, k) to a feature detection cell (n'_j) on layer number (1, k + 1), which performs higher-order feature detection, when time window synchronization is controlled using a timing signal from a pacemaker neuron.
[0165]
FIG. 6 illustrates the network configuration when a feature detection layer neuron receives input from a pacemaker neuron. In FIG. 6, the pacemaker neuron 603 (n_p) accompanies the feature detection neurons 602 (n_j, n_k, etc.) that form the same receptive field and detect different types of features, forms the same receptive field as they do, and receives excitatory connections from the neurons 601 on the feature integration layer (or input layer). It outputs pulses to the feature detection neurons 602 and the feature integration neurons at a predetermined timing (or frequency) determined by the total value of its inputs (or, for control that depends on a state representing an activity characteristic of the receptive field as a whole, by a quantity such as the average activity level of the entire receptive field).
[0166]
Each feature detection neuron 602 is configured so that the time windows are phase-locked with one another using that input as a trigger signal. Before the pacemaker neuron input arrives, as described above, no phase lock is applied and each neuron outputs pulses with random phase. Likewise, the feature detection neuron 602 does not perform the time window integration described later before the input from the pacemaker neuron 603; it performs the integration using the pulse input from the pacemaker neuron 603 as a trigger.
[0167]
Here, the time window is defined for each feature detection cell (n'_i). It is common to the neurons in the feature integration layer that form the same receptive field for that cell and to the pacemaker neuron 603, and it gives the time range of the time window integration.
[0168]
The pacemaker neuron 603 in layer number (1, k) (k is a natural number) outputs its pulses to each feature integrated cell of layer number (2, k-1) and to the feature detection cells of the layer to which the pacemaker neuron 603 belongs (layer number (1, k)), thereby providing the timing signal for generating the time window in which the feature detection cells sum their inputs over time. The start time of this time window serves as the reference time for the arrival times of the pulses output from the feature integrated cells. That is, the pacemaker neuron 603 provides the reference pulse both for the pulse output times of the feature integrated cells and for the time window integration in the feature detection cells.
[0169]
Each pulse is given a predetermined phase delay when it passes through the synapse circuit, and then reaches the feature detection cell through a signal transmission line such as a common bus. The arrangement of the pulses on the time axis at this point is represented by the pulses (P₁, P₂, P₃) drawn with dotted lines on the time axis of the feature detection cell.
[0170]
When the time window integration of the pulses (P₁, P₂, P₃) in the feature detection cell (normally a single integration, although charge may be accumulated over multiple time window integrations, or multiple time window integrations may be averaged) yields a value larger than the threshold, a pulse output (P_d) is produced. Note that the learning time window shown in FIG. 3B is referred to when the learning rule described later is executed.
[0171]
Synapse circuit, etc.
FIG. 4 shows the configuration of the synapse circuit S_i. FIG. 4A shows that, in the synapse circuit 202 (S_i), the small circuits 401 that give the synaptic coupling strength (phase delay) from neuron n_i to each connected neuron n'_j are arranged in a matrix. In this way, the wiring from the synapse circuit to the connected neurons can be carried on a single line (local common bus 301) for each receptive field (the wiring between neurons becomes virtual), which reduces or removes the wiring problem of conventional designs.
[0172]
In addition, when a destination neuron receives multiple pulses from the same receptive field, it can identify on the time axis which neuron each pulse came from by its arrival time relative to the time window (a phase delay that corresponds to the feature detected by the feature detection cell and is specific to the low-order features that make up that feature).
[0173]
As shown in FIG. 4B, each synapse coupling subcircuit 401 comprises a learning circuit 402 and a phase delay circuit 403. The learning circuit 402 adjusts the delay amount by changing the characteristics of the phase delay circuit 403, and stores the characteristic value (or its control value) on a floating gate element or on a capacitor coupled to the floating gate element.
[0174]
FIG. 5 shows the detailed configuration of the synapse coupling subcircuit. The phase delay circuit 403 is a pulse phase modulation circuit and can be built, for example as shown in FIG. 5A, from monostable multivibrators 506 and 507, resistors 501 and 504, capacitors 503 and 505, and a transistor 502. FIG. 5B shows the timing of the square wave P1 input to the monostable multivibrator 506 ([1] in FIG. 5B), the square wave P2 output from the monostable multivibrator 506 ([2]), and the square wave P3 output from the monostable multivibrator 507 ([3]).
[0175]
A detailed description of the operating mechanism of the phase delay circuit 403 is omitted, but the pulse width of P1 is determined by the time it takes the voltage of the capacitor 503 to reach a predetermined threshold under the charging current, and the width of P2 is determined by the time constant of the resistor 504 and the capacitor 505. If the pulse width of P2 widens (as in the dotted square wave in FIG. 5B) and its falling edge shifts later, the rising edge of P3 shifts by the same amount while the pulse width of P3 does not change. As a result, only the phase of the input pulse is modulated at the output.
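The timing relationship between P2 and P3 can be checked with a small behavioral model. This is a sketch under stated assumptions, not a circuit simulation: the `Pulse` class and `phase_delay` function are hypothetical names, P2 is assumed to start at the rising edge of P1, and the widths are arbitrary numbers rather than values derived from the resistor and capacitor.

```python
from dataclasses import dataclass


@dataclass
class Pulse:
    rise: float   # time of the rising edge
    width: float  # pulse width

    @property
    def fall(self) -> float:
        # Time of the falling edge.
        return self.rise + self.width


def phase_delay(p1: Pulse, p2_width: float, p3_width: float) -> Pulse:
    """Model of two cascaded one-shot stages producing a delayed output pulse."""
    # Assumption: the first stage is triggered at the rising edge of P1.
    p2 = Pulse(rise=p1.rise, width=p2_width)
    # The second stage fires at the falling edge of P2 with a fixed width.
    return Pulse(rise=p2.fall, width=p3_width)


p1 = Pulse(rise=1.0, width=0.5)
short = phase_delay(p1, p2_width=2.0, p3_width=0.5)
long = phase_delay(p1, p2_width=3.0, p3_width=0.5)
print(long.rise - short.rise, short.width, long.width)
```

Widening P2 by one time unit moves the rising edge of P3 by the same amount while its width stays fixed, which is the phase-only modulation the paragraph describes.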
[0176]
The pulse phase (delay amount) can be controlled by changing the control voltage Ec by means of the refresh circuit 509 for the reference voltage and the learning circuit 402, which controls the amount of charge accumulated in the capacitor 508 that gives the coupling weight. To hold this coupling weight for a long period, it may be stored after the learning operation as charge on a floating gate element (not shown) added outside the circuit of FIG. 5A, or by writing it to a digital memory. Other known circuit configurations, such as those designed to reduce the circuit scale (see, for example, Japanese Patent Laid-Open Nos. 5-37317 and 10-327054), can also be used.
[0177]
When the network is configured with shared coupling weights (in particular, when a plurality of synaptic connections are represented by a single weight coefficient), the delay amount at each synapse (P_ij in formula (9) below) can be made uniform within the same receptive field, unlike the case of FIG. In particular, the connections from the feature detection layer to the feature integration layer can be configured in this way regardless of the detection target, because the feature integration layer performs sub-sampling, for example by local averaging, of the outputs of the preceding feature detection layer.
[0178]
In this case, the small circuits of FIG. 4A reduce to a single circuit S_{k,i} coupled by the local common bus line 401, as shown in FIG., which is a particularly economical circuit configuration. On the other hand, when the connections from the feature integration layer 103 (or the sensor input layer 101) to the feature detection layer 102 are configured in this way, what the feature detection neuron detects is the simultaneous (or nearly simultaneous) arrival of pulses representing a plurality of different feature elements.
[0179]
When the connections have symmetry, connections that give the same weight (phase delay) can be represented by the same synapse coupling subcircuit, so that a large number of synaptic connections can be implemented with a small number of circuits. In particular, in the detection of geometric features the distribution of coupling weights within the receptive field is often symmetric, so the number of synapse coupling circuits, and hence the circuit scale, can be greatly reduced.
[0180]
As an example of a learning circuit at a synapse that realizes simultaneous arrival of pulses or a predetermined phase modulation amount, a learning circuit having the circuit elements shown in FIG. may be used. That is, the learning circuit 402 can be composed of a pulse propagation time measurement circuit 510, a time window generation circuit 511, and a pulse phase modulation amount adjustment circuit 512 that adjusts the pulse phase modulation amount at the synapse so that the propagation time becomes a constant value. Here the propagation time is the difference between the pulse output time at the presynaptic side of a neuron in one layer and the arrival time of that pulse at the destination neuron in the next layer; in FIG. 3B it is the sum of the synaptic delay and the time required for propagation.
[0181]
The propagation time measurement circuit takes as input a clock pulse from the pacemaker neuron that forms the same local receptive field, as described later, and determines the propagation time from the output of a counter circuit that counts the clock pulses within a predetermined time width (time window: see FIG. 3B). The time window is set with reference to the firing time of the destination neuron, so that the extended Hebbian learning rule shown below is applied.
[0182]
Learning rules
Further, the learning circuit 402 may be configured so that the width of the time window becomes narrower as objects of the same category are presented more often. In that case, the more familiar the category of a pattern (that is, the larger the number of presentations and learning iterations), the closer the operation comes to a coincidence detection mode for multiple pulses. This shortens the time required for feature detection (instantaneous detection becomes possible), but makes the circuit less suited to detailed comparative analysis of the spatial arrangement of feature elements, discrimination between similar patterns, and the like.
[0183]
The learning process for the delay amount can be extended to the complex domain. For example, the complex connection weight C_ij between neuron n_i of the feature detection layer and neuron n_j of the feature integration layer is given by
$$C_{ij} = S_{ij}\exp(iP_{ij}) \tag{10}$$
where S_ij is the coupling strength and P_ij is the phase. The i preceding P_ij is the imaginary unit, and P_ij is the phase corresponding to the time delay of the pulse signal output from neuron j to neuron i at a predetermined frequency. S_ij reflects the receptive field structure of neuron i and generally has a different structure depending on the object to be recognized or detected. It is formed separately by learning (supervised learning or self-organization), or is given as a predetermined structure.
[0184]
On the other hand, the self-organizing learning rule for the delay amount is given by
[0185]
[Outside 6]

where
[0186]
[Outside 7]

is the time derivative of C, τ_ij is the time delay (a preset amount), and β (approximately 1) is a constant.
[0187]
Solving the above equation, C_ij converges to β exp(-2πiτ_ij), and therefore P_ij converges to -τ_ij. An example of applying the learning rule is described with reference to the learning time window shown in FIG. 3B. The coupling weight is updated according to equation (11) only when both the presynaptic neurons (n1, n2, n3) and the postsynaptic neuron (feature detection cell) of a synaptic connection fire within the time range of the learning time window. In FIG. 3B the feature detection cell fires after the time window has elapsed, but it may fire before the time window has elapsed.
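The stated convergence can be illustrated numerically. Equation (11) is not reproduced in this text, so the sketch below assumes a first-order relaxation dC/dt = β exp(-2πiτ) - C, chosen only because its fixed point is the stated limit β exp(-2πiτ_ij); it should not be read as the rule of the embodiment. The names `relax_weight`, `dt`, and `steps` are hypothetical, and expressing the phase in cycles (fractions of 2π) is an interpretation under which the phase tends to -τ.

```python
import cmath
import math


def relax_weight(c0: complex, tau: float, beta: float = 1.0,
                 dt: float = 0.01, steps: int = 2000) -> complex:
    """Euler integration of the ASSUMED rule dC/dt = beta*exp(-2*pi*i*tau) - C."""
    target = beta * cmath.exp(-2j * math.pi * tau)
    c = c0
    for _ in range(steps):
        # One explicit Euler step toward the fixed point `target`.
        c += dt * (target - c)
    return c


tau = 0.125  # hypothetical preset delay, in units of one pulse period
c = relax_weight(c0=0.2 + 0.0j, tau=tau)
phase_cycles = cmath.phase(c) / (2.0 * math.pi)  # phase as a fraction of a cycle
print(abs(c), phase_cycles)
```

With β = 1 the magnitude settles near 1 and the phase near -τ cycles, whatever the initial weight, which matches the qualitative statement that the learned phase is pulled to the preset delay.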
[0188]
Feature detection layer processing
In the following, processing (at the time of learning and recognition) performed mainly in the feature detection layer will be described.
[0189]
In each feature detection layer 102, as described above, pulse signals for a plurality of different features from the same receptive field are input within the processing channel set for each scale level, and the spatio-temporal weighted sum (load sum) is computed and thresholded. The pulses corresponding to the individual feature quantities arrive at predetermined time intervals, set by delay amounts (phases) determined in advance by learning.
[0190]
Learning control of this pulse arrival time pattern is not the main point of the present application and is not described in detail. As one example, competitive learning may be introduced such that, among the feature elements making up a given graphic pattern, those contributing most to the detection of that pattern arrive earlier, while feature elements whose pulse arrival times would otherwise be nearly equal arrive separated from each other by a certain amount of time. Alternatively, the design may be such that predetermined feature elements (feature elements of the recognition target that are considered particularly important, for example features with a large average curvature or features with high linearity) arrive at different time intervals.
[0191]
In the present embodiment, the neurons corresponding to the low-order feature elements in the same receptive field on the feature integration layer, which is the previous layer, each fire synchronously (pulse output) at a predetermined phase.
[0192]
In general, a neuron in the feature integration layer has connections to feature detection neurons that detect the same higher-order feature at different positions (in this case the receptive fields differ, but the connections make up the same higher-order feature). Synchronous firing then of course also occurs between these feature detection neurons. However, the output level (a phase reference is used here, but a frequency, amplitude, or pulse width reference may be used) is determined by the sum (or average, etc.) of the contributions from the multiple pacemaker neurons provided for the respective receptive fields of the feature detection neurons. In each neuron on the feature detection layer 102, the spatio-temporal weighted sum (load sum) of the input pulses is calculated only within a time window of a predetermined width for the pulse train that has arrived at the neuron. The mechanism for realizing weighted addition within the time window is of course not limited to the neuron element circuit shown in FIG.
[0193]
This time window corresponds to some extent to the time period outside the refractory period of a real neuron. That is, like a real neuron, the cell produces no output during the refractory period (the time range outside the time window) whatever input it receives, but fires according to the input level during the time window outside that range.
[0194]
The refractory period shown in FIG. 3B is the period from immediately after the firing of the feature detection cell to the start time of the next time window. The length of the refractory period and the width of the time window can of course be set arbitrarily, and the refractory period need not be shorter than the time window as shown in FIG. Even without pacemaker neurons, the start time of the time window can be synchronized between the neurons of the feature detection layer and the feature integration layer by weak mutual coupling between neurons and a predetermined coupling condition (E. M. Izhikevich, 1999, 'Weakly Pulse-Coupled Oscillators, FM Interactions, Synchronization, and Oscillatory Associative Memory', IEEE Trans. on Neural Networks, vol. 10, pp. 508-526). Such synchronous firing is known to arise in general from mutual coupling and entrainment between neurons.
[0195]
Therefore, even in this embodiment, such an effect can be brought about without a pacemaker neuron by configuring so as to satisfy the weak mutual coupling between neurons and a predetermined synaptic coupling condition.
[0196]
In the present embodiment, as schematically shown in FIG. 6 and as already described, the above start time may be made common by, for example, supplying each feature detection layer neuron with timing information (clock pulses) from a pacemaker neuron that receives input from the same receptive field (and outputs pulses at a fixed frequency).
[0197]
With such a configuration, time window synchronization need not be controlled over the entire network (even where synchronization is required). Even if the clock pulses fluctuate as described above, the outputs from the same receptive field are affected uniformly (the shift of the window function along the time axis is the same for all neurons forming the same receptive field), so the reliability of feature detection does not deteriorate. Because such local circuit control enables reliable synchronous operation, the tolerance for variation in circuit element parameters also increases.
[0198]
Hereinafter, for simplicity, a feature detection neuron that detects a triangle as a feature will be described. The feature integration layer 103 in the preceding stage is assumed to respond to graphical features (feature elements) such as L-shaped patterns with various orientations (f₁₁, f₁₂, ...), line segment combination patterns (f₂₁, f₂₂, ...), and combinations of parts of two sides of the triangle (f₃₁, ...), as shown in FIG.
[0199]
Also, f₄₁, f₄₂, f₄₃ in the figure are features that form triangles of different orientation and correspond to f₁₁, f₁₂, f₁₃. Because specific delay amounts have been set by learning between the neurons forming the interlayer connections, the pulses corresponding to the main, mutually different features making up the triangle are preset to arrive at the triangle feature detection neuron in the respective sub time windows (time slots) (w₁, w₂, ...) obtained by dividing the time window.
[0200]
For example, after the time window is divided into n sub time windows w₁, w₂, ..., wn, pulses corresponding to a combination of feature sets that form a triangle as a whole arrive first, as shown in FIG. Here, the delay amounts are set by learning so that the pulses corresponding to the L-shaped patterns (f₁₁, f₁₂, f₁₃) arrive in w₁, w₂, w₃, respectively, and the pulses corresponding to the feature elements (f₂₁, f₂₂, f₂₃) arrive in w₁, w₂, w₃, respectively.
[0201]
The pulses corresponding to the feature elements (f₃₁, f₃₂, f₃₃) arrive in the same order. In the case of FIG. 7A, a pulse corresponding to one feature element arrives in each sub time window (time slot). The purpose of dividing the time window into sub time windows is twofold: to detect individually and reliably, in each sub time window, the pulses corresponding to the different feature elements laid out on the time axis (that is, to detect the feature elements), and to make it easier to change and adapt the way those features are integrated, for example whether detection of all feature elements or only of a certain proportion of them is required.
[0202]
For example, when the recognition (detection) target is a face and the search for (detection of) the eyes, which are parts of the face, is important (when the detection priority of the eye pattern is to be set high in a visual search), reaction selectivity (detection sensitivity for specific features) can be raised selectively for the feature element patterns that make up the eye by introducing feedback connections from a higher-order feature detection layer. In this way, detection can give greater importance to the low-order feature elements that make up a higher-order feature element (pattern).
[0203]
Also, if the pulses for more important features are set in advance to arrive in earlier sub time windows, then making the weight function value in those sub time windows larger than the value in the other sub time windows makes features of higher importance easier to detect. This importance (detection priority among features) can be acquired by learning or defined in advance.
[0204]
Therefore, if it is only necessary to generate an event of detecting a certain proportion of feature elements, the division into sub-time windows is almost meaningless and may be performed in one time window.
[0205]
Note that pulses corresponding to a plurality (three) of different feature elements may arrive in one sub time window and be summed (see FIG. 7D). That is, pulses corresponding to a plurality of feature elements (FIG. 7D), or to an arbitrary number of feature elements, may be input to a single sub time window (time slot). In FIG. 7D, pulses for the other features f₂₁ and f₂₃ that support detection of the apex portion f₁₁ of the triangle arrive in the first sub time window, and pulses for the other features f₂₂ and f₃₁ that support detection of the apex portion f₁₂ arrive in the second sub time window.
[0206]
The number of divisions into sub time windows (time slots), the width of each sub time window (time slot), the classes of features, and the assignment of pulse time intervals to features are of course not limited to the above description and can be changed. For example, in addition to the feature elements described above, sub time windows corresponding to feature elements such as “X” and “+” may be set. Such feature elements are redundant (or unnecessary) for detecting a triangle, but detecting that they are absent can improve the accuracy with which the triangle pattern is detected.
[0207]
In addition, even when a deformation that cannot be represented by a combination of these feature elements is applied (for example, a rotation within a certain range), the output pulses of the feature integration layer neurons representing the feature elements respond with a phase delay that varies continuously with the degree of deviation from the ideal pattern (the delay stays within the range in which the pulse still arrives in its predetermined sub time window (time slot)), giving so-called graceful degradation. The output is thus stabilized so that the tolerance to deformation of the detected graphic feature stays above a certain level. For example, the triangle (Q1) formed by the features corresponding to f₁₁, f₁₂, f₁₃ shown in FIG. and the triangle (Q2) formed by the features corresponding to f₄₁, f₄₂, f₄₃ should differ at least in orientation.
[0208]
In this case, when there are detection (integrated) cells corresponding to each feature, then for a triangle (Q3) whose orientation is intermediate between the two triangles, the detection (integrated) cells corresponding to f₁₁, f₁₂, f₁₃ and those corresponding to f₄₁, f₄₂, f₄₃ all respond below their maximum response output. Their output levels directly follow the value of the convolution with the filter kernel that serves as the receptive field structure determined by the type of feature. If the vector of outputs from all of these cells is integrated as being specific to the intermediate figure, a figure intermediate between the two triangle states (a rotated triangle) can be detected.
[0209]
Qualitatively, for example, the smaller the rotation angle and the closer the figure is to Q1, the larger the outputs from the cells corresponding to f₁₁, f₁₂, f₁₃ become; conversely, the closer it is to Q2, the larger the outputs from the cells corresponding to f₄₁, f₄₂, f₄₃ become.
[0210]
Spatio-temporal integration of pulse output and network characteristics
Next, the calculation of the spatio-temporal weighted sum (load sum) of the input pulses will be described. As shown in FIG. 7B, each neuron computes the weighted sum of the input pulses with a predetermined weight function (for example, a Gaussian) in each sub time window (time slot), and the total of these weighted sums is compared with a threshold. τ_j denotes the center position of the weight function of sub time window j, expressed relative to the start time of the time window (elapsed time from the start time). The weight function is in general a function of the distance (deviation on the time axis) from a predetermined center position (the pulse arrival time expected when the feature to be detected is present).
[0211]
Therefore, if the peak position τ of the weight function of each sub-time window (time slot) of a neuron is taken to be the time delay between neurons after learning, a neural network that computes the spatio-temporal weighted sum (load sum) of the input pulses can be regarded as a kind of radial basis function network (hereinafter abbreviated as RBF) in the time domain. The time window function F_Ti of neuron n_i that uses a Gaussian weight function is given as follows, where σ is the spread of each sub-time window and b_ij is the coefficient factor (equivalent to the synaptic connection weight):
[0212]
[Outside 8]

[0213]
Note that the weight function may take a negative value. For example, suppose a neuron of a feature detection layer is intended to ultimately detect a triangle. When a feature (F_false) that is clearly not a component of that graphic pattern (for example, the “X” or “+” described above) is detected, the pulse corresponding to F_false can be given a weight function that makes a negative contribution in the calculation of the total input sum, together with a connection from the corresponding feature detection (integration) cell, so that the triangle detection output is ultimately not produced even if the contribution from the other feature elements is large.
[0214]
The spatio-temporal sum X_i(t) of the input signals to neuron n_i in the feature detection layer can be expressed as
[0215]
[Outside 9]

Here, ε_j is the initial phase of the output pulse from neuron n_j. If this phase converges to 0 through synchronous firing with neuron n_i, or if the phase of the time window is forcibly synchronized to 0 by the timing pulse from the pacemaker neuron, ε_j may always be taken as 0. Computing the weighted sum of the pulse input shown in FIG. 7A with the weight function shown in FIG. 7B yields the temporal transition of the weighted sum shown in FIG. 7E. The feature detection neuron outputs a pulse when this weighted sum reaches the threshold value (Vt).
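The weighted-sum and threshold step described above can be illustrated with a minimal Python sketch. It assumes one input pulse per sub-time window, a Gaussian weight of the form b·exp(−(t − τ)²/σ²), and hypothetical function names (weighted_sum, fires); the exact expressions of the specification are given only as figures and are not reproduced here.

```python
import math

def weighted_sum(arrivals, centers, spreads, coeffs):
    """Sum of Gaussian-weighted pulses, one pulse per sub-time window.

    arrivals[j]: arrival time of the pulse in sub-time window j
    centers[j]:  center position tau_j of the weight function
    spreads[j]:  spread sigma of the weight function (assumed form)
    coeffs[j]:   coefficient b (may be negative for an inhibitory feature)
    """
    return sum(
        b * math.exp(-((t - tau) ** 2) / (sigma ** 2))
        for t, tau, sigma, b in zip(arrivals, centers, spreads, coeffs)
    )

def fires(arrivals, centers, spreads, coeffs, threshold):
    """The neuron outputs a pulse when the weighted sum reaches the threshold."""
    return weighted_sum(arrivals, centers, spreads, coeffs) >= threshold

centers = [1.0, 2.0, 3.0]
spreads = [0.2, 0.2, 0.2]
on_time = [1.0, 2.0, 3.0]   # pulses arrive exactly at the window centers
late = [1.2, 2.2, 3.2]      # each pulse is one spread late

print(fires(on_time, centers, spreads, [1.0, 1.0, 1.0], 2.5))
print(fires(late, centers, spreads, [1.0, 1.0, 1.0], 2.5))
```

Pulses that drift away from the window centers contribute less, so the response falls off gradually with the deviation rather than abruptly.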
[0216]
As described above, the output pulse signal from neuron n_i is sent to the neurons of the upper layer with an output level that is a squashing nonlinear function of the spatio-temporal sum of the input signals (the so-called total input sum) and with the time delay (phase) given by learning. (The pulse output has a fixed frequency (binary), and is output after adding, to the phase corresponding to the fixed delay amount determined by learning, a phase modulation amount that is a squashing nonlinear function of the spatio-temporal sum of the input signals.)
[0217]
Process flow overview
FIG. 8 is a flowchart showing the processing procedure of each layer described above, summarizing the flow of processing from low-order feature detection to high-order feature detection. First, in step S801, low-order feature detection (for example, calculation of Gabor wavelet transform coefficients at each position) is performed. Next, in step S802, low-order feature integration processing that performs local averaging of these features is performed. Further, detection and integration of middle-order features are performed in steps S803 to S804, and detection and integration of higher-order features are performed in steps S805 to S806. In step S807, the presence or absence of the recognition (detection) target, or its detected position, is output as the output of the final layer. The number of layers allocated to steps S803 to S804 and S805 to S806 can be set or changed as desired according to the task (recognition target, etc.).
[0218]
FIG. 9 is a flowchart showing the processing procedure of each feature detection neuron 602. First, in step S901, pulses corresponding to a plurality of feature categories are received from the neurons 601 that form the same receptive field 105 in the preceding layer, that is, the input layer 101 or the feature integration layer 103. In step S902, a time window and weight functions are generated based on the local synchronization signal input from the pacemaker neuron 603 (or obtained by interaction with the neurons of the preceding layer). In step S903, a weighted sum with the predetermined temporal weight function is obtained for each. In step S904, it is determined whether the threshold value has been reached; if it has, a pulse is output in step S905. Although steps S902 and S903 are shown in sequence, they are actually performed almost simultaneously.
[0219]
The processing procedure of each feature integration neuron is shown in the corresponding flowchart. That is, in step S1001, a pulse input is received from the feature detection neurons that belong to the feature detection processing module 104 of the same category and form the local receptive field unique to the neuron. Input pulses are then added during the time range outside the refractory (non-response) period. In step S1003, it is determined whether the total value of the input pulses (for example, measured as a potential) has reached a threshold value. If the threshold value has been reached, a pulse is output in step S1004 with a phase corresponding to the total value.
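As a rough illustration of this integration procedure, the following Python sketch sums unit pulses outside an excluded interval and, once a threshold is reached, returns an output phase. The function name, the unit pulse amplitude, and the choice of tanh as the squashing function are assumptions made only for this example.

```python
import math

def integrate_and_emit(pulse_times, excluded, threshold, base_phase, max_shift):
    """Return the output phase, or None if the threshold is not reached.

    pulse_times: arrival times of the input pulses
    excluded:    (start, end) interval in which inputs are ignored
    base_phase:  fixed delay assumed to have been determined by learning
    max_shift:   largest phase modulation added to base_phase (assumed)
    """
    start, end = excluded
    # Add up unit pulses that arrive outside the excluded interval.
    total = sum(1.0 for t in pulse_times if not (start <= t < end))
    if total < threshold:
        return None
    # Phase modulation as a squashing function of the total (tanh is an assumption).
    return base_phase + max_shift * math.tanh(total)

print(integrate_and_emit([0.1, 0.5, 0.9, 1.5], (0.4, 1.0), 2.0, 10.0, 1.0))
print(integrate_and_emit([0.1, 0.5, 0.9, 1.5], (0.4, 1.0), 3.0, 10.0, 1.0))
```

In the first call two pulses are counted and the threshold is met; in the second the threshold is higher, so no pulse is emitted.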
[0220]
FIG. 27 shows an outline of the main processing flow for gaze position setting control. First, in step S2701, the gaze position is set at the center of the screen, and the gaze area size is set to the entire screen. Next, in step S2702, processing through the What path up to the highest layer is executed. At this stage, as described above, the state in which an object is perceived, that is, the recognition state, has not yet been reached.
[0221]
Thereafter, in step S2703, based on the feedback amount received via the Where path from the output of the feature position detection layer (3, M) corresponding to the uppermost layer, the feature integration layer neuron corresponding to the gaze control neuron that receives the maximum feedback amount, or the maximum of the predetermined evaluation value (formula (5)), is set as the new gaze position, and the gaze area size specific to the processing channel to which that feature integration layer neuron belongs is set. In step S2704, recognition processing in the What path is performed. Next, in step S2705, it is determined whether to update the gaze position; if it is to be updated, a search for the next candidate gaze control neuron in the neighborhood of the latest gaze position is performed as gaze area update control (step S2706).
[0222]
Here, the gaze area update determination is, for example, one of the following: a determination of whether the neighborhood contains another gaze control neuron whose feedback amount is at a sufficient level (referred to as a “next candidate neuron”); a determination of whether no next candidate neuron exists in the neighborhood but an unset gaze position remains in the screen (this requires a determination circuit, not shown, for the presence or absence of unset gaze positions); or a determination of whether a control signal has been input from the outside.
[0223]
In the first case, one of the next candidate neurons is selected by the method described above, and the next gaze position and related settings are made. In the second case, the next gaze position is set at an arbitrary unset position outside the latest neighborhood area (or at an arbitrary unset position adjacent to the neighborhood area). The third case arises, for example, when an operation signal such as the user pressing the shutter button is not detected as the control signal, the user gives an instruction to update, and that update instruction signal is detected as the control signal.
[0224]
In step S2707, as a result of the update, the gaze control neuron that receives the quasi-maximum feedback amount described above is activated, and that neuron outputs a setting signal for the gaze area to the feature integration layer (in the case of FIG. 20) or to the gating layer (in the case of FIG. 21). In step S2708, part of the specific processing channel is opened, so that a signal is propagated from the feature integration layer neuron corresponding to the updated gaze area to the feature detection layer (1, 1). On the other hand, if the gaze update determination (see the three cases of determination above) results in no update, the gaze area update control operation is terminated.
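The first two cases of the update determination can be sketched in Python as follows. This is a simplified illustration, not the circuit of the embodiment: positions are grid coordinates, the neighborhood is a square of a given radius, the external control signal (third case) is omitted, and the names next_gaze, radius, and min_level are hypothetical.

```python
def next_gaze(feedback, current, visited, radius, min_level):
    """Pick the next gaze position, or None when the update should stop.

    feedback: dict mapping (x, y) -> feedback amount of the gaze control neuron
    current:  latest gaze position
    visited:  set of positions that have already been set as gaze positions
    """
    def near(p):
        return max(abs(p[0] - current[0]), abs(p[1] - current[1])) <= radius

    # First case: a next candidate neuron with sufficient feedback in the neighborhood.
    candidates = [p for p in feedback
                  if p not in visited and near(p) and feedback[p] >= min_level]
    if candidates:
        return max(candidates, key=lambda p: feedback[p])

    # Second case: no candidate nearby, but an unset position remains outside the neighborhood.
    unset = [p for p in feedback if p not in visited and not near(p)]
    if unset:
        return unset[0]  # any unset position will do; take the first one

    return None

feedback = {(0, 0): 0.9, (1, 0): 0.6, (0, 1): 0.2, (5, 5): 0.8}
print(next_gaze(feedback, (0, 0), {(0, 0)}, radius=1, min_level=0.5))
print(next_gaze(feedback, (1, 0), {(0, 0), (1, 0)}, radius=1, min_level=0.5))
```

The first call finds a sufficiently active neighbor; the second finds none nearby and jumps to an unset position elsewhere in the screen.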
[0225]
Network and other structural variations
Since the input pulse corresponds to the feature (or the spatial arrangement relationship of the feature elements) at each position in the spatial domain, it is possible to construct a spatiotemporal RBF.
[0226]
Specifically, by further weighting and adding the output values of the neurons, a spatio-temporal function of the pulse pattern corresponding to an arbitrary graphic pattern can be expressed from a sufficiently large number of predetermined feature elements (feature detection cells) and the weighted sums (load sums) computed in a sufficiently large number of sub-time windows (time slots). If the recognition target category and the variation in its shape are limited to some extent, the number of necessary feature detection cells and sub-time windows (time slots) can be reduced.
[0227]
In this embodiment, the common bus is a local bus line, one of which is assigned to each receptive field. However, the present invention is not limited to this; the pulse phase delay amounts may be divided and set on the time axis so that the inter-layer connection from one layer to the next is carried on the same bus line. Further, a common bus line may be shared between adjacent receptive fields whose overlap ratio is relatively large.
[0228]
In addition, instead of using the spatio-temporal RBF described above, processing (or threshold processing) may be applied so that the result of the weighted product-sum operation within each sub-time window (time slot) becomes a nonlinear squashing function value, and the product of these values may then be taken. For example, with a circuit configuration (not shown), a (binary) threshold processing result may be obtained for each sub-time window and stored in temporary storage means, and the logical product of the sequentially obtained threshold processing results may be computed in time series.
[0229]
Needless to say, when a product is obtained by performing threshold processing, the tolerance for feature detection under a pattern defect or low contrast condition decreases.
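The difference in tolerance between the two schemes can be seen in a small Python sketch. The per-window sums and threshold values below are made-up numbers, and the function names are hypothetical.

```python
def detect_by_and(window_sums, thresholds):
    """Threshold each sub-time window, then take the logical product (AND)."""
    return all(s >= th for s, th in zip(window_sums, thresholds))

def detect_by_sum(window_sums, total_threshold):
    """Add the per-window weighted sums and threshold the total."""
    return sum(window_sums) >= total_threshold

# Third feature element is mostly missing (for example, occluded or low contrast).
window_sums = [0.9, 0.8, 0.1]

print(detect_by_and(window_sums, [0.5, 0.5, 0.5]))
print(detect_by_sum(window_sums, 1.5))
```

With one weak window the logical product rejects the pattern, while the summed response can still cross its threshold.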
[0230]
The above-described processing (detection of a graphic pattern by the spatio-temporal RBF) can also be realized as an operation similar to the recall process of an associative memory. That is, even if a low-order (or middle-order) feature element that should be detected in a certain local region (or in the entire region) is missing, as long as some of the other feature elements are detected and the total sum described above (Expression (13)) exceeds the threshold value, the spatio-temporal RBF network as a whole can detect the middle-order (or higher-order) feature element (the corresponding neuron fires).
[0231]
Needless to say, the network configuration need not be limited to that shown in FIG. 1, and may be an MLP or the like as long as it includes a layer for detecting predetermined geometric features.
[0232]
In this embodiment, the Gabor wavelet transform is used for low-order feature extraction, but needless to say, other multi-scale features (for example, local autocorrelation coefficients obtained with a size proportional to the scale) may be used.
[0233]
In a network configuration such as that shown in FIG. 1, graphic patterns and the like may also be recognized by a network composed of synapse elements that perform pulse width (analog value) modulation and integrate-and-fire neurons. In this case, the modulation by the synapse is given by W_a = S_ij W_b, where W_b is the pulse width of the pre-synaptic signal and W_a is the pulse width of the post-synaptic signal. Here, S_ij has the same meaning as the connection strength (formula (10)). To obtain a large modulation dynamic range, the basic pulse width of the pulse signal must be sufficiently smaller than the period (basic pulse interval).
[0234]
The firing of a neuron (pulse output) occurs when its potential exceeds a predetermined threshold value as a result of the charge accumulated by the inflow of a plurality of pulse currents representing predetermined feature elements. In the case of pulse width modulation or the like, weighted addition of the arriving pulses for each sub-time window is not particularly required; instead, integration over a time window of a predetermined width is executed. In this case, the feature element (graphic pattern) to be detected depends only on the temporal sum of the signals input to the feature detection layer neuron (the sum of the pulse current values). The input pulse width corresponds to the value of the weight function.
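The pulse width modulation variant can be sketched in Python as follows. The relation W_a = S_ij W_b comes from the text; the constant pulse current, the charge model (current multiplied by width), and the function names are assumptions made for illustration.

```python
def postsynaptic_width(strength, pre_width):
    """Pulse width modulation by the synapse: W_a = S_ij * W_b."""
    return strength * pre_width

def fires_pwm(pre_widths, strengths, current, threshold):
    """Integrate-and-fire over one time window.

    Each modulated pulse deposits charge = current * width (assumed model);
    the neuron fires when the accumulated charge exceeds the threshold.
    """
    charge = sum(current * postsynaptic_width(s, w)
                 for s, w in zip(strengths, pre_widths))
    return charge > threshold

print(fires_pwm([1.0, 1.0], [0.5, 0.25], current=2.0, threshold=1.0))
print(fires_pwm([1.0, 1.0], [0.5, 0.25], current=2.0, threshold=2.0))
```

Here the width of each modulated pulse plays the role that the weight function value plays in the sub-time window scheme.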
[0235]
Note that the following configuration may also be used. By obtaining a scale-invariant feature representation first, it avoids having multiple processing channels up to the middle and higher orders, and thereby improves the efficiency of selective gaze processing, simplifies the circuit configuration, reduces its scale, and lowers power consumption, while maintaining scale-invariant recognition performance.
[0236]
That is, in the basic configuration described above, the gaze area setting control is performed after collective coding of the low-order features, and collective coding is performed at each layer level. Alternatively, the feature representation at different scale levels and the collective coding described above may be limited to the low-order features; a scale-invariant feature representation is then obtained by, for example, phase modulation of the pulses related to each feature as described later (scale-invariant pulse information conversion), after which the gaze area setting control described above is performed, and middle-order and higher-order feature detection is performed in this scale-invariant feature representation domain. The gaze area setting control may also be performed before the collective coding, but the processing efficiency is then lower than when it is performed after the collective coding.
[0237]
To perform the conversion to the scale-invariant pulse information described above, features that belong to the same feature category but differ in scale level (size) may be converted so that they are expressed by the same pulse interval. For example, the conversion may be performed so that the phase offset amount in the arrival pattern of the plurality of pulses to the feature detection neuron that detects a certain graphic feature becomes a constant value regardless of which processing channel the pulses come from. When information is expressed by pulse width modulation, the same processing may be applied to the expansion or contraction of the pulse width, or to its offset amount.
[0238]
Furthermore, the learning rule may be determined so that, in feature detection neurons belonging to different scale levels (processing channels), the pulse arrival time intervals (or pulse arrival time patterns) corresponding to graphic features of the same category (for example, an L-shaped pattern) differ according to the scale level and are separated in time.
[0239]
The collective coding process is performed as a linear combination, by weighted addition, over the entire time-divided pulse signal. In selective gaze processing, the output data from the feature integration layer (2, 0) corresponding to a specific partial region is selectively extracted as the gaze area before collective coding, and the collective coding process is performed on the selected output data on the time axis. In the layers from the feature detection layer (1, 1) onward, multi-scale processing can be performed by the same circuit without providing a different circuit for each processing channel, resulting in an economical circuit configuration. That is, from the (1, 1) layer onward, the processing channels need not be physically distinguished in the circuit configuration.
[0240]
Output from the feature detection layer (1, 1) to the feature integration layer (2, 1) (and likewise for the subsequent layers) is performed in a time-division manner for each processing channel output (for each scale level).
[0241]
Application examples installed in imaging devices
Next, with reference to FIG. 11, a case will be described in which the pattern detection (recognition) device according to the configuration of the present embodiment is mounted on an imaging device to perform focusing on a specific subject, color correction of a specific subject, and exposure control. FIG. 11 is a diagram illustrating the configuration of an example in which the pattern detection (recognition) device according to the embodiment is used in an imaging device.
[0242]
The imaging device 1101 in FIG. 11 includes an imaging optical system 1102 including a photographing lens and a drive control mechanism for zoom photographing, a CCD or CMOS image sensor 1103, an imaging parameter measurement unit 1104, a video signal processing circuit 1105, a storage unit 1106, a control signal generation unit 1107 that generates control signals for controlling imaging operations, imaging conditions, and the like, a display 1108 that also serves as a viewfinder such as an EVF, a strobe light emitting unit 1109, a recording medium 1110, and so on. In addition, the pattern detection apparatus described above is provided as a subject detection (recognition) device 1111 having a selective gaze mechanism.
[0243]
In this imaging device 1101, the subject detection (recognition) device 1111 detects, for example, the face image of a person registered in advance (detecting its position and size) or recognizes it in the finder image before photographing. When the position and size information of the person is input from the subject detection (recognition) device 1111 to the control signal generation unit 1107, the control signal generation unit 1107 generates, based on the output from the imaging parameter measurement unit 1104, control signals for optimally performing focus control (AF), exposure condition control (AE), white balance control (AW), and the like for that person.
[0244]
FIG. 28 shows an outline of a processing flow when shooting is performed while performing selective gaze in the imaging apparatus.
[0245]
First, in step S2801, in order to specify the detection or recognition target in advance, model data for the shooting target (for example, the face of the target person when shooting a person) is loaded into a temporary storage unit (not shown) built into the imaging device, either from the storage unit 1106 or via a communication unit (not shown). Next, in step S2802, the gaze control process is initialized (for example, the gaze area is set to the screen center position and the entire screen size) in the shooting standby state (for example, with the shutter half-pressed).
[0246]
Subsequently, in step S2803, the gaze area update process described above is started. When a gaze area that satisfies a predetermined criterion has been set (for example, when “detection” of the shooting target is determined based on the highest layer output), the gaze area update process is temporarily stopped in step S2804, and in step S2805 optimization control of the imaging conditions (AF, AE, AW, zooming, etc.) centered on the selected gaze area is performed. At this time, in step S2806, a marker such as a cross mark is displayed at the target position on the finder display so that the user can confirm the shooting target.
[0247]
Next, as erroneous detection determination processing (step S2807), it is determined, for example, whether the shutter button has been pressed within a certain time range, or whether the user has issued an instruction to search for another target (for example, by canceling the half-pressed shutter state). If it is determined that there is a false detection, the process returns to step S2803 and the gaze area update process is started again. If it is determined that there is no false detection, shooting is performed under the imaging conditions at that time.
[0248]
By using the pattern detection (recognition) device according to the above-described embodiment in an imaging device in this way, a subject can be detected (recognized) reliably and efficiently even when a plurality of subjects are present in the input image and even when the position and size of the subject in the screen are not known in advance. Such a function can be realized with low power consumption and at high speed (in real time), so that a person or the like can be detected and shooting can be optimally controlled (AF, AE, etc.) on that basis.
[0249]
<Second Embodiment>
In the present embodiment, the gaze position update process itself is performed at high speed by obtaining in advance the priority of the locations (positions in the screen) where selective gaze is performed. As an effect peculiar to the present embodiment, because the priority is set in advance (before the detection operation or the like is started) by the priority setting unit 2401 described below, the processing efficiency (the speed of updating the gaze point) is greatly improved. Further, by providing an allowable range of priority to limit the positions that can be updated, and updating the gaze position only among gaze control neurons whose priority lies in that range, the processing speed can be increased further.
[0250]
FIG. 24 shows the main configuration of the gaze area setting control layer 108 of the present embodiment. A priority setting unit 2401 is provided that sets the priority order for selecting gaze control neurons based on the saliency map of the low-order features (the output of the feature integration layer (2, 0)) and the feedback amount from the upper layer. It may be placed near the gaze area setting control layer 108 together with the gaze position update control circuit 1901 (or on the gaze area setting control layer 108 itself). Note that the priority setting unit 2401 and the gaze position update control circuit 1901 are digital signal processing circuits (or logic gate arrays) in this embodiment. As in the previous embodiment, the gaze area setting control layer 108 has gaze control neurons corresponding to each position at which low-order feature detection is performed on the input data (as many neurons as the number of processing channels multiplied by the number of feature categories exist at the same position).
[0251]
Here, the quantity indicating the priority corresponds to the linear sum (or its rank) of the feature integration layer output (saliency value) and the feedback amount from the upper layer, as shown in Expression (5). A control signal that activates the gaze control neurons in descending order of priority is sent from the gaze position update control circuit 1901 to the specific gaze control neuron (at the update position). To obtain the positions (addresses) of the gaze control neurons with high priority, the gaze position update control circuit 1901 calculates the priority by sequentially accessing each gaze control neuron by a predetermined method described later, sorts the results, and stores the neuron address information in the primary storage unit (not shown) in descending order of priority. However, only the addresses of gaze control neurons whose priority lies within the preset allowable range are stored.
[0252]
As a method of accessing the gaze control neurons for calculating the priority, sampling is performed, for example, in a spiral from the initial gaze position (the center of the screen, as in the previous embodiment). Alternatively, the processing channel to be searched may be determined based on the shooting conditions (subject distance, magnification, etc.), and only the corresponding group of gaze control neurons may be searched (if the subject size is known in advance, the size of the target in the screen corresponding to the shooting conditions is also known, and as a result the scale level is determined).
[0253]
The gaze position update control circuit 1901 does not sequentially select all of the registered gaze control neuron addresses, but selects only gaze control neurons in a preset priority range. For example, in the first stage of the visual search (selective gaze control), the allowable range of priority is set high, and each time the search makes a full round within that range (each round is counted as one gaze position search update), the allowable range of priority may be changed according to the count. For example, as the count increases, the allowable range of priority may be lowered or widened.
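The priority list and its allowable range can be sketched in Python as follows. Expression (5) is not reproduced in this section, so the linear-sum coefficients a and b are placeholders, and the policy of lowering the lower bound by a fixed step per round is only one possible way of changing the range; all names are hypothetical.

```python
def priority(saliency, feedback, a=1.0, b=1.0):
    """Linear sum of the saliency value and the feedback amount (coefficients assumed)."""
    return a * saliency + b * feedback

def build_priority_list(neurons, low, high):
    """Return neuron addresses in descending priority, keeping only low <= priority <= high.

    neurons: dict mapping address -> (saliency, feedback)
    """
    scored = [(priority(s, f), addr) for addr, (s, f) in neurons.items()]
    kept = [(p, addr) for p, addr in scored if low <= p <= high]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [addr for _, addr in kept]

def relaxed_range(low, high, rounds, step):
    """Assumed policy: lower the bottom of the allowable range after each full round."""
    return max(0.0, low - step * rounds), high

neurons = {(0, 0): (0.2, 0.1), (1, 0): (0.5, 0.4), (2, 0): (0.9, 0.8)}

print(build_priority_list(neurons, 0.8, 2.0))
low, high = relaxed_range(0.8, 2.0, rounds=2, step=0.3)
print(build_priority_list(neurons, low, high))
```

In the first pass only the two highest-priority addresses are visited; after two rounds the relaxed range also admits the third.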
[0254]
<Third Embodiment>
A configuration centering on the gaze area setting control layer 108 of the present embodiment is shown in the figures. In this embodiment, the gaze control neuron 1801 receives a feedback connection from an upper layer (for example, the feature position detection layer (3, 2) or the feature integration layer (2, 2)) that outputs information on the position and existence probability of a target belonging to the recognition target category, and a feedback connection from an intermediate layer (for example, the feature position detection layer (3, 1) or the feature integration layer (2, 1)). In response to a control signal from the outside (for example, from a layer higher than the highest layer shown in FIG. 1), the feedback input from the upper layer is given priority when the target is being searched for (detection mode), and the feedback input from the intermediate layer is given priority when the target is being recognized (recognition mode).
[0255]
Here, searching for a target in the detection mode simply means detecting an object of the same category as the recognition target (for example, detecting a face), whereas recognition means determining whether the object is the recognition target itself (for example, whether it is the face of a specific person) by performing the gaze position setting control based on more detailed features. In the latter case (recognition), the so-called gaze degree is higher than in the former case (search).
[0256]
The prioritization by weighting of the feedback connections is set by the feedback amount modulation unit 2601 shown in the figure. FIG. 25 shows the flow of processing through the feedback amount modulation unit 2601. The feedback amount modulation unit 2601 first receives, for each gaze control neuron, the feedback amount from the upper layer and the feedback amount from the intermediate layer (2501).
[0257]
Next, a mode control signal indicating the detection mode or the recognition mode is input (2502). In the detection mode, a combined feedback amount is calculated using either the feedback amount from the upper layer alone, or a linear sum of the two feedback amounts (αF₁ + βF₂, where F₁ is the feedback amount from the upper layer and F₂ is the feedback amount from the intermediate layer) in which the contribution of the feedback amount from the upper layer is larger (α > β ≧ 0), and modulation (amplification) is applied so that this feedback amount is output (2503). In the recognition mode, on the other hand, a combined feedback amount is calculated using either the feedback amount from the intermediate layer alone, or a linear sum of the two feedback amounts in which the contribution of the feedback amount from the intermediate layer is larger (0 ≦ α < β), and the same modulation is applied (2504).
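The mode-dependent combination αF₁ + βF₂ can be written as a short Python sketch. The specific weight values 0.8 and 0.2 and the function name are assumptions; the text only requires α > β ≧ 0 in the detection mode and 0 ≦ α < β in the recognition mode.

```python
def combined_feedback(f_upper, f_mid, mode, strong=0.8, weak=0.2):
    """Combine the upper-layer (F1) and intermediate-layer (F2) feedback amounts.

    detection mode:   alpha > beta >= 0  (upper layer dominates)
    recognition mode: 0 <= alpha < beta  (intermediate layer dominates)
    """
    if mode == "detection":
        alpha, beta = strong, weak
    elif mode == "recognition":
        alpha, beta = weak, strong
    else:
        raise ValueError("mode must be 'detection' or 'recognition'")
    return alpha * f_upper + beta * f_mid

# A neuron with strong upper-layer feedback but little intermediate-layer feedback.
print(combined_feedback(1.0, 0.1, "detection"))
print(combined_feedback(1.0, 0.1, "recognition"))
```

Setting weak=0.0 gives the variant in which only one of the two feedback amounts is used.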
[0258]
In the present embodiment, apart from the question of which feedback connection is given priority, the gaze position setting control can be performed in the same manner as in the first embodiment; in addition to the method of the first embodiment, however, it may be performed while applying temporal fluctuation to the gaze position. Applying fluctuation to the gaze position in this way can in some cases make detection of the search target more efficient. That is, in the first embodiment, updating the gaze position always requires a search for the next candidate in the neighborhood area (when no gaze control neuron that can serve as the next candidate exists in the neighborhood of the latest gaze position, the next gaze position is set outside the neighborhood area or at an adjacent position outside it), whereas applying fluctuation to the gaze position provides the following specific effects.
[0259]
(1) Even if the neighborhood area to be searched is made smaller (than in the case without fluctuation), the processing load required for the search within the neighborhood area can be reduced while the time required for the search is kept the same. This is because random fluctuation of the gaze position has an effect equivalent to searching for the gaze position without comparing the feedback amounts to the gaze control neurons in the neighborhood area.
[0260]
(2) When the fluctuation width is small, it is possible to improve the recognition (detection) rate by temporal averaging of the feature detection layer output centered on a predetermined gazing point.
[0261]
When fluctuation is applied to the gaze position, the fluctuation width can be changed according to the gaze degree. Here, the gaze degree refers to the degree to which the feedback amount from a layer that detects the specific patterns constituting the whole pattern to be recognized or detected, or that detects the arrangement of such constituent patterns, is emphasized. For example, it is expressed as a value referenced to the feedback amount from the layer that detects the whole pattern (higher-order feature), that is, the upper layer: the ratio (F₂ / F₁) of the feedback amount F₂ from the intermediate layer to the feedback amount F₁ from the upper layer.
[0262]
The fluctuation width of the gaze position is controlled so that it is reduced when the gaze degree is high. The gaze degree may instead be a monotonically increasing function of the total feedback amount from the upper layer, or it may be determined by amplifying the feedback amount so that the gaze degree (the feedback amount to the neuron) increases as the difference in feedback amount among multiple gaze control neurons decreases. The latter feedback amount amplification is performed for each gaze control neuron 1801 in the gaze area setting control layer 108 by the feedback amount modulation unit 2601 shown in the figure.
[0263]
As a method for applying temporal fluctuation to the gaze point position, for example, an updated gaze point is first set, in the same manner as in the first embodiment and based on the modulated feedback amount described above, within a neighborhood area like that of the first embodiment centered on the latest gaze point position. A random fluctuation is then applied to that position, and the result determines the final updated gaze control neuron (gaze position). At this time, the random fluctuation width may be controlled according to the gaze degree as described above.
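A minimal Python sketch of this fluctuation control follows. The gaze degree F₂ / F₁ comes from the text; the mapping from gaze degree to fluctuation width (max_width / (1 + degree)), the uniform random perturbation, and the function names are assumptions chosen only so that the width shrinks as the gaze degree rises.

```python
import random

def gaze_degree(f_upper, f_mid):
    """Gaze degree as the ratio F2 / F1 (assumes f_upper > 0)."""
    return f_mid / f_upper

def fluctuation_width(degree, max_width):
    """Assumed monotonically decreasing mapping from gaze degree to width."""
    return max_width / (1.0 + degree)

def jitter(position, width, rng):
    """Apply a uniform random displacement of at most `width` on each axis."""
    x, y = position
    return (x + rng.uniform(-width, width), y + rng.uniform(-width, width))

rng = random.Random(0)  # seeded only to make the example repeatable
width = fluctuation_width(gaze_degree(f_upper=1.0, f_mid=3.0), max_width=4.0)
print(width)
print(jitter((10.0, 20.0), width, rng))
```

The perturbed position would then be snapped to the nearest gaze control neuron; that step is omitted here.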
[0264]
<Fourth Embodiment>
In the present embodiment, as shown in FIG. 29, an image input device (camera, video camera, etc.) is provided with a subject detection (recognition) device 1111 that incorporates an assist information detection unit 2902, which detects information such as the position and size of the image input (shooting) target intended by the user (photographer) (hereinafter, the line of sight is used as an example of such information in the present embodiment), and a gaze area setting control unit 2901 according to the above-described embodiments.
[0265]
The assist information (line-of-sight) detection unit 2902 and the gaze area setting control unit 2901 work together to control the setting of the gaze position, so that high-speed automatic shooting that is suited to a specific target in the screen and reflects the user's intention can be performed. Besides the line of sight, the assist information may be set explicitly by the user. Hereinafter, the assist information detection unit 2902 is described as the line-of-sight detection unit 2902.
[0266]
In FIG. 29, the imaging device 1101 includes an imaging optical system 1102 including a photographing lens and a drive control mechanism for zoom photographing, a CCD or CMOS image sensor 1103, an imaging parameter measurement unit 1104, a video signal processing circuit 1105, a storage unit 1106, a control signal generation unit 1107 that generates control signals for imaging operation control, imaging condition control, and the like, a display 1108 that also serves as a viewfinder such as an EVF, a strobe light emitting unit 1109, a recording medium 1110, and the like.
[0267]
Furthermore, a subject detection (recognition) unit 1111 is provided, which includes the above-described line-of-sight detection unit 2902, an eyepiece optical system 2903 for line-of-sight detection, and the gaze area setting control unit 2901. The eyepiece optical system 2903 includes an eyepiece, a light splitter such as a half mirror, a condenser lens, and an illumination light source such as an LED that emits infrared light. The line-of-sight detection unit 2902 includes a mirror, a focusing plate, a penta roof prism, a photoelectric converter, signal processing means, and the like, and outputs a line-of-sight position detection signal to the imaging parameter measurement unit 1104.
[0268]
For the configuration of the line-of-sight detection unit 2902, the mechanisms disclosed by the present applicant in Japanese Patent No. 2505854, Japanese Patent No. 2763296, Japanese Patent No. 2941847, and the like can be used, so a description is omitted here.
[0269]
As a specific procedure of gaze control, the locations (regions) at which the user gazes are first extracted in advance by the line-of-sight detection unit 2902, and this information is stored in the primary storage unit. Next, the gaze area setting control unit 2901 is activated and preferentially searches the gaze candidate position(s) stored in advance. Here, searching preferentially means the following, in the same way as when the gaze position is set using the priority order described in the second embodiment. The gaze position is set, for example, in the order in which the user gazed at the positions. Each time, a search of the nearby region as shown in the first embodiment or the like is performed within a certain time range. The update to the next user gaze position and the nearby-region search are then repeated alternately. The reason is described later.
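The alternation between user gaze positions and nearby-region search can be sketched as a simple schedule. This is only an illustration under stated assumptions: the names `search_schedule` and `four_neighbors` are hypothetical, the "certain time range" is modeled as a fixed number of search steps, and the nearby-region search of the first embodiment is replaced by a plain enumeration of adjacent positions.

```python
from itertools import islice

def four_neighbors(pos):
    """Hypothetical stand-in for the nearby-region search: the four adjacent positions."""
    x, y = pos
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def search_schedule(user_gaze_positions, neighbors, steps_per_position):
    """Yield (kind, position) events in the order the gaze position would be set.

    user_gaze_positions: positions stored in advance, in the order the user gazed at them.
    steps_per_position:  step budget standing in for the fixed time range (assumption).
    """
    for pos in user_gaze_positions:
        # Update the gaze position to the next stored user gaze position.
        yield ("user", pos)
        # Search the nearby region for a bounded number of steps, then move on.
        for nearby in islice(neighbors(pos), steps_per_position):
            yield ("nearby", nearby)
```

For example, `list(search_schedule([(5, 5), (0, 0)], four_neighbors, 2))` visits (5, 5), two of its neighbors, then (0, 0) and two of its neighbors.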
[0270]
Alternatively, while the gaze area setting control unit 2901 is operating, a signal from the line-of-sight detection unit 2902 may be input at regular time intervals so that the area around the position at which the user is gazing at that time is searched preferentially. Specifically, the gaze control neuron closest to the on-screen gaze position specified by the signal from the line-of-sight detection unit 2902 is selected and activated (or the gating layer 2101 in FIG. 21), so that recognition (detection) processing reflecting the user's gaze position is performed.
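Selecting the gaze control neuron closest to the detected line-of-sight position amounts to a nearest-neighbor lookup. The following sketch assumes that each gaze control neuron is identified by an index and an (x, y) position on the screen; the function name and data layout are hypothetical.

```python
def nearest_control_neuron(neuron_positions, gaze_xy):
    """Return the index of the gaze control neuron closest to the detected gaze position.

    neuron_positions: list of (x, y) screen positions, one per gaze control neuron.
    gaze_xy:          (x, y) position reported by the line-of-sight detection signal.
    """
    if not neuron_positions:
        raise ValueError("no gaze control neurons defined")
    gx, gy = gaze_xy

    def squared_distance(i):
        x, y = neuron_positions[i]
        return (x - gx) ** 2 + (y - gy) ** 2

    # Ties are resolved in favor of the lowest index.
    return min(range(len(neuron_positions)), key=squared_distance)
```

The returned index would then be used to activate the corresponding control neuron at each sampling of the line-of-sight signal.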
[0271]
As described above, gaze area setting control combines information on the position at which the user gazes with the gaze position information updated automatically by the gaze position setting control process. The main reasons for doing so are as follows.
[0272]
1. The position at which the user is gazing does not always accurately represent the object to be photographed (that is, it is not always useful for detection).
[0273]
2. Using the position information of the shooting target intended by the user as auxiliary information narrows the search range, so the search can be performed more efficiently than with the gaze position setting control process alone, as in the first, second, and third embodiments.
[0274]
3. Error detection of the gaze position set by the gaze area setting control unit is facilitated.
[0275]
As described above, by using the user's gaze position detection result, the object can be detected and recognized reliably and at high speed.
[0276]
According to the embodiments described above, setting control of the gaze area can be performed efficiently with a small-scale circuit, without being distracted by low-order features (for example, edges that are meaningless for detecting the target). In particular, multi-resolution processing and collective coding mechanisms are incorporated into gaze control that involves feedback from higher layers. Therefore, even when multiple objects belonging to the same category exist at different positions and in different sizes, the mechanism that performs feedback control using the detection level of higher-order features makes it possible to search efficiently among those objects.
[0277]
In addition, the pulse signals that serve as feature detection signals are subjected to threshold processing of their weighted sum within a time window. As a result, even when multiple targets to be detected (recognized) exist against complex and diverse backgrounds and their arrangement is not known in advance, a desired pattern can be detected reliably and efficiently despite deformation (including positional variation and rotation), and in particular despite missing feature detections caused by changes in size, lighting, noise, and the like. This effect can be realized regardless of the specific network structure.
[0278]
Furthermore, the gaze area setting can be updated at high speed and reliably by using the assist information from the user (such as the gaze direction) and the gaze area setting control process.
[0279]
In addition, the gaze area can be searched with high efficiency according to the detection level of higher-order features, and objects of a predetermined category can be detected and recognized at high speed with a compact configuration.
[0280]
In addition, because the gaze area is set only for low-order features or input data, the setting control from low order through high order that is performed in the prior art (the selective tuning method and Japanese Patent Publication No. 6-34236) is not required, which improves processing efficiency and speed.
[0281]
Also, low-order feature saliency and the feedback signal are used together in a mixed manner. In a recognition mode that performs detailed pattern analysis, information on the patterns that constitute the recognition target and on the local arrangement of features is used preferentially. In a detection mode that simply detects patterns belonging to a certain feature category, the overall features (higher-order features, or feedback signals from higher layers) are handled preferentially. Processing can thus be adapted to the mode.
[0282]
Moreover, since the priority of each gaze position is determined in advance and the gaze position is controlled based on the result, search control of the gaze area can be performed very rapidly.
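As a rough illustration of determining priorities in advance, the sketch below combines low-order feature saliency and the feedback amount from higher layers into one priority value per sampling point. The convex combination and the parameter `feedback_weight` are assumptions made for the example; the text here states only that both quantities are used, not how they are combined.

```python
def priority_map(saliency, feedback, feedback_weight=0.5):
    """Compute a gaze position priority for every sampling point.

    saliency: dict (x, y) -> saliency of the low-order feature at that point.
    feedback: dict (x, y) -> signal transmission amount fed back from higher layers.
    feedback_weight: hypothetical mixing ratio in [0, 1]; larger values favor feedback.
    """
    if not 0.0 <= feedback_weight <= 1.0:
        raise ValueError("feedback_weight must be in [0, 1]")
    return {
        p: (1.0 - feedback_weight) * s + feedback_weight * feedback.get(p, 0.0)
        for p, s in saliency.items()
    }
```

Because the map is computed once, later gaze position updates reduce to lookups and comparisons on stored values, which is consistent with the speed advantage stated above.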
[0283]
In addition, when the allowable range of priority for gaze positions is changed according to the number of gaze position searches, the effective search range can be changed by changing the allowable priority in cases where the search for the recognition target goes in circles (that is, almost the same positions are searched repeatedly).
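One possible way to tie the allowable priority range to the search count is sketched below. It assumes that the range is defined by a lower bound that is relaxed linearly as the number of searches grows, so that a search stuck on the same positions gradually admits lower-priority candidates. The function names, the linear schedule, and the default values are hypothetical.

```python
def allowed_priority_floor(search_count, start=0.8, step=0.1, minimum=0.0):
    """Lower bound of the allowable priority, relaxed as searches accumulate (assumed form)."""
    return max(minimum, start - step * search_count)

def candidate_positions(priorities, search_count, **schedule):
    """Positions whose priority lies within the allowable range, highest priority first.

    priorities: dict (x, y) -> gaze position priority.
    """
    floor = allowed_priority_floor(search_count, **schedule)
    allowed = [p for p, v in priorities.items() if v >= floor]
    return sorted(allowed, key=lambda p: priorities[p], reverse=True)
```

With the defaults, only positions with priority of at least 0.8 are eligible on the first search, while after many searches every position becomes eligible.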
[0284]
In addition, the gaze position is set in descending order of priority from the priority distribution, and the size of the gaze area is controlled based on the processing channel to which the feature selected by priority belongs. Therefore, even if the size of the target to be recognized or detected is not known in advance, the gaze area can be set automatically according to the size of the target.
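The combination of priority-ordered gaze positions and channel-dependent gaze area size can be sketched as follows. The table `CHANNEL_SIZE`, its values, and the record layout of each feature are hypothetical; the only idea taken from the text is that each processing channel (scale level) has a size associated with it in advance.

```python
# Hypothetical table: processing channel index -> gaze area size in pixels.
CHANNEL_SIZE = {0: 8, 1: 16, 2: 32}

def gaze_sequence(features, channel_size=CHANNEL_SIZE):
    """Return (position, gaze_area_size) pairs in descending order of priority.

    features: list of dicts with keys "pos", "priority", and "channel", where
              "channel" is the processing channel the selected feature belongs to.
    """
    ordered = sorted(features, key=lambda f: f["priority"], reverse=True)
    # The size is read from the channel, so no prior knowledge of target size is needed.
    return [(f["pos"], channel_size[f["channel"]]) for f in ordered]
```

A feature detected in a coarse-scale channel therefore yields a large gaze area, and one detected in a fine-scale channel yields a small one.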
[0285]
In addition, by setting and controlling the gaze area as an active receptive field of the feature detection elements belonging to the low-order feature detection layer, the gaze area can be set without providing special gating elements for that purpose.
[0286]
Also, by extracting a plurality of features for each of a plurality of resolutions or scale levels, it is possible to search for an object of an arbitrary size with high efficiency.
[0287]
In addition, feedback input from the upper layer is given priority when searching for a target, and feedback input from the intermediate layer is given priority when recognizing the target. Thus, when a target belonging to a predetermined category is merely to be detected, the search can be based on higher-order features (the overall features of the pattern). When similar targets are to be discriminated or recognized, detailed processing can be based on middle-order features (such as information on the patterns that form parts of the whole and on the arrangement of features).
[0288]
In addition, when the predetermined gaze degree is large, the temporal fluctuation of the center position of the gaze area is reduced. This makes the fluctuation of the gaze position during visual search variable, which improves search efficiency and shortens search time according to the gaze degree.
[0289]
Further, the size of the gaze area is set based on the detected scale level related to the pattern belonging to the recognition target category. As a result, even if the target size is not known in advance, the size previously associated with the processing channel can be used as the estimated size, and the gaze area size can be set with high efficiency.
[0290]
Because the gaze degree is a monotonically increasing function of the feedback signal level from the upper layer, the higher the detection level of higher-order features, the higher the gaze degree, and automatic processing according to the higher-order features can be performed.
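Any monotonically increasing mapping satisfies the property stated above. The sketch below uses a saturating exponential purely as an example; the function form, the parameter `gain`, and the clamping of negative levels to zero are assumptions and are not taken from the embodiments.

```python
import math

def gaze_degree(feedback_level, gain=1.0):
    """Map the feedback signal level from the upper layer to a gaze degree in [0, 1).

    The mapping 1 - exp(-gain * level) is strictly increasing for level >= 0,
    so a higher detection level of higher-order features gives a higher gaze degree.
    """
    level = max(feedback_level, 0.0)  # assumption: negative levels count as zero
    return 1.0 - math.exp(-gain * level)
```

A bounded output is convenient if the gaze degree is later used to scale another quantity, such as the width of the random fluctuation of the gaze position.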
[0291]
Furthermore, by controlling operation based on the output signal from the pattern detection apparatus described above, an image targeting a specific subject can be input with low power consumption and at high speed at an arbitrary subject distance. The image input device may be an imaging device for still images, moving images, stereoscopic images, or the like, or a device that includes an image input unit, such as a copying machine, a facsimile, or a printer.
[0292]
In addition, by setting a gaze area that matches a predetermined criterion and optimizing the shooting conditions around the set gaze area, the shooting target can be searched for with high efficiency, and optimal automatic shooting according to the detected target becomes possible.
[0293]
Further, by updating or setting the gaze area based on the assist information from the user such as the detected line of sight, the setting control of the gaze area can be performed quickly and reliably.
[0294]
[Effect of the invention]
As described above, according to the present invention, it is possible to detect a predetermined pattern at a high speed while setting a gaze area with high efficiency.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a network configuration of an embodiment according to the present invention.
FIG. 2 is a diagram illustrating a configuration of a synapse unit and a neuron element unit.
FIG. 3 is a diagram illustrating a state of multi-pulse propagation from a feature integration layer or an input layer to a feature detection layer neuron in the first embodiment.
FIG. 4 is a configuration diagram of a synapse circuit.
FIG. 5 is a diagram showing a configuration of a synapse coupling subcircuit and a configuration of a pulse phase delay circuit used in the first embodiment.
FIG. 6 is a diagram showing a network configuration when a feature detection layer neuron has an input from a pacemaker neuron.
FIG. 7 is a diagram illustrating a configuration of a time window, an example of a weight function distribution, and an example of a feature element when processing a plurality of pulses corresponding to different feature elements input to a feature detection neuron.
FIG. 8 is a flowchart showing a processing procedure of each layer.
FIG. 9 is a flowchart showing a processing procedure of each feature detection neuron.
FIG. 10 is a flowchart showing a processing procedure of each feature integration neuron.
FIG. 11 is a diagram illustrating a configuration of an example in which the pattern detection (recognition) apparatus according to the embodiment is used in an imaging apparatus.
FIG. 12 is a diagram illustrating a circuit configuration of a feature integration layer.
FIG. 13 is a diagram illustrating a circuit configuration of a feature integration layer.
FIG. 14 is a diagram illustrating a configuration of a normalization circuit.
FIG. 15 is a diagram showing a configuration of a channel activity control circuit.
FIG. 16 is a diagram illustrating a configuration of a gating circuit.
FIG. 17 is a block diagram illustrating a network configuration according to the first embodiment.
FIG. 18 is a schematic connection diagram centered on a gaze control neuron in a gaze control layer.
FIG. 19 is a diagram illustrating a configuration of a gaze position control circuit.
FIG. 20 is a diagram illustrating a network configuration centering on a gaze area setting control layer.
FIG. 21 is a diagram illustrating a network configuration centering on a gaze area setting control layer.
FIG. 22 is a diagram showing a distribution control circuit for lower layer inputs and upper layer feedback inputs to a gaze control neuron.
FIG. 23 is a diagram illustrating an output example when a plurality of processing channel outputs having different scale selectivity are integrated by collective encoding.
FIG. 24 is a diagram illustrating a network configuration used in the second embodiment.
FIG. 25 is a diagram showing a flow of processing in a feedback amount modulation circuit.
FIG. 26 is a diagram illustrating a configuration centering on a gaze control layer according to the third embodiment.
FIG. 27 is a flowchart showing a flow of processing relating to gaze position setting control.
FIG. 28 is a flowchart showing the flow of control when a selective gaze mechanism is built into an imaging apparatus.
FIG. 29 is a diagram illustrating an image input apparatus including a subject recognition mechanism.

Claims

A pattern detection apparatus comprising:
input means for inputting a pattern;
detection processing means for detecting a predetermined pattern, the detection processing means comprising a plurality of feature detection elements that detect a plurality of features corresponding to each point obtained by sampling, by a predetermined method, the pattern input from the input means, a saliency detection element that detects feature saliency, and coupling means that couples the elements and transmits signals between them, the elements forming a plurality of element layers relating to features from low order to high order; and
gaze area setting control means,
wherein the coupling means comprises feedback coupling means for transmitting signals from an element layer for higher-order features to an element layer for lower-order features, and
the gaze area setting control means controls the setting of a gaze area relating to low-order feature data or input data based on the feature saliency and the signal transmission amount obtained from the feedback coupling means.

The pattern detection apparatus according to claim 1, wherein the gaze area setting control means updates the setting of the position and size of the gaze area.

The pattern detection apparatus according to claim 1, wherein the gaze area setting control means includes:
priority calculation means for determining a gaze position priority at each sampling point position on the input data based on the signal transmission amount from the feedback coupling means and the saliency of the low-order feature; and gaze position setting means for setting gaze positions in descending order of priority from the distribution of the priority.

The pattern detection apparatus according to claim 3, wherein the gaze area setting control means includes:
counting means for counting the number of gaze position searches; and
control means for controlling, according to the number of gaze position searches, the allowable range of priority within which a gaze position can be set by the gaze position setting means.

The pattern detection apparatus according to claim 3, wherein the detection processing means has a plurality of processing channels corresponding to a plurality of scale levels or resolutions, and
the gaze area setting control means controls the size of the gaze area based on the processing channel to which the feature selected based on the priority belongs.

The pattern detection apparatus according to claim 1, wherein the gaze area setting control means sets and controls the gaze area as an active receptive field of a feature detection element belonging to a low-order feature detection layer.

A pattern detection apparatus wherein the gaze area setting control means has a feedback connection from an upper layer that outputs information on the position and existence probability of an object belonging to the recognition target category and a feedback connection from an intermediate layer that outputs information on the position and existence probability of middle-order features of the target of the recognition category, and wherein the feedback input from the upper layer is given priority when searching for the target and the feedback input from the intermediate layer is given priority when recognizing the target.

The pattern detection apparatus according to claim 7, wherein the gaze area setting control means reduces the temporal variation of the center position of the gaze area when a predetermined gaze degree is large.

The pattern detection apparatus according to claim 8, wherein the gaze degree is the value of a monotonically increasing function of the feedback signal level from the upper layer.