JP3659914B2 - Object recognition apparatus, object recognition method, program, and recording medium

Info

Publication number: JP3659914B2
Application number: JP2001333151A
Authority: JP
Inventors: 太郎今川; 強司目片
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2000-10-31
Filing date: 2001-10-30
Publication date: 2005-06-15
Anticipated expiration: 2021-10-30
Also published as: JP2002203240A

Description

【０００１】
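【Technical Field of the Invention】
The present invention relates to an apparatus and a method for recognizing an object belonging to a specific category, and more particularly to an apparatus and a method for recognizing an object using a plurality of images that represent the object by means of different attributes.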
【０００２】
【従来の技術】
異なる属性を用いて対象物を表現した複数の画像を用いて対象物の認識を行う従来技術として、特開平８−２８７２１６号公報「顔面内部位認識方法」に開示される技術が知られている。この従来技術では、遠赤外光（波長８〜１０μｍの光）画像と可視光画像（異なる属性を用いて対象物を表現した複数の画像）とから、顔面内の部位（例えば、口）が認識される。遠赤外線画像は、対象物から放射される遠赤外光の強度を表現する。対象物から放射される遠赤外光の強度は、対象物の温度と対応付けることができるので、遠赤外光画像から特定の温度（例えば、人間の皮膚の通常の温度である約３６℃）の領域を抽出することができる。
【０００３】
温度画像を用いただけでは、対象物の周囲に人間と同じ温度の物体(室内の電機製品など)が存在する場合に、正確に人間を検出することが困難になるので、可視光の画像における肌色の領域を参照して信頼性の高い検出を実現していた。
【０００４】
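【Problems to Be Solved by the Invention】
In the conventional technique described in the above publication, the skin-temperature region extracted from the far-infrared image is associated with the skin-color region extracted from the visible light image in order to locate the part to be recognized. Such association requires that (1) the skin-temperature region (the region at about 36°C) be extracted accurately from the far-infrared image and the skin-color region be extracted accurately from the visible light image, and (2) pixels in the far-infrared image be associated in advance with pixels in the visible light image.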
【０００５】
遠赤外線画像における画素と可視光画像における画素との対応付けを予め行うためには、可視光カメラと遠赤外光カメラの光軸を正確に合わせることが必要であり、撮像システムの構築や物体認識のための初期設定が複雑になるという課題がある。
【０００６】
遠赤外線画像から皮膚温度領域を正確に抽出するためには、時間経過とともに変化する遠赤外線カメラの光学系や回路、素子等の温度の影響をキャンセルするためのキャリブレーションを頻繁に行う必要がある。あるいは、これらの光学系や回路、素子等の温度の影響をなくすために、遠赤外線カメラの全体を一定の温度に保つ（例えば、冷却する）ことが必要になる。その結果、遠赤外線カメラを含む認識システムの初期設定および保守が複雑であり、コスト高になるという課題がある。
【０００７】
また、皮膚温度は日射や気温の影響によって大きく変化する。特に屋外においては日射や気温等の条件の変化に応じて、皮膚温度は標準的な３６℃付近の温度からかけ離れやすく、１日のうち、時間的にも大きく変化する。このように、皮膚温度が変化すると、遠赤外線画像から皮膚温度領域を正確に抽出することは困難になる。様々に変化する環境条件下で皮膚温度領域を正確に抽出するためには、その個々の条件に応じた抽出アルゴリズムを用意しなければならず、認識システムの初期設定が容易でないという課題がある。
【０００８】
可視光画像においても、屋外のように日射や車のヘッドライト等の人工照明の影響を受けやすい環境下では、カメラのダイナミックレンジの制限や光源のスペクトル分布が不確定であることに起因して、対象物の色を常に正確に検出することは困難になる。様々に変化する環境条件下で肌色領域を正確に抽出するためには、その個々の条件に応じた抽出アルゴリズムを用意しなければならず、認識システムの初期設定が容易でないという課題がある。
【０００９】
さらに、遠赤外線画像からの皮膚温度領域の抽出と、可視光画像からの肌色の領域の抽出とは、いずれも、個々の対象物の属性に特化した処理である。このような処理は、認識の対象が変わった場合にはうまく動作しない。例えば、この従来技術を動物の認識に適用するためには、領域抽出のアルゴリズムを変更しなければならない。個々の認識対象ごとに抽出アルゴリズムを用意しなければならないので、認識システムの初期設定が容易でない。
【００１０】
このように、従来技術によれば、遠赤外線画像の領域と可視光画像の領域とを対応付ける処理が必要であることに起因して、対象物の認識を行うための初期設定が容易でなく、環境条件の影響を受けやすいという課題がある。
【００１１】
本発明は、このような課題に鑑みてなされたものであり、認識の信頼度が高く、かつ、初期設定が容易で環境条件の影響を受けにくい物体認識装置、物体を認識する方法、プログラムおよび記録媒体を提供することを目的とする。
【００１２】
【課題を解決するための手段】
本発明の物体認識装置は、第１の対象物の可視光画像の画像データである第１の画像データと、前記第１の対象物の遠赤外光画像の画像データである第２の画像データとを含む第１の画像データ組を入力する入力部と、該入力部が前記第１の画像データ組を入力して、入力された前記第１の画像データ組の前記第１の画像データと前記第２の画像データにおける予め定められた少なくとも１つの位置に、方位選択性、位置選択性、空間周波数特性の少なくとも１つの選択性を有する少なくとも１つの画像フィルタをそれぞれ適用することによって前記第１の画像データおよび前記第２の画像データからそれぞれ得られる少なくとも１つのフィルタ出力値を成分として有する、特徴量空間における第１の特徴量ベクトルを求める特徴量ベクトル算出部と、前記第１の特徴量ベクトルと所定の識別パラメータとの関係に基づいて、前記第１の対象物が特定のカテゴリに属するか否かを判定する判定部とを備えており、これにより、上記目的が達成される。
【００１３】
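The first image data may represent the first target object by the intensity of visible light emitted or reflected from the first target object, and the second image data may represent the first target object by the intensity of far-infrared light emitted or reflected from the first target object.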
【００１４】
前記入力部は、それぞれが複数の画像データからなる第２の画像データ組および第３の画像データ組をさらに前記特徴量ベクトル算出部に入力し、前記第２の画像データ組および第３の画像データ組のそれぞれは、前記特定のカテゴリに属する第２の対象物の可視光画像の画像データである第３の画像データと、前記第２の対象物の遠赤外光画像の画像データである第４の画像データとを含み、前記特徴量ベクトル算出部は、前記入力された第２の画像データ組および第３の画像データ組のそれぞれについて、前記第３の画像データおよび第４の画像データにおける予め定められた少なくとも１つの位置に、前記画像フィルタと同じ選択性を有する少なくとも１つの画像フィルタを適用することによって前記第３の画像データおよび前記第４の画像データからそれぞれ得られる少なくとも１つのフィルタ出力値を成分として有する、前記特徴量空間における特徴量ベクトルをさらに求め、前記第２の画像データ組についての前記特徴量空間における少なくとも１つの特徴量ベクトルと、前記第３の画像データ組についての前記特徴量空間における少なくとも１つの特徴量ベクトルとを識別するように、前記識別パラメータを求める学習部をさらに備えていてもよい。
【００１５】
前記第１の対象物は人間であってもよい。
【００１６】
前記学習部は、前記特徴量空間よりも多い次元数を有する仮の特徴量空間において、前記第２の画像データ組についての特徴量ベクトルと、前記第３の画像データ組についての特徴量ベクトルとを識別するための平面の法線の向きに基づいて前記仮の特徴量空間から少なくとも１つの次元を削除することによって、前記特徴量空間を定義してもよい。
【００１７】
前記識別パラメータは、前記特徴量空間における識別面を表し、前記判定部は、前記第１の特徴量ベクトルが、前記識別面に対してどちらの側に位置するかに基づいて、前記第１の対象物が前記特定のカテゴリに属するか否かを判定してもよい。
【００１８】
前記判定部は、前記第１の特徴量ベクトルと、前記識別面との距離が所定の閾値以上である場合に、前記第１の対象物が前記特定のカテゴリに属すると判定してもよい。
【００１９】
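The input section may further input, to the feature vector calculation section, fifth image data that is image data of a visible light image of the first target object and sixth image data that is image data of a far-infrared image of the first target object, and the fifth image data and the sixth image data may be captured a predetermined time after a first time at which the first image data and the second image data were captured.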
【００２０】
前記第１の画像データが第１の場所から撮影され、前記第２の画像データが前記第１の場所とは異なる第２の場所から撮影されたものであってもよい。
【００２１】
前記入力部は、前記第１の対象物の可視光画像の画像データである第５の画像データと、前記第１の対象物の遠赤外光画像の画像データである第６の画像データとをさらに前記特徴量ベクトル算出部に入力するようになっており、前記第５の画像データおよび前記第６の画像データは、前記第１の画像データと前記第２の画像データとが撮影される第１の場所とは異なる第２の場所から撮影されたものであってもよい。
【００２２】
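The object recognition method of the present invention includes the steps of: (a) inputting a first image data set including first image data that is image data of a visible light image of a first target object and second image data that is image data of a far-infrared image of the first target object; (b) obtaining a first feature vector in a feature space, the first feature vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying at least one image filter, which has at least one selectivity among orientation selectivity, position selectivity, and spatial frequency characteristics, to at least one predetermined position in each of the input first image data and second image data; and (c) determining whether or not the first target object belongs to a specific category based on the relationship between the first feature vector and predetermined discriminant parameters. The above object is thereby achieved.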
【００２３】
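The program of the present invention is a program for causing a computer to execute object recognition processing, the object recognition processing including the steps of: (a) inputting a first image data set including first image data that is image data of a visible light image of a first target object and second image data that is image data of a far-infrared image of the first target object; (b) obtaining a first feature vector in a feature space, the first feature vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying at least one image filter, which has at least one selectivity among orientation selectivity, position selectivity, and spatial frequency characteristics, to at least one predetermined position in each of the input first image data and second image data; and (c) determining whether or not the first target object belongs to a specific category based on the relationship between the first feature vector and predetermined discriminant parameters. The above object is thereby achieved.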
【００２４】
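The recording medium of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute object recognition processing, the object recognition processing including the steps of: (a) inputting a first image data set including first image data that is image data of a visible light image of a first target object and second image data that is image data of a far-infrared image of the first target object; (b) obtaining a first feature vector in a feature space, the first feature vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying at least one image filter, which has at least one selectivity among orientation selectivity, position selectivity, and spatial frequency characteristics, to at least one predetermined position in each of the input first image data and second image data; and (c) determining whether or not the first target object belongs to a specific category based on the relationship between the first feature vector and predetermined discriminant parameters. The above object is thereby achieved.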
【００２５】
以下、作用を説明する。
【００２６】
本発明によれば、認識のために入力される画像組（第１の画像組）は、対象物（第１の対象物）を第１の属性を用いて表現する第１の画像と、その対象物を第１の属性とは異なる第２の属性を用いて表現する第２の画像とを含む。対象物が特定のカテゴリに属するか否かの判定は、第１の属性と第２の属性とに基づいて行われるので、対象物の認識の信頼度が高くなる。さらに、その所定の数の画像の予め定められた位置に予め定められた画像フィルタを適用することによって得られるフィルタ出力値を成分として有する特徴量空間内の特徴量ベクトルが求められ、画像組は、この特徴量ベクトルによって表される。この処理には、第１の画像の領域と第２の画像の領域とを対応付ける処理は必要でないので、対象物の認識を行うための初期設定が容易であり、認識結果は、環境条件の影響を受けにくい。
【００２７】
【発明の実施の形態】
はじめに、図１〜図５を参照して、本発明の原理を説明する。
【００２８】
図１は、本発明の物体認識方法の全体の処理手順を示す。本発明の物体認識方法は、学習処理１００１（ステップＳ１００１ａ〜Ｓ１００１ｃ）と、認識処理１００２（ステップＳ１００２ａ〜Ｓ１００２ｃ）とを含む。以下、本発明の物体認識方法が人間を認識するために適用される場合を例に挙げて本発明の物体認識方法の手順を説明する。図１に示される物体認識方法は、図６を参照して後述する物体認識装置１によって実行される。
【００２９】
ステップＳ１００１ａ：学習用の画像組が物体認識装置１に入力される。以下の説明において、画像組とは、特に断らない限り、同一の対象物の可視光画像と遠赤外光画像との２枚の画像からなる組をいう。学習用の画像組は、人間（「人間」というカテゴリに属する第２の対象物）を表現する少なくとも１つの画像組（第２の画像組）と、その第２の画像組以外の少なくとも１つの画像組（人間以外の対象物を表現する画像組、第３の画像組）とを含む。ステップＳ１００１ａにおいて、複数の学習用の画像組が入力される。
【００３０】
ステップＳ１００１ｂ：ステップＳ１００１ａで入力された複数の学習用の画像組のそれぞれについて、特徴量ベクトルが求められる。１つの画像組からの特徴量ベクトルの算出は、図２Ａおよび図２Ｂを参照して後述される。この特徴量ベクトルは、特徴量空間における１つの点とみなすことができる。
【００３１】
ステップＳ１００１ｃ：少なくとも１つの第２の画像組についての特徴量ベクトルと、少なくとも１つの第３の画像組についての特徴量ベクトルとを識別（分離）するように、特徴量空間における識別面が求められる。識別面の算出は、図４を参照して後述される。
【００３２】
ステップＳ１００２ａ：認識用の画像組（第１の画像組）が物体認識装置１に入力される。
【００３３】
ステップＳ１００２ｂ：ステップＳ１００２ａで入力された認識用の画像組のそれぞれについて、特徴量ベクトルが求められる。
【００３４】
ステップＳ１００２ｃ：認識用の画像組の対象物（第１の対象物）が、「人間」という特定のカテゴリに属するか否かが判定される。この判定は、ステップＳ１００１ｃで求められた識別面と、ステップＳ１００２ｂで求められた特徴量ベクトルとの位置関係に基づいてなされる。
【００３５】
学習処理１００１は、学習用の画像組から、識別面（識別パラメータ）を求める処理である。この識別面は、認識処理１００２において、認識用の画像組によって表現される対象物が特定のカテゴリに属するか否かの判定のための判定基準として用いられる。
【００３６】
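FIG. 2A shows image sets 610 to 613 (at least one second image set) input in step S1001a (FIG. 1). Each of image sets 610 to 613 includes two images: a visible light image of a human and a far-infrared image of the same human. In the example shown in FIG. 2A, image set 610 includes visible light image 601 (third image) and far-infrared image 602 (fourth image). Here, a visible light image is an image representing the intensity of visible light (light in the 380 to 800 nm wavelength band) emitted or reflected from the target object of the image, and a far-infrared image is an image representing the intensity of far-infrared light (light in the 8 to 10 μm wavelength band) emitted or reflected from the target object of the image. In other words, a visible light image represents the target object using, as an attribute of the target object, the intensity (luminance) of visible light emitted or reflected from it, and a far-infrared image represents the target object using, as an attribute of the target object, the intensity of far-infrared light emitted or reflected from it.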
【００３７】
画像６０１の対象物６２１と、画像６０２の対象物６２２とは、同一の対象物（同一の人物）である。画像組６１１に含まれる可視光画像と遠赤外光画像との対象物も同一の対象物である。しかし、画像組６１０〜画像組６１１の間で、対象物が同一である必要はない。画像組６１０〜画像組６１１の間で、対象物は同一のカテゴリ（この例では、「人間」というカテゴリ）に属してさえいればよい。
【００３８】
図２Ａには、ステップＳ１００１ａ（図１）において入力される第２の画像組が４組示されている（画像組６１０〜６１３）が、ステップＳ１００１ａ（図１）において入力される第２の画像組の数はこれに限定されない。
【００３９】
図２Ｂは、ステップＳ１００１ａ（図１）において入力される画像組６６０〜６６３（少なくとも１つの第３の画像組）を示す。画像組６６０〜６６３のそれぞれは、人間以外の対象物の可視光画像と、その同じ対象物の遠赤外光画像との２枚の画像を含む。図２Ｂに示される例では、画像組６６０は、画像６５１（可視光画像）と、画像６５２（遠赤外光画像）とを含む。画像６５１は、画像６０１（図２Ａ）と同一のサイズを有し、画像６５２は、画像６０２（図２Ａ）と同一のサイズを有しているものとする。
【００４０】
再び図２Ａを参照して、ステップＳ１００１ｂ（図１）において、画像組についての特徴量ベクトルを求める処理を説明する。
【００４１】
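Assume that two kinds of image filters (image filter A and image filter B, not shown) are applied to each of two positions 631 and 632 in image 601. Image filter A and image filter B are, for example, image filters having different characteristics. Specific examples of image filters are described later with reference to FIG. 3. In FIG. 2A, positions 631 and 632 are shown as rectangles. These rectangles represent the size of image filter A and image filter B. Here, image filter A and image filter B are assumed to have the same size.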
【００４２】
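Applying one image filter to one position in image 601 produces one scalar value (a filter output value). In the example shown in FIG. 2A, applying image filters A and B to each of the two positions 631 and 632 in image 601 produces four filter output values (10, 3, 11, and 5). Specifically, applying image filter A to position 631 and position 632 produces the filter output values "10" and "3", respectively, and applying image filter B to position 631 and position 632 produces the filter output values "11" and "5", respectively.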
【００４３】
Similarly, applying the above image filters A and B to each of two positions 633 and 634 in image 602 produces four filter output values (1, 7, 11, and 4).
【００４４】
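Combining the four filter output values for image 601 (10, 3, 11, and 5) with the four filter output values for image 602 (1, 7, 11, and 4) yields the feature vector (10, 3, 11, 5, 1, 7, 11, 4) for image set 610. In this way, the information of the visible light image and the information of the far-infrared image are integrated by means of filter output values.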
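The concatenation described above can be sketched as follows. This is only an illustrative sketch: the dictionary layout, the function name `build_feature_vector`, and the assumption that the outputs for image 602 follow the same filter-then-position order as those for image 601 are not taken from the specification.

```python
# Filter output values read off FIG. 2A for image set 610.
# Keys are (image, filter, position); this layout is an assumption for illustration.
FIG_2A_OUTPUTS = {
    (601, "A", 631): 10, (601, "A", 632): 3,
    (601, "B", 631): 11, (601, "B", 632): 5,
    (602, "A", 633): 1, (602, "A", 634): 7,
    (602, "B", 633): 11, (602, "B", 634): 4,
}


def build_feature_vector(outputs, positions_by_image, filters=("A", "B")):
    """Concatenate filter outputs image by image, filter by filter, position by position."""
    vector = []
    for image, positions in positions_by_image:
        for name in filters:
            for position in positions:
                vector.append(outputs[(image, name, position)])
    return vector


feature_vector_610 = build_feature_vector(
    FIG_2A_OUTPUTS, [(601, (631, 632)), (602, (633, 634))]
)
print(feature_vector_610)  # [10, 3, 11, 5, 1, 7, 11, 4]
```

Note that nothing in this step relates a position in the visible light image to a position in the far-infrared image; the two halves of the vector are simply placed side by side.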
【００４５】
図２Ａに示される画像組６１１〜６１３についても同様にして特徴量ベクトルが算出される。特徴量ベクトルは、フィルタ出力値を成分として有する。この特徴量ベクトルは、８次元の特徴量空間における１つの点とみなすことができる。
【００４６】
図２Ｂに示される画像組６６０についても、同様にして特徴量ベクトルが算出される。具体的には、画像６５１の２つの位置６８１および６８２のそれぞれに上述した画像フィルタＡおよび画像フィルタＢを適用することによって、４つのフィルタ出力値（８、９、０および２）が生成される。画像６５２の２つの位置６８３および６８４のそれぞれに上述した画像フィルタＡおよび画像フィルタＢを適用することによって、４つのフィルタ出力値（９、１２、１０および４）が生成される。画像６５１についての４つのフィルタ出力値（８、９、０および２）と、画像６５２についての４つのフィルタ出力値（９、１２、１０および４）とを結合することによって、画像組６６０についての特徴量ベクトル（８，９，０，２，９，１２，１０，４）が算出される。図２Ｂに示される画像組６６１〜６６３についても同様にして特徴量ベクトルが算出される。
【００４７】
適用される画像フィルタと、その画像フィルタが適用される位置とは予め定められている。本発明の１つの実施形態において、適用される画像フィルタと、その画像フィルタが適用される位置とは、図１２を参照して後述する特徴量次元の削除処理を通じて決定される。図２Ａおよび図２Ｂに示される例では、画像組６１０（図２Ａ）に適用される画像フィルタＡおよびＢと同一の画像フィルタが、画像組６６０（図２Ｂ）にも適用される。位置６３１および６３２の画像６０１に対する位置関係は、それぞれ、位置６８１および６８２の画像６５１に対する位置関係に等しい。なお、１つの画像中の画像フィルタが適用される位置の数は、２に限定されない。また、１つの画像中の１つの位置に適用される画像フィルタの数も、２に限定されない。
【００４８】
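In this way, in step S1001b (FIG. 1), for each of the plurality of image sets (image sets 610 to 613 shown in FIG. 2A and image sets 660 to 663 shown in FIG. 2B), a feature vector in the feature space is obtained whose components are at least one filter output value obtained by applying at least one predetermined image filter to at least one predetermined position in the two images.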
【００４９】
図３は、画像３５１に画像フィルタ３５４を適用する例を示す。図３に示される例では、画像３５１の位置３５３に、画像フィルタ３５４が適用される。図３には、画像３５１の部分３５２の拡大図が示されている。この拡大図において、部分３５２の輪郭を表す矩形の内部の値は、画像３５１に含まれる画素の値を示す。
【００５０】
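In the example shown in FIG. 3, image filter 354 has a size of 3×3. The values inside the rectangle representing image filter 354 are its nine filter coefficients. The filter output value obtained by applying image filter 354 to position 353 of image 351 is the sum, over the nine filter coefficients of image filter 354, of the product of each filter coefficient and the value of the pixel corresponding to that coefficient. In this example, the filter output value is 765. The computation that yields a filter output value is called a filter operation. A filter operation extracts local characteristic information of an image as a filter output value.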
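The filter operation (a sum of products of filter coefficients and the pixels they overlap) can be sketched as below. The pixel values and the kernel are hypothetical and are not the values of FIG. 3, so the result differs from the 765 quoted above; the function name `filter_output` is likewise an assumption.

```python
def filter_output(image, kernel, top, left):
    """Sum of products of kernel coefficients and the pixels they overlap.

    (top, left) is the image coordinate of the kernel's upper-left corner.
    """
    total = 0
    for i, row in enumerate(kernel):
        for j, coeff in enumerate(row):
            total += coeff * image[top + i][left + j]
    return total


# Hypothetical 4x4 image containing a vertical edge (not the values of FIG. 3).
image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]
# Hypothetical 3x3 kernel that responds to vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(filter_output(image, kernel, 0, 0))  # 120
```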
【００５１】
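FIG. 4 shows the feature vectors obtained for the image sets in step S1001b (FIG. 1) plotted in feature space 701. In the example shown in FIG. 4, however, feature space 701 is drawn as a two-dimensional space (that is, a plane) for the sake of explanation. Feature space 701 is defined by two dimensions, feature dimension 1 and feature dimension 2. A feature vector is represented as one point in feature space 701. In FIG. 4, each ○ mark represents a feature vector for an image set representing a human (second image set), and each × mark represents a feature vector for an image set representing a target object other than a human (third image set). Hereinafter, a feature vector for an image set representing a human may be called simply a "feature vector representing a human", and a feature vector for an image set representing a target object other than a human may be called simply a "feature vector representing a non-human target object".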
【００５２】
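Discriminant line 702 (discriminant plane) is determined in step S1001c (FIG. 1) so as to discriminate the feature vectors represented by ○ marks from the feature vectors represented by × marks. In the example shown in FIG. 4, all feature vectors represented by ○ marks lie above discriminant line 702 (the side of arrow 703, the first side), and all feature vectors represented by × marks lie below discriminant line 702 (the side of arrow 704, the second side). Discriminant line 702 can be determined using, for example, the support vector machine technique. For the support vector machine technique, see, for example, V. Vapnik, "The Nature of Statistical Learning Theory", Springer Verlag, 1995. Alternatively, discriminant line 702 may be determined using a technique such as linear perceptron learning or discriminant analysis. Statistical parameter estimation methods and nonparametric learning algorithms such as neural networks may be adopted as the learning algorithm.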
【００５３】
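In the example shown in FIG. 4, discriminant line 702 discriminates (separates), in the two-dimensional feature space 701, the feature vectors for image sets representing a human (second image sets) from the feature vectors for image sets representing a non-human target object (third image sets). When these feature vectors are represented in an n-dimensional (n≥2) feature space, they are discriminated by a discriminant plane. When the n-dimensional feature space is defined by dimensions x_1, x_2, ..., x_n, the discriminant plane is expressed by (Equation 1).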
【００５４】
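【Equation 1】
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + d = 0$$
Hereinafter, in this specification, a "plane" means the set of points $(x_1, x_2, \ldots, x_n)$ that satisfy the relationship of (Equation 1) in an n-dimensional (n≥2) feature space. When n=2, (Equation 1) represents a straight line, and this straight line is included in the above definition of a "plane".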
【００５５】
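FIG. 5 shows an example of recognition image set (first image set) 510 input in step S1002a (FIG. 1). Image set 510 includes visible light image 501 representing target object 521 and far-infrared image 502 representing target object 522. Target object 521 and target object 522 are the same target object (the first target object).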
【００５６】
ステップＳ１００２ｂ（図１）において、画像組５１０についての特徴量空間における特徴量ベクトル（第１の特徴量ベクトル）が算出される。図５に示される例では、画像組５１０についての特徴量ベクトルは、（９，４，１２，６，１，６，１４，３）として算出されている。この特徴量ベクトルの算出は、図２Ａを参照して上述した画像組６１０についての特徴量ベクトルの算出と同様にして行われる。すなわち、ステップＳ１００２ｂにおいて、画像組５１０について、２個の画像（画像５０１および５０２）のうち、予め定められた少なくとも１つの位置に予め定められた少なくとも１つの画像フィルタ（画像フィルタＡおよびＢ）を適用することによって得られる少なくとも１つのフィルタ出力値を成分として有する、特徴量空間における特徴量ベクトル（第１の特徴量ベクトル）が求められる。
【００５７】
図４に示される●印は、特徴量空間７０１にプロットされた第１の特徴量ベクトルを示す。ただし、図４には、説明のために、第１の特徴量ベクトルを８次元の特徴量ベクトルとしてではなく、２次元の特徴量ベクトル（２，１０）として表している。
【００５８】
ステップＳ１００２ｃ（図１）では、●印により示される第１の特徴量ベクトルと、識別直線７０２との特徴量空間７０１における位置関係に基づいて、画像組５１０（図５）が表現する対象物が人間であるか否かが判定される。図４に示される例では、●印により示される特徴量ベクトルは、識別直線７０２の上側（矢印７０３の側）にある。識別直線７０２の矢印７０３の側は、人間を表現する画像組についての特徴量ベクトルが位置する側であるので、画像組５１０（図５）が表現する対象物が人間であると判定される。
【００５９】
このようにして、図１に示される本発明の方法によれば、画像組５１０（図５）が表現する第１の対象物（画像組５１０の可視光画像５０１と遠赤外光画像５０２との共通の対象物）が人間であることが認識される。第１の対象物が「人間」というカテゴリに属するか否かの判定は、対象物が反射または放射する可視光線の強度（第１の属性）と、その対象物が反射または放射する遠赤外光線の強度（第２の属性）とに基づいて行われるので、対象物の認識の信頼度が高くなる。さらに、その画像５０１および画像５０２の予め定められた位置に予め定められた画像フィルタを適用することによって得られるフィルタ出力値を成分として有する特徴量空間内の特徴量ベクトルが求められ、画像組５１０は、この特徴量ベクトルによって代表される。この処理には、画像５０１の領域と第２の画像５０２の領域とを対応付ける処理は必要でないので、第１の対象物の認識を行うための初期設定が容易になり、かつ、認識の結果が環境条件の影響を受けにくくなる。
【００６０】
以下、図面を参照して本発明の実施の形態を説明する。同一の構成要素には同一の参照番号を付し、重複した記載を省略する場合がある。
【００６１】
図６は、本発明の実施の形態の物体認識装置１の構成を示す。
【００６２】
物体認識装置１は、遠赤外光カメラ１００と、可視光カメラ１１０と、学習用画像データ（学習用の画像組）を格納する記憶装置１２０と、画像に画像フィルタを作用させるフィルタ処理部１２５と、学習処理部１３０と、遠赤外光カメラ１００および可視光カメラ１１０によって取得された画像組が表す対象物が特定のカテゴリに属するか否か（例えば、その対象物が人間か否か）を判定する認識処理部１４０と、その判定の際に判定基準として用いられる識別パラメータを記憶する識別パラメータ記憶部１５０と、ワークメモリ１６０と、認識結果を表示する表示部１７０とを含む。物体認識装置１の各構成要素は、内部バスを介して相互に接続されてもよいし、ネットワークを介して相互に接続されてもよい。そのようなネットワークは、無線ネットワーク、有線ネットワーク、電話回線ネットワーク等の任意のネットワークを含み得る。そのようなネットワークは、インターネットを含んでもよい。
【００６３】
遠赤外光カメラ１００は、遠赤外光画像を撮影し、可視光カメラ１１０は、可視光画像を撮影する。本発明の実施の形態では、可視光画像として、輝度画像が用いられた。
【００６４】
物体認識装置１は、例えば、屋外における侵入者の監視システムや、自動車等の移動体に搭載される歩行者の検出システムや、移動ロボットに搭載される視覚システムに適用され得る。
【００６５】
上述したように、物体認識装置１は、全体として、図１に示される学習処理および認識処理を行う。
【００６６】
可視光カメラ１１０と、遠赤外光カメラ１００とは、ステップＳ１００１ａ（図１）における学習用の画像組の入力処理と、ステップＳ１００２ａ（図１）における、認識用の画像組（第１の画像組）の入力処理とを行う。可視光カメラ１１０と、遠赤外光カメラ１００とは、学習用の画像組と認識用の画像組とを物体認識装置１に入力する入力部１９０として機能する。もちろん、学習用の画像組を物体認識装置に入力する可視光カメラおよび遠赤外光カメラと、認識用の画像組を物体認識装置に入力する可視光カメラおよび遠赤外光カメラとがそれぞれ別に設けられていてもよい。
【００６７】
学習用の画像組は、人間の対象物を表現する画像組（第２の画像組）と、人間以外の対象物を表現する画像組（第３の画像組）とに区分されて、学習用画像データとしていったん記憶装置１２０に蓄積される。記憶装置１２０は、例えば、ハードディスクであり得る。あるいは、記憶装置１２０は、任意のメモリであり得る。
【００６８】
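Filter processing section 125 calculates the feature vectors in step S1001b (FIG. 1) and step S1002b (FIG. 1). Filter processing section 125 (feature vector calculation section) may be, for example, a digital signal processor.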
【００６９】
学習処理部１３０（学習部）は、ステップＳ１００１ｃ（図１）における、識別面を求める処理を行う。
【００７０】
認識処理部１４０（判定部）は、ステップＳ１００２ｃ（図１）における、認識用の画像組の対象物が特定のカテゴリ「人間」に属するか否かを判定する処理を行う。
【００７１】
表示部１７０は、認識処理部１４０における認識の結果を表示する。表示部１７０としては、任意の表示デバイスが用いられ得る。表示部１７０は、省略されてもよい。
【００７２】
図７Ａは、遠赤外光カメラ１００と可視光カメラ１１０との配置の例を示す。図７Ａに示される例では、遠赤外光カメラ１００と、可視光カメラ１１０とが並列に配置されている。
【００７３】
図７Ｂは、遠赤外光カメラ１００と可視光カメラ１１０との配置の他の例を示す。図７Ｂに示される例では、コールドミラー８０２で反射した可視光カメラ１１０の光軸が遠赤外光カメラ１００の光軸にそろうように、遠赤外光カメラ１００と可視光カメラ１１０とが配置されている。コールドミラーとは、可視光を反射し、遠赤外光を透過する性質を有するミラーである。
【００７４】
遠赤外光カメラ１００の機能と可視光カメラ１１０の機能とが１つのカメラによって実現されてもよい。
【００７５】
図７Ｃは、遠赤外光カメラ１００と可視光カメラ１１０とに代えて、その両方の機能を併せ持つ可視光・遠赤外光カメラ２１０が用いられる例を示す。可視光・遠赤外光カメラ２１０は、エリアセンサを用いて遠赤外光カメラの機能と可視光カメラの機能とを実現している。
【００７６】
図８Ａは、可視光カメラ１１０によって撮影された、人間の対象物を表現する可視光画像８０３の例を示す。
【００７７】
図８Ｂは、遠赤外光カメラ１００によって撮影された、人間の対象物を表現する遠赤外光画像８０４の例を示す。遠赤外光画像８０４は、図８Ａに示される可視光画像８０３と同一の対象物をほぼ同一の時刻に撮影することによって得られる。
【００７８】
図９Ａは、可視光カメラ１１０によって撮影された、人間以外の対象物（木）を表現する可視光画像８０５の例を示す。
【００７９】
図９Ｂは、遠赤外光カメラ１００によって撮影された、人間以外の対象物（木）を表現する遠赤外光画像８０６の例を示す。遠赤外光画像８０６は、図９Ａに示される可視光画像８０５と同一の対象物をほぼ同一の時刻に撮影することによって得られる。
【００８０】
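The visible light images and far-infrared images shown in FIG. 8A, FIG. 8B, FIG. 9A, and FIG. 9B constitute image sets (learning image sets or recognition image sets). The same target object must appear in the visible light image and the far-infrared image, but the two images need not be precisely aligned pixel by pixel. For example, in visible light image 803 (FIG. 8A) the target object is shifted to the left of the image center, and in far-infrared image 804 (FIG. 8B) the target object is shifted to the right of the image center; because the learning processing and recognition processing of the present invention require no processing that associates regions of visible light image 803 with regions of far-infrared image 804, such a shift poses no problem. Consequently, far-infrared camera 100 and visible light camera 110 are easy to align, and the initial setup of object recognition apparatus 1 is easy. However, all visible light images included in the learning image sets and the recognition image sets must have the same scale ratio (aspect ratio), and the target object must appear at a similar position in each of them. This may be achieved by cutting out a predetermined region from the visible light image captured by visible light camera 110. The same applies to the far-infrared images included in the learning image sets and the recognition image sets.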
【００８１】
図１０Ａは、フィルタ処理部１２５（図６）において用いられる画像フィルタの特性を模式的に示す。図１０Ａに示される画像フィルタは、画像中の特定の位置に適用された場合に、その特定の位置における特定方向（垂直方向）の特定の空間周波数を有するエッジ（図１０Ａに示される楕円形の横幅（短軸）の範囲内で画素の値が順次変化するようなエッジ）を選択的に検出する。
【００８２】
図１０Ｂは、水平方向のエッジを選択的に検出する画像フィルタの特性を模式的に示す。
【００８３】
図１０Ｃは、左下から右上に延びるエッジを選択的に検出する画像フィルタの特性を模式的に示す。
【００８４】
図１０Ｄは、右下から左上に延びるエッジを選択的に検出する画像フィルタの特性を模式的に示す。
【００８５】
図１０Ａ〜図１０Ｄに示される画像フィルタは、方位選択性（特定の方向のエッジのみを検出する）と位置選択性（特定の位置のエッジを検出する）と空間周波数選択性（特定の空間周波数で画素の値が変化するエッジを検出する）とを有するフィルタである。ここで、空間周波数とは、画像中の位置変化に対する画素の値（例えば、輝度）の変化の度合いをいう。このような特性を有する画像フィルタの例として、Ｇａｂｏｒフィルタが挙げられる。方位選択性と位置選択性と空間周波数選択性とを有する画像フィルタを複数種類（選択性が異なる複数の画像フィルタ）用いることによって、異なる画像フィルタが同一のエッジに関する情報を重複して検出するという無駄を低減することができ、必要な画像フィルタの数を減らすことができる。これにより、学習処理１００１および認識処理１００２を実行するために必要な計算量を減らすことができる。その結果、属性が異なる複数の画像（可視光画像と遠赤外光画像）を入力することに起因する計算量の増加を最小限にとどめることができる。
【００８６】
１つのフィルタ出力値は、画像の特定の位置における特定方向の特定の空間周波数を有するエッジの情報を表す。フィルタ出力値を成分とする特徴量ベクトルによって１つの画像組を表すことは、可視光画像と遠赤外光画像との共通の対象物の形状をエッジの集まりとして簡易的に表現することに相当する。
【００８７】
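Each of FIG. 11A to FIG. 11D shows an example of the filter coefficients of a Gabor filter. In the examples shown in FIG. 11A to FIG. 11D, each grid point corresponds to one filter coefficient of a Gabor filter having a size of 13 pixels × 13 pixels, and the value (real part) of that filter coefficient is shown as the height of the grid point. FIG. 11A to FIG. 11D correspond to the image filters shown in FIG. 10A to FIG. 10D, respectively.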
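As a rough illustration of such coefficients, the real part of a 13×13 Gabor filter can be generated as a Gaussian envelope multiplied by a cosine carrier. The specification does not give the filter parameters, so the wavelength, the envelope width `sigma`, the isotropic envelope, and the four angles below are assumptions chosen only for this sketch.

```python
import math


def gabor_kernel(size=13, theta=0.0, wavelength=6.0, sigma=3.0):
    """Real part of a Gabor filter: isotropic Gaussian envelope times a cosine carrier.

    theta is the direction (radians) along which the carrier oscillates.
    wavelength and sigma are hypothetical values, not taken from the specification.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * along / wavelength))
        kernel.append(row)
    return kernel


# Four orientations spaced 45 degrees apart (an assumed reading of FIG. 10A to FIG. 10D).
kernels = [gabor_kernel(theta=k * math.pi / 4) for k in range(4)]
print(len(kernels), len(kernels[0]), len(kernels[0][0]))  # 4 13 13
```

With `theta=0` the carrier varies only along the horizontal axis, so the kernel responds most strongly to vertical edges of the matching spatial frequency near the kernel center, which is the combination of orientation, position, and spatial-frequency selectivity described in the text.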
【００８８】
以下、物体認識装置１が実行する学習処理と認識処理との詳細な処理手順を説明する。
【００８９】
＜学習処理＞
図１２は、物体認識装置１が実行する学習処理の詳細な手順を示す。ステップＳ９１は、ステップＳ１００１ａおよびステップＳ１００１ｂ（図１）に対応しており、ステップＳ９２〜ステップＳ９７は、ステップＳ１００１ｃ（図１）に対応している。
【００９０】
ステップＳ９１：人間を表現する画像組（第２の画像組）と、人間以外の対象物を表現する画像組（第３の画像組）とをそれぞれ複数用意し、各画像組について特徴量ベクトルを求める。ステップＳ９１において求められた特徴量ベクトルの集合を特徴量データＦと呼ぶ。ステップＳ９１のさらに詳細な処理手順は、図１５および図１６を参照して後述される。以下の説明では、ステップＳ９１において、各画像組から１０３２次元の特徴量ベクトルが算出されることを仮定する（本発明はこれに限定されない）。この特徴量ベクトルは、１０３２個の特徴量次元、特徴量次元ｘ₁、特徴量次元ｘ₂、特徴量次元ｘ₃、．．．、特徴量次元ｘ₁₀₃₂によって定義される空間内の１つの点として表される。
【００９１】
ステップＳ９２：特徴量次元のうち、学習に用いる特徴量次元が指定される。最初は全ての次元を指定する。この例では、１回目は、特徴量次元ｘ₁、特徴量次元ｘ₂、特徴量次元ｘ₃、．．．、特徴量次元ｘ₁₀₃₂の全てが、学習に用いる特徴量次元として指定される。２回目以降は、後述するステップＳ９４で除かれた残りの次元が学習に用いる特徴量次元として指定される。ステップＳ９２〜ステップＳ９６の処理を繰り返すことによって、特徴量空間の次元数が減っていく。これを、「特徴量次元の削除処理」と呼ぶ。
【００９２】
ステップＳ９３：指定された特徴量次元を用いてより低次元の特徴量空間を定義し（ただし、最初は１０３２次元の特徴量空間が定義される）、特徴量データＦに含まれる各特徴量ベクトルが、このより低次元の特徴量空間における特徴量ベクトルとして表される。このより低次元の特徴量空間における特徴量ベクトルは、特徴量データＦに含まれる１０３２次元の特徴量ベクトルの成分のうち、ステップＳ９２で指定された特徴量次元に対応する成分のみから構成される、より低次元の特徴量ベクトルとして表される。特徴量ベクトルの１つの成分は、画像における１つの位置における１つのフィルタ出力値に対応しているので、このより低次元の特徴量ベクトルも、画像組における２つの画像の少なくとも１つの予め定められた位置に予め定められた１つの画像フィルタを適用することによって得られる少なくとも１つのフィルタ出力値を成分として有する。
【００９３】
次に、このより低次元の特徴量空間において、人間を表現する画像組についての特徴量ベクトルと、人間以外の対象物を表現する画像組についての特徴量ベクトルとを識別（分離）する識別平面が、仮識別平面として決定される。
【００９４】
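The position of a feature vector relative to the discriminant plane can be expressed by the value (weighted sum) obtained by weighting the component of the feature vector for each feature dimension by the coefficient of the discriminant plane for that dimension and adding the results. For example, consider a case in which the discriminant plane in a three-dimensional space is expressed as x+2y+3z=0 and the feature vector is (x, y, z)=(−1, 0, 4). Weighting each component of the feature vector (−1, 0, 4) by the coefficients (1, 2, 3) of the discriminant plane and adding gives 1×(−1)+2×0+3×4=11. This value represents the distance between the feature vector (−1, 0, 4) and the discriminant plane up to a constant scale factor (dividing by √14, the length of the coefficient vector (1, 2, 3), gives the Euclidean distance). The sign and magnitude of this value express the positional relationship of the feature vector to the discriminant plane.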
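The worked example for the plane x+2y+3z=0 and the vector (−1, 0, 4) can be checked with a few lines of code. The function name `weighted_sum` is an assumption; the sketch also computes the Euclidean distance, which is the weighted sum divided by the length of the coefficient vector.

```python
import math


def weighted_sum(vector, coeffs, offset=0.0):
    """Weighted sum of the vector components with the plane coefficients, plus the constant term."""
    return sum(a * x for a, x in zip(coeffs, vector)) + offset


coeffs = (1, 2, 3)   # coefficients of the plane x + 2y + 3z = 0
vector = (-1, 0, 4)  # feature vector from the example

value = weighted_sum(vector, coeffs)
distance = value / math.sqrt(sum(a * a for a in coeffs))

print(value)               # 11.0
print(round(distance, 3))  # 2.94
```

The sign of `value` tells which side of the plane the vector lies on, and its magnitude grows in proportion to the distance from the plane.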
【００９５】
特徴量空間内に２つのカテゴリに属する点が分布する場合に、カテゴリを分けるような識別平面（仮識別平面）を決定する手法としては、上述したように、サポートベクトルマシンの手法等が用いられ得る。
【００９６】
このような手法を用いて、特徴量空間内において人間を表現する画像組についての特徴量ベクトルと、人間以外の対象物を表現する画像組についての特徴量ベクトルとを分離する識別平面が仮識別平面として決定される。分離は必ずしも完全である必要はなく、一部の特徴量ベクトル（例えば、人間を表現する画像組についての特徴量ベクトル）が識別平面を越えて（人間を表現しない画像組についての特徴量ベクトルの側に）分布する配置（誤識別）になっていてもかまわない。ただし、誤識別の特徴量ベクトルの個数は少ない方がよい。誤識別の特徴量ベクトルが少なくなるように識別平面の決定手法が複数の手法のうちから選択されてもよい。
【００９７】
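Step S94: Of the coordinate axes corresponding to the feature dimensions specified in step S92, d coordinate axes are removed from the feature dimensions specified in step S92 (the feature dimensions used for learning), in ascending order of the absolute value of the angle each axis forms with the provisional discriminant plane determined in step S93. The value of d is a predetermined integer of 1 or more.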
【００９８】
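For example, in a three-dimensional feature space (with coordinate axes x, y, and z), when the provisional discriminant plane is expressed as x+2y+3z=0, the absolute value of the angle that the provisional discriminant plane forms with the x, y, and z axes increases in the order of the x axis, the y axis, and the z axis. In this case, with d=1, the feature dimension corresponding to the x axis is removed from the feature dimensions used for learning. Instead of the angle between each coordinate axis and the discriminant plane, attention may be paid to the angle δ between each coordinate axis and the normal of the discriminant plane; removing d coordinate axes in descending order of the absolute value of δ gives the same result. The direction of the normal of the discriminant plane is represented by a normal vector whose components are the coefficients of the equation representing the provisional discriminant plane. For example, the direction of the normal of the provisional discriminant plane x+2y+3z=0 is represented by the normal vector (1, 2, 3).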
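The axis ranking in this example can be sketched as follows. The angle between a coordinate axis and a plane with normal vector n is asin(|n_i| / |n|), so ranking axes by this angle is equivalent to ranking them by the absolute value of the corresponding plane coefficient. The function names are assumptions for illustration.

```python
import math


def axis_plane_angles(normal):
    """Angle in radians between each coordinate axis and the plane with the given normal."""
    norm = math.sqrt(sum(c * c for c in normal))
    return [math.asin(abs(c) / norm) for c in normal]


def axes_to_remove(normal, d):
    """Indices of the d axes forming the smallest angle with the plane."""
    angles = axis_plane_angles(normal)
    order = sorted(range(len(normal)), key=lambda i: angles[i])
    return order[:d]


normal = (1, 2, 3)  # provisional discriminant plane x + 2y + 3z = 0
angles = axis_plane_angles(normal)
print([round(math.degrees(a), 1) for a in angles])  # increasing for x, y, z
print(axes_to_remove(normal, d=1))                  # [0], i.e. the x axis
```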
【００９９】
再び図４を参照して、特徴量次元を学習に用いる特徴量次元から除くことの意味を説明する。
【０１００】
図４において、特徴量次元１に対応する軸（横軸）が識別直線（識別平面）７０２となす角度は、特徴量次元２に対応する軸（縦軸）が識別直線（識別平面）７０２となす角度よりも小さい。このことは、特徴量次元１は、人間を表現する画像組についての特徴量ベクトルと人間以外の対象物を表現する画像組についての特徴量ベクトルとを識別（分離）するために、特徴量次元２よりも重要でないことを意味する。すなわち、特徴量次元２の値（フィルタ出力値）が、対象物が人間であるか否かの判定に大きく影響する。図４に示される例では、ステップＳ９４（図１２）において、特徴量次元１と特徴量次元２とのうち、特徴量次元１が削除される。
【０１０１】
１つの特徴量次元の値は、画像組に含まれる１つの可視光画像または遠赤外光画像の１つの位置における１つのフィルタ出力値に対応する。このように、識別直線(または識別平面)を求めることによって、可視光画像から得た複数のフィルタ出力と遠赤外光画像から得た複数のフィルタ出力とのうち、どの特徴量次元(フィルタ出力)が重要であるかを決定することができる。ステップＳ９４（図１２）は、重要でない特徴量次元（識別に寄与しない特徴量次元）を学習に用いる特徴量次元から削除することを意味する。図１２に戻って学習処理の詳細な処理手順の説明を続ける。
【０１０２】
ステップＳ９５：ｄ個の座標軸を減らした特徴量空間を設定し、新たに設定した特徴量空間において識別性能の評価を行う。この評価は、ステップＳ９４でｄ個の座標軸(特徴量次元)を除くことによって新たに定義されたより低次元の特徴量空間において、特徴量データＦに含まれる、人間を表現する特徴量ベクトルと人間以外の対象物を表現する特徴量ベクトルとをどれだけ正確に識別（分離）できるかを調べることによって行われる。ステップＳ９５のさらに詳細な処理手順は、図１７を参照して後述される。
【０１０３】
ステップＳ９６：識別性能が基準値を満たすか否かが判定される。ステップＳ９６の判定の結果が「Ｎｏ」である場合には、処理はステップＳ９７に進む。ステップＳ９６の判定の結果が「Ｙｅｓ」である場合には、処理はステップＳ９２に戻る（さらに特徴量次元の削除が行われる）。
【０１０４】
ステップＳ９７：ステップＳ９３で用いられた特徴量次元が選択次元として指定される。また、ステップＳ９３で求めた仮識別平面が、識別平面として指定される。この識別平面は、後に行われる認識処理において判定基準として使用される。識別平面は、ステップＳ９７で選択次元として指定された特徴量次元によって定義される空間（特徴量空間）における平面である。なお、ステップＳ９６において、初回は無条件にステップＳ９２に移行するようにしてもよい。
【０１０５】
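In this way, in steps S92 to S96, learning processing section 130 (FIG. 6) defines the feature space used in the recognition processing by deleting at least one dimension from a provisional feature space that has more dimensions than the feature space used in the recognition processing. The dimensions to delete are chosen based on the direction of the normal of the plane (provisional discriminant plane) that discriminates, in the provisional feature space, the feature vectors for image sets representing a human (second image sets) from the feature vectors for image sets representing a non-human target object (third image sets).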
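The loop of steps S92 to S97 can be sketched as below. To keep the sketch self-contained, the provisional plane is fitted with a simple stand-in (the normal is the difference of the two class means); this is not the support vector machine named in the specification. The performance criterion (stop when accuracy on the training data drops) is one of the options the text describes, and all function names and the toy data are assumptions.

```python
def fit_plane(pos, neg, dims):
    """Stand-in linear classifier: normal = difference of class means (not an SVM)."""
    mean_pos = [sum(v[k] for v in pos) / len(pos) for k in dims]
    mean_neg = [sum(v[k] for v in neg) / len(neg) for k in dims]
    w = [p - n for p, n in zip(mean_pos, mean_neg)]
    offset = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, mean_pos, mean_neg))
    return w, offset


def accuracy(pos, neg, dims, w, offset):
    """Fraction of vectors that fall on the correct side of the plane."""
    def score(v):
        return sum(wi * v[k] for wi, k in zip(w, dims)) + offset
    correct = sum(score(v) > 0 for v in pos) + sum(score(v) <= 0 for v in neg)
    return correct / (len(pos) + len(neg))


def select_dimensions(pos, neg, d=1):
    dims = list(range(len(pos[0])))                  # S92: first pass uses all dimensions
    while True:
        w, offset = fit_plane(pos, neg, dims)        # S93: provisional discriminant plane
        if len(dims) <= d:
            return dims, w, offset                   # nothing left to delete
        by_weight = sorted(range(len(dims)), key=lambda i: abs(w[i]))
        reduced = [dims[i] for i in sorted(by_weight[d:])]   # S94: drop d weakest axes
        w_r, off_r = fit_plane(pos, neg, reduced)    # S95: evaluate in the reduced space
        if accuracy(pos, neg, reduced, w_r, off_r) < accuracy(pos, neg, dims, w, offset):
            return dims, w, offset                   # S96 "No" -> S97: keep current space
        dims = reduced                               # S96 "Yes" -> back to S92


# Toy data: only dimension 0 separates the two classes.
humans = [(5, 1, 0), (6, 0, 1), (5, 0, 0)]
others = [(0, 1, 1), (1, 0, 0), (0, 0, 1)]
selected, plane_coeffs, plane_offset = select_dimensions(humans, others, d=1)
print(selected)  # [0]
```

Dropping the axes with the smallest absolute coefficient is the same ranking as dropping the axes that form the smallest angle with the provisional plane.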
【０１０６】
ステップＳ９７において、１０３２個の特徴量次元、特徴量次元ｘ₁、特徴量次元ｘ₂、特徴量次元ｘ₃、．．．、特徴量次元ｘ₁₀₃₂のうち、ｍ個（ｍは１０３２以下の整数）の特徴量次元、特徴量次元ｘ_a1、特徴量次元ｘ_a2、特徴量次元ｘ_a3、．．．、特徴量次元ｘ_am（添え字ａ１、ａ２、ａ３、．．．ａｍは、１以上１０３２以下の整数）が選択次元として指定されることを仮定すると、選択次元のリスト（特徴量次元ｘ_a1，特徴量次元ｘ_a2，特徴量次元ｘ_a3，．．．，特徴量次元ｘ_am）は、画像組に含まれる可視光画像と遠赤外光画像とに適用されるどのフィルタ出力が、後に行われる認識処理において用いられるかを示す。すなわち、選択次元のリストは、可視光画像の情報と遠赤外光画像の情報の組み合わせ方を規定しているということができる。
【０１０７】
ステップＳ９７で決定された識別平面は、選択次元のリストと、その選択次元についての係数とにより表される。これらの識別平面を表すパラメータは、識別パラメータ記憶部１５０（図６）に格納される。
【０１０８】
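In the procedure described above, step S95 may be omitted, and the determination in step S96 may be replaced by a determination of whether the number of deleted feature dimensions has reached a predetermined value. That is, the processing may proceed to step S97 when the number of deleted feature dimensions has reached the predetermined number, and proceed to step S92 when it has not. Such processing allows the number of feature dimensions to be set to a predetermined value.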
【０１０９】
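Increasing the value d used in step S94 reduces the number of repetitions of the procedure from step S92 to step S96 and thus reduces the amount of computation. Decreasing the value of d, on the other hand, avoids deleting many feature dimensions at once, which makes it possible to determine as the selected dimensions a number of feature dimensions that is necessary and sufficient to achieve the desired discrimination performance.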
【０１１０】
なお、後に行われる認識処理において、ステップＳ９７で決定された識別平面とは異なる識別面（平面に限定されない）が、判定基準（識別パラメータ）として用いられてもよい。そのような識別面は、選択次元によって定義される空間内で人間を表す特徴量ベクトルと人間以外の対象物を表す特徴量ベクトルとを識別するように設定される。選択次元によって定義される空間内で人間を表す特徴量ベクトルと人間以外の対象物を表す特徴量ベクトルとを識別する任意の識別手法およびその識別手法において用いられる識別パラメータが、後に行われる認識処理において、対象物が人間であるか否かを判定するために採用され得る。
【０１１１】
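In the recognition processing performed later, a linear discrimination technique may be adopted, or a nonlinear discrimination technique may be adopted. A linear discrimination technique is, for example, a technique that determines whether or not the target object is a human based on which side of the discriminant plane, determined in step S97 and represented by the discriminant parameters, the feature vector representing the target object lies. Examples of nonlinear discrimination techniques include the k-NN method, perceptrons using nonlinear elements, LVQ, and nonlinear SVMs. Examples of nonlinear discrimination techniques are described below with reference to FIG. 13A to FIG. 13C.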
【０１１２】
図１３Ａは、曲面の識別面を用いた識別手法を説明する図である。空間１３８０は、選択次元によって定義される特徴量空間である。図１３Ａには、空間１３８０は、特徴量次元１と特徴量次元２との２つの選択次元によって定義される平面として示されており、識別面１３６１は、曲線として示されている。図１３Ａおよび後述する図１３Ｂ、図１３Ｃにおいて、○印のそれぞれは、人間を表現する特徴量ベクトルを表し、×印のそれぞれは、人間以外の対象物を表現する特徴量ベクトルを表す。
【０１１３】
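Discriminant surface 1361 discriminates, in feature space 1380, the feature vectors representing a human from the feature vectors representing a non-human target object. In this example, the feature vectors representing a human (○ marks) lie on the first side of discriminant surface 1361 (the side of arrow 1362), and the feature vectors representing a non-human target object (× marks) lie on the second side of discriminant surface 1361 (the side of arrow 1363). Discriminant surface 1361 can be expressed, for example, by an equation whose variables are the values of feature dimension 1 and feature dimension 2. The coefficients of such an equation are stored as discriminant parameters in discriminant parameter storage section 150 (FIG. 6).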
【０１１４】
点１３６４は、後に行われる認識処理において入力される画像組（認識用画像組）についての特徴量ベクトルを示す。図１３Ａに示される例では、特徴量ベクトル１３６４は識別面１３６１の第１の側に位置しているので、認識用画像組の対象物は人間であると判定される。
【０１１５】
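FIG. 13B is a diagram explaining a discrimination technique that uses distance in feature space 1380. Examples of such techniques include the k-NN method and LVQ. Representative point 1366 is a point representing the feature vectors that represent a human (a representative point indicating the category "human"). Representative point 1366 is obtained, for example, as the centroid of all feature vectors representing a human. Similarly, representative point 1367 is a point representing the feature vectors that represent a non-human target object (a representative point indicating the category "non-human"). Representative points 1366 and 1367 are each expressed by the coordinates of the point in the feature space. These coordinates are stored as discriminant parameters in discriminant parameter storage section 150 (FIG. 6).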
【０１１６】
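Point 1365 indicates the feature vector for an image set (recognition image set) input in the recognition processing performed later. In this discrimination technique, the category of the representative point nearest to a feature vector is the recognition result for that feature vector. In the example shown in FIG. 13B, the category indicated by the representative point nearest to feature vector 1365 (representative point 1366) is the category "human", so the target object of the recognition image set is determined to be a human.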
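The distance-based technique of FIG. 13B, with each category represented by the centroid of its feature vectors, can be sketched as follows. The two-dimensional sample vectors are hypothetical and the function names are assumptions; Euclidean distance is used here, although the specification does not fix the distance measure.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))


def nearest_category(x, representatives):
    """Name of the category whose representative point is closest to x."""
    return min(representatives, key=lambda name: math.dist(x, representatives[name]))


# Hypothetical training vectors in a two-dimensional selected feature space.
humans = [(8.0, 9.0), (9.0, 11.0), (10.0, 10.0)]
others = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]

# The representative points play the role of the stored discriminant parameters.
representatives = {"human": centroid(humans), "non-human": centroid(others)}

print(nearest_category((7.0, 8.0), representatives))  # human
print(nearest_category((2.0, 3.0), representatives))  # non-human
```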
【０１１７】
図１３Ｃは、特徴量空間１３８２における特徴量ベクトルの分布を用いた識別手法を説明する図である。このような識別手法の例としては、非線型素子を用いたパーセプトロンなどのニューラルネット等の手法が挙げられる。図１３Ｃにおいて、特徴量空間１３８２は、１つの選択次元（特徴量次元１）によって定義される１次元の空間（すなわち、直線）として示されている。曲線１３６９と曲線１３７０とはそれぞれ、特徴量空間１３８２における人間を表現する特徴量ベクトルの分布の強度（「人間」というカテゴリを示す分布の強度）と、人間以外の対象物を表現する特徴量ベクトルの分布の強度（「人間以外」というカテゴリを示す分布の強度）とを示す。曲線１３６９と曲線１３７０とは、特徴量次元１の値を変数とする式によって表され得る。このような式の係数は、識別パラメータとして識別パラメータ記憶部１５０（図６）に格納される。
【０１１８】
点１３６８は、後に行われる認識処理において入力される画像組（認識用画像組）についての特徴量ベクトルを示す。この識別手法では、特徴量ベクトルの位置において、複数の分布の強度を比較し、最大の分布の強度が示すカテゴリがその特徴量ベクトルについての認識結果となる。図１３Ｃに示される例では、特徴量ベクトル１３６８の位置において、「人間」というカテゴリを示す分布の強度１３６９が、「人間以外」というカテゴリを示す分布の強度１３７０よりも大きいので、認識用画像組の対象物は人間であると判定される。
【０１１９】
このように、識別パラメータは識別平面のみでなく、特徴量空間において異なるカテゴリに属する特徴量ベクトルを識別する任意の識別手法において用いられるパラメータを表現する。
【０１２０】
図１４は、特徴量次元の削除処理を行うことに伴う識別性能の変化を模式的に示す。図１４に示されるように、最初は、特徴量次元を削除することによって、識別性能は向上する。これは、識別に寄与しない特徴量次元を削減することによって、識別に悪影響のある余分な情報（ノイズ）を低減することができるからである。
【０１２１】
一般に、可視光画像と遠赤外光画像のように異なる属性を持つ複数の画像の情報を単純に組み合わせると情報量が増し、識別処理が増加するとともに、学習に必要なサンプル数（学習用の画像組の数）が増加するため、サンプル収集が困難になる。学習用の画像組の数が不足すると、識別性能が悪化する可能性がある。しかしながら、本発明の実施の形態では、可視光画像の情報と遠赤外光画像の情報とを組み合わせた上で、ステップＳ９２〜ステップＳ９６において、特徴量次元が削除される。特徴量次元を削除して識別に有効な情報を選別することにより、後に行われる認識処理における計算量を低減しつつ、識別性能を向上させる（または、維持する）ことが可能になる。
【０１２２】
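In a simulation conducted by the present inventors using 858 image sets representing a human and 11052 image sets representing a non-human target object, the processing of steps S92 to S96 reduced the number of feature dimensions by 88% and at the same time reduced the probability of misdiscrimination to 1/9.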
【０１２３】
ステップＳ９６（図１２）における判定は、例えば、次のようにして行われる。ステップＳ９５で求められる識別性能の変化を監視し、前回のステップＳ９５での識別性能と今回のステップＳ９５での性能を比較し、識別性能が向上または維持していれば基準を満たすと判定し、識別性能が低下していれば基準を満たさないと判定する。このような判定を行う場合には、図１４に点１３０２により示される識別性能の極大値を実現することができる。
【０１２４】
ステップＳ９６（図１２）における判定は、他の様式で行われてもよい。例えば、絶対的な識別性能値（図１４に示される参照番号１３０１）を予め指定し、指定した識別性能を実現できる条件下でできるだけ特徴量次元を削除してもよい。この場合には、その識別性能値を満たす限りにおいて、最大限に特徴量次元の数が削除される（点１３０２）。
【０１２５】
図１５は、ステップＳ９１（図１２）のさらに詳細な処理手順を示す。なお、ステップＳ１０１は、ステップＳ１００１ａ（図１）に対応し、ステップＳ１０２〜Ｓ１０４は、ステップＳ１００１ｂ（図１）に対応する。
【０１２６】
ステップＳ１０１：可視光画像と遠赤外光画像とが入力される。この可視光画像と遠赤外光画像とは、画像組を構成する。ステップＳ１０１において、人間を表現する画像組（可視光画像と遠赤外光画像とが、同一の人間を表現している画像組）と、人間以外の対象物を表現する画像組（可視光画像と遠赤外光画像とが、人間以外の同一の対象物を表現している画像組）とが入力される。このような画像組の例は、図２Ａおよび図２Ｂを参照して上述した。
【０１２７】
ステップＳ１０１で入力される可視光画像と遠赤外光画像とは、遠赤外光カメラ１００と可視光カメラ１１０とを用いて撮影され、いったん学習用の画像組として記憶装置１２０に格納される。あるいは、可視光画像と遠赤外光画像とは、記録媒体（図示せず）から読み出されることによって物体認識装置１に入力されてもよい。
【０１２８】
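Step S102: Pixel values are normalized for each image (visible light image or far-infrared image). The normalization of pixel values is performed according to (Equation 2).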
【０１２９】
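【Equation 2】
$$I'(x, y) = \frac{I(x, y) - m}{\sigma}$$
where
$I(x, y)$: pixel value at coordinates $(x, y)$ in the image before normalization
$m$: mean of the pixel values over the entire image
$\sigma$: standard deviation of the pixel values over the entire image about the mean
$I'(x, y)$: normalized pixel value at coordinates $(x, y)$ in the image.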
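The normalization of step S102 can be sketched as below. The specification does not say whether σ is the population or the sample standard deviation; the population form is assumed here, and a completely uniform image (σ = 0) is not handled.

```python
import math


def normalize(image):
    """Subtract the image mean and divide by the image standard deviation.

    Assumes the population standard deviation and a non-uniform image (sigma > 0).
    """
    pixels = [p for row in image for p in row]
    m = sum(pixels) / len(pixels)
    sigma = math.sqrt(sum((p - m) ** 2 for p in pixels) / len(pixels))
    return [[(p - m) / sigma for p in row] for row in image]


print(normalize([[1, 3], [1, 3]]))  # [[-1.0, 1.0], [-1.0, 1.0]]
```

After this step every image has zero mean and unit variance, so filter output values from a visible light image and a far-infrared image are on comparable scales regardless of overall brightness or temperature level.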
【０１３０】
ステップＳ１０３：画素値の正規化を行った画像に対して、複数の異なる特性のＧａｂｏｒフィルタが画像中の複数の領域に適用される。
【０１３１】
ステップＳ１０４：Ｇａｂｏｒフィルタを適用した各領域（特定領域）に対応するフィルタ出力値から特徴量ベクトルが求められる。
【０１３２】
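FIG. 16 shows the relationship between image 431 and region 432 within image 431 to which an image filter is applied. Image 431 is one of the visible light images or far-infrared images input in step S101. Region 432 indicates a region to which an image filter is applied. Hereinafter, region 432 within image 431 to which an image filter is applied is called a "specific region". The details of the processing in steps S103 and S104 (FIG. 15) are described with reference to FIG. 16.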
【０１３３】
図１６に示される例では、特定領域４３２は、一辺の長さがＬの正方形の形状を有する。画像４３１は、高さＨ、幅Ｗのサイズの矩形の形状を有する。画像４３１内に、特定領域４３２が複数設定される。特定領域４３２のそれぞれに、特定領域４３２のサイズに一致するＧａｂｏｒフィルタが適用されることにより、その特定領域４３２のそれぞれにおいてフィルタ出力値が生成される。フィルタ出力値が生成される方法は、図３を参照して上述した。
【０１３４】
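Filter processing section 125 (FIG. 6) sets a plurality of specific regions 432 in image 431 such that, for example, the specific regions 432 cover the whole of image 431 while being allowed to overlap. For example, in FIG. 16, when specific region 432 has a size of L=H/8 (size 1) and W=H/2, arranging specific regions 432 over the entire surface of image 431 so that they overlap by L/2 in the vertical and horizontal directions gives (H/L×2−1)×(W/L×2−1)=15×7=105 specific regions in image 431. Four image filters with different characteristics (different orientation selectivity), namely the four image filters with four kinds of orientation selectivity shown in FIG. 11A to FIG. 11D, are applied to each specific region.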
【０１３５】
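In addition, specific regions of different sizes are arranged over the entire surface of image 431. Image filters of different sizes are applied to specific regions of different sizes. When the specific region is a square of size L=H/4 (size 2) and such regions are arranged to cover the entire surface of image 431 while overlapping by L/2 in the vertical and horizontal directions, the number of specific regions, under the assumption W=H/2, is (H/L×2−1)×(W/L×2−1)=7×3=21. Similarly, when the specific region is a square of size L=H/2 (size 3), the number of specific regions is (H/L×2−1)×(W/L×2−1)=3×1=3.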
【０１３６】
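Adding up the specific regions of the three different sizes (size 1, size 2, and size 3) in image 431 gives 105+21+3=129 regions. Applying four kinds of image filters with different orientation selectivity to each of these specific regions yields 129×4=516 filter outputs from image 431. One image set (a learning image set or a recognition image set) includes a visible light image (luminance image) and a far-infrared image. When the visible light image and the far-infrared image have the same size (height H, width W) and specific regions are set at the same positions in both, the number of filter outputs obtained from the two images of one image set is 516×2=1032. One image set is therefore represented by a 1032-dimensional feature vector. Alternatively, a higher-dimensional feature space may be set up and this 1032-dimensional feature vector mapped into it to generate a higher-dimensional feature vector. Mapping the feature vectors into a higher-dimensional feature space increases the distances between the feature vectors corresponding to the respective image sets, which has the advantage of making the discriminant plane easier to obtain in step S93 (FIG. 12) performed later.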
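The dimension count above can be reproduced with a short calculation. The image size of 64×32 pixels is taken from the size-normalization example given later in the specification; the function name is an assumption.

```python
def specific_region_count(height, width, side):
    """Number of side x side regions covering the image with an overlap of side/2."""
    return (2 * height // side - 1) * (2 * width // side - 1)


H, W = 64, 32  # W = H/2, as assumed in the text

counts = [specific_region_count(H, W, H // k) for k in (8, 4, 2)]  # sizes 1, 2, 3
regions = sum(counts)
outputs_per_image = regions * 4       # four orientations per specific region
dims_per_set = outputs_per_image * 2  # visible light image + far-infrared image

print(counts, regions, outputs_per_image, dims_per_set)  # [105, 21, 3] 129 516 1032
```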
【０１３７】
なお、Ｇａｂｏｒフィルタの方位の数は４に限定されない。Ｇａｂｏｒフィルタの方位の数をＧａｂｏｒフィルタのサイズおよび／または特定領域の位置に依存して変えてもよい。Ｇａｂｏｒフィルタのサイズおよび／または特定領域の位置に応じて方位の数を変えることにより、画像中の特定の位置（例えば、方位の区別を詳細に行いたい位置）および／または特定の空間周波数領域から、より多くの情報を効率よく取得することが可能になる。
【０１３８】
また、画像フィルタのサイズは３種類に限定されない。画像フィルタのサイズは、１種類以上であればよい。可視光画像と遠赤外光画像とで、画素値（輝度）の変化の空間周波数特性が異なる。従って、可視光画像と遠赤外光画像とで、適用される画像フィルタのサイズを変えることによっても、画像から多くの情報を効率よく取得することが可能になる。
【０１３９】
可視光画像と遠赤外光画像について、特定領域のサイズや位置は必ずしも等しくしなくてもよい。可視光画像と遠赤外光画像について、それぞれに適したサイズや位置を設定することで性能向上が期待できる。しかし、両画像に対する特定領域のサイズや配置を等しくすることにより、両画像に対する画像フィルタの適用の処理を同一の手続きで実行できるという利点が得られ、ハードウェア回路やソフトウェアの規模を削減することが可能になる。
【０１４０】
画像フィルタとして、Ｇａｂｏｒフィルタと類似した形状のフィルタや他のエッジを求める画像フィルタが用いられてもよい。さらに、エッジを求めるフィルタ以外の画像フィルタが用いられてもよい。しかし、Ｇａｂｏｒフィルタまたは類似形状の画像フィルタを用いることにより、位置空間と周波数空間との両方の空間で局在する輝度変化の情報を効率よく取得することが可能になる。従って、Ｇａｂｏｒフィルタまたは類似形状の画像フィルタを用いた場合には、ｓｏｂｅｌフィルタなどのエッジフィルタを用いた場合に比較して効率よく特定の空間周波数において空間的変化の情報を取得することが可能になる。その結果、可視光画像と遠赤外光画像のように異なる性質の画像を組み合わせることによって増加した情報量の中から、効率的に認識のために有効な情報を取得することが可能になる。可視光画像の情報と遠赤外光画像の情報とは、遠赤外光画像から得られる温度の情報を用いることなく、効果的に組み合わされ、認識のために使用され得る。
【０１４１】
また、図１６を参照して説明した例では、１つのサイズの画像４３１に複数のサイズの特定領域（Ｇａｂｏｒフィルタを適用する領域）が設定された。しかし、予め同一対象を異なる解像度で撮影したサイズの異なる複数の画像のそれぞれに、同じサイズの特定領域（Ｇａｂｏｒフィルタを適用する領域）を設定することによっても、同様の結果を得ることができる。
【０１４２】
遠赤外光画像と可視光画像とにおいて、必ずしも同じ位置に対象物が写っている必要はない（２枚の画像の間で上下方向および／または左右方向にずれていてもよい）。他の対象物を撮影した場合にも同じ位置関係が保たれる限り、可視光画像と遠赤外光画像の中における対象物の位置は一致する必要はない。本発明の学習処理および認識処理では、遠赤外光画像中の領域と、可視光画像中の領域とを対応付ける必要がないからである。
【０１４３】
図１７は、ステップＳ９５（図１２）における、識別性能を評価する処理の詳細な手順を示す。
【０１４４】
ステップＳ１１１：特徴量ベクトルの成分のうち、ステップＳ９２で指定された特徴量次元に対応する成分のみが有効にされる。この特徴量ベクトルとしては、ステップＳ９１（図１２）において求められた特徴量データＦに含まれるすべての特徴量ベクトルまたは特徴量データＦに含まれる一部の特徴量ベクトルが用いられる。
【０１４５】
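Step S112: Using a predetermined discrimination technique, learning is performed to discriminate the feature vectors representing a human from the feature vectors representing a non-human target object. The predetermined discrimination technique is the technique used in the recognition processing performed later to determine whether or not the target object is a human.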
【０１４６】
ステップＳ１１３：評価用の特徴量データ（特徴量ベクトルの集合）を用いて、識別性能が算出される。評価用の特徴量データとしては、ステップＳ９１（図１２）において求められた特徴量データＦを用いてもよいし、図１２に示される学習処理において使用しなかった特徴量データを用いてもよい。あるいは、予め評価用の特徴量ベクトルの集合を特徴量データＦと同様の手続きで別途作成しておいてもよい。識別性能は、例えば、ステップＳ９４（図１２）で設定された次元を有効にした特徴量ベクトル(人間を表現する特徴量ベクトルと人間以外の対象物を表現する特徴量ベクトルとを含む)をステップＳ１１２における学習が終った後の識別手法を用いて正しく識別できた割合として表される。
【０１４７】
＜認識処理＞
図１８は、物体認識装置１が実行する認識処理の詳細な手順を示す。ステップＳ１２１は、ステップＳ１００２ａ（図１）に対応しており、ステップＳ１２２〜ステップＳ１２３は、ステップＳ１００２ｂ（図１）に対応しており、ステップＳ１２４〜ステップＳ１２５は、Ｓ１００２ｃ（図１）に対応している。
【０１４８】
ステップＳ１２１：可視光画像と遠赤外光画像とが入力される。この画像入力は、学習処理におけるステップＳ１０１（図１５）と同様に、可視光カメラ１１０と遠赤外光カメラ１００（図６）とにより行われる。
【０１４９】
ステップＳ１２２：画像（可視光画像および遠赤外光画像）から、認識対象領域が切り出される。認識対象領域は、その認識対象領域の形状が学習処理において使用した画像の形状に一致するように切り出される。認識対象領域は、画像中で固定しておいてもよいし、１つの画像から複数の認識対象領域が切り出されてもよい。切り出される認識対象領域の形状は、図１６を参照して説明した例では、縦横比がＨ対Ｗの矩形である。学習処理における可視光画像の形状と認識処理において可視光画像から切り出される認識対象領域の形状とが同じであり、かつ、学習処理における遠赤外光画像の形状と認識処理において遠赤外光画像から切り出される認識対象領域の形状とが同じである限り、可視光画像から切り出される認識対象領域の形状と、遠赤外光画像から切り出される認識対象領域の形状とが異なっていてもよい。
【０１５０】
ステップＳ１２２における切り出しは、学習処理においてステップＳ１０１（図１５）で入力された可視光画像と遠赤外光画像との撮影位置を考慮して行われる。具体的には、学習処理における可視光画像と、認識処理において可視光画像から切り出された認識対象領域とで、同じ位置に対象物が写るように、ステップＳ１２２における切り出しが行われる。もちろん、認識処理において可視光画像を撮影する可視光カメラ１１０を切り出しが必要でないように設置していてもよい。遠赤外光画像についても同様である。可視光画像と遠赤外光画像との拡大率についても同様で、可視光画像と遠赤外光画像の両者の拡大率(画素数)が異なっていてもよいが、可視光画像の拡大率と遠赤外光画像の拡大率(画素数)との比は、学習処理と認識処理とで同じになるように調整される。
【０１５１】
次に、必要に応じて、切り出した可視光画像と遠赤外光画像の大きさが正規化される。可視光画像、遠赤外光画像ともに切り出した形状が縦横比２：１の矩形の場合、例えば、縦６４画素、横３２画素の矩形に大きさが正規化される。画像の大きさを正規化することによって、次のステップＳ１２３において画像に適用されるＧａｂｏｒフィルタの大きさ(フィルタを作用させる特定領域の大きさ)は、固定され得る。切り出され、大きさが正規化された可視光画像と遠赤外光画像とは、認識用の画像組を構成する。
【０１５２】
ステップＳ１２３：切り出した画像から、ステップＳ９７（図１２）で決定された選択次元に対応する特徴量ベクトルが求められる。特徴量ベクトルは、学習処理において用いられたものと同じＧａｂｏｒフィルタを用いて算出される。上述したステップＳ１２２において画像の大きさが正規化されていない場合には、画像の大きさに応じたサイズのＧａｂｏｒフィルタが適用される。
【０１５３】
学習処理において特徴量次元の一部を削除している場合には、削除した特徴量次元に対応するＧａｂｏｒフィルタの演算処理は不要であるので、予め特徴量ベクトルの算出処理から除いておく。
【０１５４】
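Step S124: A similarity is obtained using the weighted sum of the feature vector and the coefficients of the discriminant plane. The similarity expresses the degree to which the target object of the recognition image set resembles a human. As already described, the weighted sum of the feature vector and the coefficients of the discriminant plane represents the distance (positional relationship) between the feature vector and the discriminant plane. This distance can be expressed as a positive value when the feature vector lies on one side of the space divided by the discriminant plane (for example, the first side, where the feature vectors representing a human lie) and as a negative value when it lies on the opposite side (for example, the second side, where the feature vectors representing a non-human target object lie). The farther the feature vector is from the discriminant plane, the larger the absolute value of the distance.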
【０１５５】
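Step S125: A human is recognized based on the similarity (that is, the target object is determined to be a human). For example, whether or not the target object is a human is determined based on whether the similarity (the distance between the feature vector and the discriminant plane) is positive or negative (that is, based on which side of the discriminant plane the feature vector lies). Alternatively, the target object may be determined to be a human when the similarity is positive (that is, the feature vector is on the first side of the discriminant plane) and the similarity is equal to or greater than a predetermined threshold. Such a threshold can be set according to the requirements on recognition accuracy (for example, whether it is desired to reduce the possibility of misrecognizing a non-human target object as a human, or to reduce the possibility of misrecognizing a human as non-human). A numerical value indicating the similarity may be displayed on display section 170.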
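Steps S124 and S125 can be sketched as a weighted sum followed by a sign or threshold test. The plane coefficients, the constant term, the feature vectors, and the threshold values below are hypothetical; the function names are assumptions.

```python
def similarity(feature_vector, coeffs, offset):
    """S124: weighted sum of the feature vector with the discriminant-plane coefficients."""
    return sum(a * x for a, x in zip(coeffs, feature_vector)) + offset


def is_human(sim, threshold=0.0):
    """S125: positive side of the plane and at least `threshold` away from it."""
    return sim > 0 and sim >= threshold


coeffs, offset = (1.0, 2.0, 3.0), 0.0  # hypothetical stored discriminant parameters

sim_a = similarity((-1.0, 0.0, 4.0), coeffs, offset)  # 11.0
sim_b = similarity((1.0, 0.0, -1.0), coeffs, offset)  # -2.0

print(is_human(sim_a))                  # True
print(is_human(sim_a, threshold=20.0))  # False: a stricter threshold rejects it
print(is_human(sim_b))                  # False
```

Raising the threshold trades missed humans for fewer false alarms, which corresponds to the accuracy requirement mentioned in the text.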
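[0156]
In this way, the object recognition apparatus 1 of the present invention (FIG. 6) obtains identification parameters (for example, parameters representing an identification plane) using the far-infrared light images and visible light images of the learning image sets, and uses those identification parameters as the criterion for recognizing the object in the far-infrared light image and visible light image of the recognition image set (that is, for determining whether the object belongs to a specific category). Because the object is recognized based on both the intensity of visible light emitted or reflected from the object (the first attribute) and the intensity of far-infrared light emitted or reflected from the object (the second attribute), the reliability of recognition is high.
[0157]
The inventors simulated the learning process and recognition process described above using, as learning image sets, pairs of visible light and far-infrared light images captured outdoors during the day and at night (858 image sets representing humans and 11,052 image sets representing non-human objects). The simulation yielded an erroneous recognition rate of 0.2%. This is far lower (one tenth or less) than the erroneous detection rate (2.7%) of a comparative example in which learning and recognition used only visible light images and the erroneous recognition rate (3.5%) of a comparative example in which learning and recognition used only far-infrared light images. Highly reliable recognition of the object is thus achieved.
[0158]
By performing the learning process using both visible light images and far-infrared light images, the object recognition apparatus 1 of the present invention can learn the correlation between the visible light image and the far-infrared light image, and that correlation is reflected in the recognition process. Consider, for example, a visible light image and a far-infrared light image of an object (a human) captured outdoors in the daytime. Under environmental conditions in which the object is in direct sunlight, the luminance of the object in the visible light image is high and, at the same time, the far-infrared light image indicates that the temperature of the object is high. Conversely, under conditions in which the object is not in direct sunlight, the luminance of the object in the visible light image is low and the far-infrared light image indicates that the temperature of the object is low.
[0159]
By using visible light images and far-infrared light images captured under such varied environmental conditions, the object recognition apparatus 1 of the present invention can learn the correlation between the two. As a result, when, for example, the luminance of the object in the visible light image of a recognition image set is high while the far-infrared light image indicates that the temperature of the object is low (an event that cannot occur when the object is a human), the object is unlikely to be misrecognized as a human.
[0160]
In a recognition system that performs learning and recognition using only visible light images, performing the learning process with visible light images captured under varied environmental conditions widens the tolerance for recognizing an object as a human. As a result, a non-human object is more likely to be misrecognized as a human. The same applies to a recognition system that performs learning and recognition using only far-infrared light images.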
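[0161]
According to the object recognition apparatus 1 of the present invention, performing the learning process with learning image sets captured under various environmental conditions (for example, lighting conditions and temperature conditions) makes it possible to correctly recognize the object in recognition image sets captured under those conditions. This feature is particularly suited to applications that must correctly recognize objects under varying environmental conditions, such as outdoor intruder monitoring systems, pedestrian detection systems mounted on moving bodies such as automobiles, and vision systems mounted on mobile robots.
[0162]
Furthermore, the learning process and recognition process of the present invention described above are not specialized for the attributes of the object. This means that the object recognition apparatus 1 of the present invention can be applied to uses other than human recognition (for example, recognizing animals or vehicles) without changing the learning and recognition processes. The initial setting of the object recognition apparatus 1 is therefore easy both when the environmental conditions under which recognition is performed change and when the object to be recognized changes.
[0163]
The object recognition apparatus 1 of the present invention does not require a process of extracting a specific temperature region from the far-infrared image. Consequently, there is no need for calibration to cancel the influence of the temperature of the far-infrared camera's optical system, circuits, elements, and the like, which changes over time, and the configuration and maintenance of the object recognition apparatus 1 can be simplified.
[0164]
In the embodiment described above, one set of selected dimensions is determined in step S97 (FIG. 12) of the learning process. However, a plurality of lists of selected dimensions (sets of selected dimensions) may be prepared, with identification parameters set for each set. In this case, in the recognition process, a human may be recognized based on the similarity obtained using the identification parameters of any one set, or based on the sum (average) of the similarities obtained using the identification parameters of each of the plurality of sets. Alternatively, a human may be recognized separately from the similarity obtained with the identification parameters of each set, and a majority vote may be taken over the recognition results.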
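The two ways of combining several sets of selected dimensions (averaging the similarities, or taking a majority vote over the individual decisions) can be sketched as below. This is an illustrative sketch only: each classifier is assumed to be a hypothetical tuple `(dims, weights, bias)`, where `dims` lists the indices of its selected dimensions, and all numbers are invented.

```python
def similarity(features, dims, weights, bias):
    """Similarity computed from the selected dimensions of one parameter set."""
    return sum(w * features[d] for d, w in zip(dims, weights)) + bias


def decide_by_average(features, classifiers):
    """Recognize a human when the average similarity over all sets is positive."""
    sims = [similarity(features, *c) for c in classifiers]
    return sum(sims) / len(sims) > 0


def decide_by_majority(features, classifiers):
    """Recognize a human when more than half of the sets decide 'human'."""
    votes = sum(1 for c in classifiers if similarity(features, *c) > 0)
    return votes * 2 > len(classifiers)


# Hypothetical filter outputs and three hypothetical parameter sets.
features = [1.0, 2.0, 3.0, 4.0]
classifiers = [([0, 1], [1.0, 1.0], -2.0), ([2], [1.0], -10.0), ([3], [0.5], -1.0)]
print(decide_by_average(features, classifiers), decide_by_majority(features, classifiers))
```

In this example the two rules disagree: one strongly negative similarity pulls the average below zero, while two of the three sets still vote "human". Which rule suits an application depends on how much weight a single confident set should carry.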
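[0165]
In the embodiment described above, a feature vector is obtained in step S123 using Gabor filters for the recognition target region cut out in step S122. However, the Gabor filters may instead be applied to the image before cutting out. In this case, the specific regions on which the filters act are set over the entire image in advance, the Gabor filters are applied, and the filter output for each position in the entire image is obtained beforehand. The feature vector is then calculated using only the filter outputs at the locations in the image that form the detection target region. Obtaining the filter outputs in advance avoids wastefully repeating the filter operation that applies the same Gabor filter to the same location in the image, for example when the cutting-out and recognition procedures are repeated in sequence while scanning a wide area of the image. Repeating the cutting-out and recognition procedures while scanning a wide area of the image makes it possible to detect a human in an image in which the location of the object is not known in advance. When such processing is performed, the object recognition apparatus 1 (FIG. 6) can function as an object detection apparatus.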
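The precomputation described above can be sketched as follows. The sketch uses a small generic kernel in place of a Gabor filter, and the names `filter_response_map` and `window_features`, as well as the `(map_index, dy, dx)` offset format, are assumptions made for illustration.

```python
def filter_response_map(image, kernel):
    """Apply the kernel at every valid position of the whole image, once."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(w - kw + 1)
        ]
        for r in range(h - kh + 1)
    ]


def window_features(response_maps, top, left, offsets):
    """Read the feature vector of one window from the precomputed maps.

    offsets: (map_index, dy, dx) triples giving, for each selected dimension,
    which filter map to read and where inside the window (hypothetical format).
    """
    return [response_maps[m][top + dy][left + dx] for m, dy, dx in offsets]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
maps = [filter_response_map(image, [[1, 1], [1, 1]])]
offsets = [(0, 0, 0), (0, 0, 1)]
# Scan every window position; no filter operation is repeated.
for top in range(2):
    for left in range(2):
        print(top, left, window_features(maps, top, left, offsets))
```

Overlapping windows share the same entries of the response map, so scanning costs one lookup per selected dimension per window instead of one filter operation.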
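[0166]
The image sets representing non-human objects (sets of a visible light image and a far-infrared light image) used in the learning process may be input to the object recognition apparatus 1 by photographing actual non-human objects such as trees and dogs. Alternatively, a set of a visible light image and a far-infrared light image generated by applying a conversion process to the visible light image and the far-infrared light image, respectively, of an image set representing a human may be used in the learning process as an image set representing a non-human object. Examples of such a conversion process include applying an affine transformation to the image and/or adding noise to the image. The converted images are relatively similar to images representing a human. By using such converted images in the learning process, it is possible to learn a criterion under which an object whose shape differs even slightly from the shape of a human is not recognized as a human.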
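A minimal sketch of generating such a converted image set is shown below. It assumes 8-bit images stored as lists of rows, uses a pure translation as the simplest special case of an affine transformation, and applies the same shift to both images; these choices and all function names are assumptions, not details given in the specification.

```python
import random


def translate(image, dy, dx, fill=0):
    """Shift the image by (dy, dx); a translation is a special case of an affine transform."""
    h, w = len(image), len(image[0])
    return [
        [image[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else fill
         for c in range(w)]
        for r in range(h)
    ]


def add_noise(image, amplitude, rng):
    """Add uniform integer noise in [-amplitude, amplitude], clipped to 0..255."""
    return [[min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
            for row in image]


def make_negative_set(visible, far_infrared, dy, dx, amplitude, seed=0):
    """Derive a 'non-human' learning image set from a human image set."""
    rng = random.Random(seed)
    return (add_noise(translate(visible, dy, dx), amplitude, rng),
            add_noise(translate(far_infrared, dy, dx), amplitude, rng))


visible = [[10, 20], [30, 40]]
far_infrared = [[50, 60], [70, 80]]
print(make_negative_set(visible, far_infrared, dy=0, dx=1, amplitude=5))
```

Negative examples made this way lie close to the human examples in feature space, which pushes the learned identification plane toward a tight boundary around human shapes.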
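[0167]
In the embodiment described above, the learning process and recognition process combine two images, a visible light image and a far-infrared light image captured at substantially the same time. The number of images combined is not limited to two. In addition, a color image may be used as the visible light image in place of a luminance image. In this case, when the color image is represented by three RGB images (three images representing the intensities of light in three different wavelength bands emitted or reflected from the object), a total of four images, namely the R image, the G image, the B image, and the far-infrared light image, are input to the object recognition apparatus 1 as one image set (learning image set and recognition image set). The learning process and recognition process with four input images are the same as those with two input images, a visible light image and a far-infrared light image.
[0168]
The image set input to the object recognition apparatus 1 may include images captured at different times. For example, the input unit 190 may be configured to input to the object recognition apparatus 1, as one recognition image set, four images: a visible light image (first image) and a far-infrared light image (second image) captured at time T (a first time), and a visible light image (fifth image) and a far-infrared light image (sixth image) captured at time T + t (a predetermined time after the first time). Needless to say, in this case each learning image set must likewise include a visible light image and a far-infrared light image captured at the same time, together with a visible light image and a far-infrared light image captured a time t after that. The learning process and recognition process with four input images are the same as those with two input images, a visible light image and a far-infrared light image.
[0169]
Combining visible light images and far-infrared light images captured at different times in this way makes it possible to distinguish an object whose shape changes over time in a characteristic manner, such as a pedestrian, from an object whose shape does not change over time and from an object whose shape changes over time in a different manner, thereby improving recognition accuracy.
[0170]
Shortening the predetermined time t allows fast-moving objects to be recognized efficiently, and lengthening it allows slow-moving objects to be recognized efficiently. Normally, when recognizing moving humans, vehicles, animals, or the like outdoors, setting the predetermined time t to one second or less makes it possible to effectively recognize objects whose shape and/or position changes over time. Effectively combining image information from multiple times in this way improves identification performance.
[0171]
According to the object recognition apparatus 1 of the present invention, even when the number of images in an image set increases, only the feature dimensions (filter outputs) that contribute to identification are selected, so the increase in the computational cost of the recognition process caused by the larger number of images is suppressed.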
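[0172]
The image set input to the object recognition apparatus 1 may include images from different viewpoints. For example, a visible light image and a far-infrared light image captured from the same position, together with a visible light image and a far-infrared light image captured from a different position, may constitute one image set.
[0173]
FIG. 19 shows yet another example of the arrangement of far-infrared light cameras and visible light cameras. In the example shown in FIG. 19, a far-infrared light camera 100a and a visible light camera 110a are placed at point A, and a far-infrared light camera 100b and a visible light camera 110b are placed at point B. The four cameras are arranged to photograph the same object. The far-infrared light cameras 100a and 100b and the visible light cameras 110a and 110b together function as the input unit 190 (FIG. 6), which inputs the learning image sets and the recognition image sets to the object recognition apparatus 1 (FIG. 6).
[0174]
In the recognition process, the input unit 190 is configured to input to the object recognition apparatus 1, as one recognition image set, four images: the visible light image (first image) and the far-infrared light image (second image) captured from point A (a first location), and additionally the visible light image (fifth image) and the far-infrared light image (sixth image) captured from point B (a second location). The learning process and recognition process with four input images are the same as those with two input images, a visible light image and a far-infrared light image.
[0175]
Combining visible light images and far-infrared light images captured from different locations in this way improves the accuracy of recognizing an object, such as a person, whose shape differs depending on the viewing direction.
[0176]
According to the object recognition apparatus 1 of the present invention, even when the number of images in an image set increases, only the feature dimensions (filter outputs) that contribute to identification are selected, so the increase in the computational cost of the recognition process caused by the larger number of images is suppressed.
[0177]
Alternatively, a visible light image and a far-infrared light image captured from different positions may constitute one image set.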
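[0178]
FIG. 20 shows yet another example of the arrangement of the far-infrared light camera 100 and the visible light camera 110. In the example shown in FIG. 20, the visible light camera 110 is placed at point C and the far-infrared light camera 100 is placed at point D. The two cameras are arranged to photograph the same object.
[0179]
In the example shown in FIG. 20, the far-infrared light camera 100 and the visible light camera 110 are configured so that the visible light image (first image) is captured from point C (a first location) and the far-infrared light image (second image) is captured from point D (a second location). With this configuration, the background of the object can be made to differ between the visible light image and the far-infrared light image. This reduces the possibility that extraneous background information common to the visible light image and the far-infrared light image adversely affects the recognition result, so the recognition result is less susceptible to the background. In addition, as in the example shown in FIG. 19, using images captured from a plurality of different viewpoints improves the recognition accuracy for an object, such as a person, whose shape differs depending on the viewing direction.
[0180]
In the embodiment described above, a visible light image and a far-infrared light image are given as examples of images representing an object using different attributes, but the present invention is not limited to this. An image set may be composed of a visible light image and a near-infrared light image, or of a visible light image and an ultraviolet light image. Alternatively, an image set may be composed of a visible light image and a range image. A pixel value in a range image indicates the distance from the shooting point to the object. A range image can thus be said to represent the object using the attribute of distance from the shooting point to the object.
[0181]
Because the learning process and recognition process of the present invention described above are not specialized for the attributes of the object, they need not be changed even when images of types other than visible light images and far-infrared light images are used.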
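[0182]
The learning process and recognition process of the present invention are typically implemented by software on a computer. However, they may be implemented by hardware, or by a combination of software and hardware. Furthermore, it is not essential for the object recognition apparatus 1 (FIG. 6) to execute the learning process. This is because the object recognition apparatus 1 can execute the recognition process (the processing of steps S1002a to S1002c shown in FIG. 1) as long as the result of the learning process (the identification parameters) is stored in the identification parameter storage unit 150. Such identification parameters may be predetermined parameters. Alternatively, they may be obtained by performing the learning process on an apparatus other than the object recognition apparatus 1. Storing the identification parameters obtained by such a learning process in the identification parameter storage unit 150 of the object recognition apparatus 1 enables the apparatus to perform the recognition process using those parameters as the criterion.
[0183]
When the object recognition apparatus 1 does not perform the learning process, the learning processing unit 130 and the storage device 120 may be omitted.
[0184]
A program (learning program, recognition program) expressing part or all of one or both of the learning process and recognition process of the present invention can be stored in, for example, a memory (not shown) in the learning processing unit 130 or a memory (not shown) in the recognition processing unit 140. Alternatively, such a program can be recorded on any type of computer-readable recording medium, such as a flexible disk, CD-ROM, or DVD-ROM. A learning program or recognition program recorded on such a recording medium is loaded into the memory of a computer via a disk drive (not shown). Alternatively, the learning program or recognition program (or part of it) may be downloaded into the memory of the computer through a communication network or broadcast. When the CPU built into the computer executes the learning program or recognition program, the computer functions as an object recognition apparatus.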
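[0185]
[Effects of the Invention]
According to the present invention, the image set input for recognition includes a first image representing an object using a first attribute and a second image representing that object using a second attribute different from the first attribute. Because whether the object belongs to a specific category is determined based on both the first attribute and the second attribute, the reliability of recognition is high. Furthermore, a feature vector in a feature space is obtained whose components are the filter output values obtained by applying predetermined image filters at predetermined positions of the predetermined number of images, and the image set is represented by this feature vector. Because this processing does not require associating regions of the first image with regions of the second image, the initial setting for recognizing the object is easy, and the recognition result is less susceptible to environmental conditions.
[Brief Description of the Drawings]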
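[FIG. 1] Flowchart showing the overall processing procedure of the object recognition method of the present invention
[FIG. 2A] Diagram showing image sets 610 to 613 input in step S1001a (FIG. 1)
[FIG. 2B] Diagram showing image sets 660 to 663 input in step S1001a (FIG. 1)
[FIG. 3] Diagram showing an example of applying the image filter 354 to the image 351
[FIG. 4] Diagram showing the feature vectors obtained for each image set in step S1001b (FIG. 1) plotted in the feature space 701
[FIG. 5] Diagram showing an example of the recognition image set 510 input in step S1002a (FIG. 1)
[FIG. 6] Block diagram showing the configuration of the object recognition apparatus 1 according to an embodiment of the present invention
[FIG. 7A] Diagram showing an example of the arrangement of the far-infrared light camera 100 and the visible light camera 110
[FIG. 7B] Diagram showing another example of the arrangement of the far-infrared light camera 100 and the visible light camera 110
[FIG. 7C] Diagram showing an example in which a visible/far-infrared light camera 210 having the functions of both is used in place of the far-infrared light camera 100 and the visible light camera 110
[FIG. 8A] Diagram showing an example of a visible light image 803 representing a human object, captured by the visible light camera 110
[FIG. 8B] Diagram showing an example of a far-infrared light image 804 representing a human object, captured by the far-infrared light camera 100
[FIG. 9A] Diagram showing an example of a visible light image 805 representing a non-human object (a tree), captured by the visible light camera 110
[FIG. 9B] Diagram showing an example of a far-infrared light image 806 representing a non-human object, captured by the far-infrared light camera 100
[FIG. 10A] Diagram schematically showing the characteristics of an image filter used in the filter processing unit 125 (FIG. 6)
[FIG. 10B] Diagram schematically showing the characteristics of an image filter that selectively detects horizontal edges
[FIG. 10C] Diagram schematically showing the characteristics of an image filter that selectively detects edges extending from lower left to upper right
[FIG. 10D] Diagram schematically showing the characteristics of an image filter that selectively detects edges extending from lower right to upper left
[FIG. 11A] Diagram showing an example of the filter coefficients of a Gabor filter
[FIG. 11B] Diagram showing an example of the filter coefficients of a Gabor filter
[FIG. 11C] Diagram showing an example of the filter coefficients of a Gabor filter
[FIG. 11D] Diagram showing an example of the filter coefficients of a Gabor filter
[FIG. 12] Flowchart showing the detailed procedure of the learning process executed by the object recognition apparatus 1
[FIG. 13A] Diagram explaining an identification method using a curved identification surface
[FIG. 13B] Diagram explaining an identification method using distance in the feature space 1380
[FIG. 13C] Diagram explaining an identification method using the distribution of feature vectors in the feature space 1382
[FIG. 14] Diagram schematically showing the change in identification performance that accompanies the feature dimension deletion process
[FIG. 15] Flowchart showing the more detailed processing procedure of step S91 (FIG. 12)
[FIG. 16] Diagram showing the relationship between the image 431 and the region 432 within the image 431 to which an image filter is applied
[FIG. 17] Flowchart showing the detailed procedure of the process of evaluating identification performance in step S95 (FIG. 12)
[FIG. 18] Flowchart showing the detailed procedure of the recognition process executed by the object recognition apparatus 1
[FIG. 19] Diagram showing yet another example of the arrangement of far-infrared light cameras and visible light cameras
[FIG. 20] Diagram showing yet another example of the arrangement of the far-infrared light camera 100 and the visible light camera 110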
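[Explanation of Reference Signs]
1 Object recognition apparatus
100 Far-infrared light camera
110 Visible light camera
120 Storage device
125 Filter processing unit
130 Learning processing unit
140 Recognition processing unit
150 Identification parameter storage unit
160 Work memory
170 Display unit
190 Input unit
[0001]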
BACKGROUND OF THE INVENTION
The present invention relates to an apparatus and method for recognizing an object belonging to a specific category, and more particularly to an apparatus and method for recognizing an object using a plurality of images that represent the object using different attributes.
[0002]
[Prior art]
As a conventional technique for recognizing an object using a plurality of images that represent the object using different attributes, the technique disclosed in Japanese Patent Application Laid-Open No. 8-287216, “Intrafacial region recognition method”, is known. In this prior art, a part of the face (for example, the mouth) is recognized from a far-infrared light image (light with a wavelength of 8 to 10 μm) and a visible light image (a plurality of images representing an object using different attributes). The far-infrared image represents the intensity of far-infrared light emitted from the object. Since the intensity of the far-infrared light emitted from the object can be correlated with the temperature of the object, a region of a specific temperature (for example, about 36 °C, the normal temperature of human skin) can be extracted from the far-infrared light image.
[0003]
When only a temperature image is used, it is difficult to detect a person accurately if an object with the same temperature as a person (such as an electrical appliance in a room) is present near the object. The prior art therefore achieved reliable detection by also referring to skin-colored regions in the visible light image.
[0004]
[Problems to be solved by the invention]
In the prior art described in the above publication, the skin temperature region extracted from the far-infrared image and the skin color region extracted from the visible light image are associated with each other in order to specify the position of the part to be recognized. To perform such association, (1) the extraction of the skin temperature region (a region with a temperature of about 36 °C) from the far-infrared image and the extraction of the skin color region from the visible light image must both be performed accurately, and (2) the pixels in the far-infrared image must be associated with the pixels in the visible light image in advance.
[0005]
To associate the pixels in the far-infrared image with the pixels in the visible light image in advance, the optical axes of the visible light camera and the far-infrared light camera must be aligned accurately, which complicates both the construction of the imaging system and the initial setting for object recognition.
[0006]
To extract the skin temperature region accurately from the far-infrared image, calibration must be performed frequently to cancel the influence of the temperature of the far-infrared camera's optical system, circuits, elements, and the like, which changes over time. Alternatively, to eliminate the influence of the temperature of these components, the entire far-infrared camera must be kept at a constant temperature (for example, by cooling it). As a result, the initial setting and maintenance of a recognition system that includes a far-infrared camera are complicated and costly.
[0007]
In addition, skin temperature varies greatly due to the effects of solar radiation and temperature. Especially in the outdoors, the skin temperature tends to deviate from the standard temperature around 36 ° C. and changes greatly in time during the day according to changes in conditions such as solar radiation and temperature. Thus, when the skin temperature changes, it becomes difficult to accurately extract the skin temperature region from the far-infrared image. In order to accurately extract the skin temperature region under various changing environmental conditions, an extraction algorithm corresponding to each condition must be prepared, and there is a problem that initial setting of the recognition system is not easy.
[0008]
In the visible light image as well, in environments such as the outdoors that are susceptible to sunlight and to artificial lighting such as car headlights, it is difficult to always detect the color of the object accurately, because the dynamic range of the camera is limited and the spectral distribution of the light source is uncertain. In order to accurately extract the skin color region under various changing environmental conditions, an extraction algorithm corresponding to each condition must be prepared, and there is a problem that initial setting of the recognition system is not easy.
[0009]
Furthermore, the extraction of the skin temperature region from the far-infrared image and the extraction of the skin color region from the visible light image are both processing specialized for the attributes of the individual objects. Such processing does not work well when the recognition target changes. For example, in order to apply this prior art to animal recognition, the region extraction algorithm must be changed. Since an extraction algorithm must be prepared for each individual recognition target, initial setting of the recognition system is not easy.
[0010]
As described above, the related art requires a process of associating regions of the far-infrared image with regions of the visible light image, so the initial setting for recognizing the object is not easy, and the recognition result is easily affected by environmental conditions.
[0011]
The present invention has been made in view of these problems, and its object is to provide an object recognition apparatus, an object recognition method, a program, and a recording medium that offer high recognition reliability, are easy to set up initially, and are not easily affected by environmental conditions.
[0012]
[Means for Solving the Problems]
The object recognition apparatus of the present invention includes: an input unit that inputs a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object; a feature vector calculation unit that receives the first image data set from the input unit and obtains a first feature vector in a feature space, the first feature vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying at least one image filter, having at least one of orientation selectivity, position selectivity, and spatial frequency selectivity, at at least one predetermined position in each of the first image data and the second image data; and a determination unit that determines whether the first object belongs to a specific category based on the relationship between the first feature vector and a predetermined identification parameter, thereby achieving the above object.
[0013]
  The first image data represents the first object by the intensity of visible light emitted or reflected from the first object, and the second image data is the first object. The first object may be expressed by the intensity of far-infrared light emitted or reflected from the object.
[0014]
The input unit may further input to the feature vector calculation unit a second image data set and a third image data set, each consisting of a plurality of image data. Each of the second image data set and the third image data set includes third image data, which is image data of a visible light image of a second object belonging to the specific category, and fourth image data, which is image data of a far-infrared light image of the second object. For each of the input second image data set and third image data set, the feature vector calculation unit obtains a feature vector in the feature space having as components at least one filter output value obtained from each of the third image data and the fourth image data by applying at least one image filter, having the same selectivity as the image filter described above, at at least one predetermined position in each of the third image data and the fourth image data. The object recognition apparatus may further include a learning unit that obtains the identification parameter so as to distinguish at least one feature vector in the feature space for the second image data set from at least one feature vector in the feature space for the third image data set.
[0015]
  The first object may be a human.
[0016]
The learning unit may define the feature space by deleting at least one dimension from a provisional feature space having more dimensions than the feature space, based on the direction of the normal to a plane that, in the provisional feature space, separates the feature vectors for the second image data set from the feature vectors for the third image data set.
[0017]
The identification parameter may represent an identification plane in the feature space, and the determination unit may determine whether the first object belongs to the specific category based on which side of the identification plane the first feature vector is located.
[0018]
The determination unit may determine that the first object belongs to the specific category when the distance between the first feature vector and the identification plane is equal to or greater than a predetermined threshold.
[0019]
The input unit may further input to the feature vector calculation unit fifth image data, which is image data of a visible light image of the first object, and sixth image data, which is image data of a far-infrared light image of the first object, and the fifth image data and the sixth image data may have been captured a predetermined time after a first time at which the first image data and the second image data were captured.
[0020]
The first image data may be captured from a first location, and the second image data may be captured from a second location different from the first location.
[0021]
The input unit may further input to the feature vector calculation unit fifth image data, which is image data of a visible light image of the first object, and sixth image data, which is image data of a far-infrared light image of the first object, and the fifth image data and the sixth image data may have been captured from a second location different from a first location from which the first image data and the second image data were captured.
[0022]
The object recognition method of the present invention includes the steps of: (a) inputting a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object; (b) obtaining a first feature vector in a feature space, the first feature vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying at least one image filter, having at least one of orientation selectivity, position selectivity, and spatial frequency selectivity, at at least one predetermined position in each of the input first image data and second image data; and (c) determining whether the first object belongs to a specific category based on the relationship between the first feature vector and a predetermined identification parameter, thereby achieving the above object.
[0023]
The program of the present invention is a program for causing a computer to execute object recognition processing, the object recognition processing including the steps of: (a) inputting a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object; (b) obtaining a first feature vector in a feature space, the first feature vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying at least one image filter, having at least one of orientation selectivity, position selectivity, and spatial frequency selectivity, at at least one predetermined position in each of the input first image data and second image data; and (c) determining whether the first object belongs to a specific category based on the relationship between the first feature vector and a predetermined identification parameter, thereby achieving the above object.
[0024]
The recording medium of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute object recognition processing, the object recognition processing including the steps of: (a) inputting a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object; (b) obtaining a first feature vector in a feature space, the first feature vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying at least one image filter, having at least one of orientation selectivity, position selectivity, and spatial frequency selectivity, at at least one predetermined position in each of the input first image data and second image data; and (c) determining whether the first object belongs to a specific category based on the relationship between the first feature vector and a predetermined identification parameter, thereby achieving the above object.
[0025]
The operation will be described below.
[0026]
According to the present invention, an image set (first image set) input for recognition includes a first image representing an object (first object) using a first attribute and a second image representing the object using a second attribute different from the first attribute. Because whether the object belongs to a specific category is determined based on both the first attribute and the second attribute, the reliability of recognition of the object is high. Further, a feature vector in a feature space is obtained whose components are the filter output values obtained by applying predetermined image filters at predetermined positions of the predetermined number of images, and the image set is represented by this feature vector. Because this processing does not require associating regions of the first image with regions of the second image, the initial setting for recognizing the object is easy, and the recognition result is not easily affected by environmental conditions.
[0027]
DETAILED DESCRIPTION OF THE INVENTION
First, the principle of the present invention will be described with reference to FIGS.
[0028]
FIG. 1 shows the entire processing procedure of the object recognition method of the present invention. The object recognition method of the present invention includes a learning process 1001 (steps S1001a to S1001c) and a recognition process 1002 (steps S1002a to S1002c). Hereinafter, the procedure of the object recognition method of the present invention will be described by taking as an example the case where the object recognition method of the present invention is applied to recognize a human. The object recognition method shown in FIG. 1 is executed by an object recognition device 1 described later with reference to FIG.
[0029]
Step S1001a: A learning image set is input to the object recognition apparatus 1. In the following description, an image set refers to a set of two images of a visible light image and a far-infrared light image of the same object unless otherwise specified. The learning image set includes at least one image set (second image set) representing a human (second object belonging to the category “human”) and at least one other than the second image set. An image set (an image set representing a non-human object, a third image set). In step S1001a, a plurality of learning image sets are input.
[0030]
Step S1001b: A feature vector is obtained for each of the plurality of learning image sets input in step S1001a. The calculation of the feature vector from one image set will be described later with reference to FIGS. 2A and 2B. This feature vector can be regarded as one point in the feature space.
[0031]
Step S1001c: An identification plane in the feature amount space is determined so as to identify (separate) the feature amount vector for at least one second image set and the feature amount vector for at least one third image set. The calculation of the identification plane will be described later with reference to FIG.
[0032]
Step S1002a: A recognition image set (first image set) is input to the object recognition apparatus 1.
[0033]
Step S1002b: A feature vector is obtained for each recognition image set input in step S1002a.
[0034]
Step S1002c: It is determined whether or not the object (first object) of the image set for recognition belongs to a specific category “human”. This determination is made based on the positional relationship between the identification plane obtained in step S1001c and the feature vector obtained in step S1002b.
[0035]
The learning process 1001 is a process for obtaining an identification surface (identification parameter) from a learning image set. In the recognition process 1002, this identification surface is used as a criterion for determining whether or not the object represented by the recognition image set belongs to a specific category.
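The split between learning (finding an identification plane) and recognition (testing which side a feature vector falls on) can be illustrated with a small sketch. It uses a simple perceptron update as a stand-in for the learning step; this is not the method of the specification, and the function names and the two-dimensional feature vectors are invented for illustration.

```python
def train_plane(positives, negatives, max_epochs=100):
    """Find a plane separating two sets of feature vectors (perceptron stand-in)."""
    samples = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in samples:
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # wrong side of (or on) the plane: move the plane
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
                errors += 1
        if errors == 0:  # every learning vector is on its correct side
            break
    return weights, bias


def belongs_to_category(x, weights, bias):
    """Recognition: which side of the identification plane is the vector on?"""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0


# Learning process: hypothetical "human" and "non-human" feature vectors.
humans = [(2.0, 2.0), (3.0, 3.0)]
others = [(0.0, 0.0), (1.0, 0.0)]
weights, bias = train_plane(humans, others)

# Recognition process: only the stored plane parameters are needed.
print(belongs_to_category((2.5, 2.5), weights, bias))
print(belongs_to_category((0.5, 0.0), weights, bias))
```

The recognition function depends only on `weights` and `bias`, which mirrors the point that the identification surface obtained by learning serves as the criterion in the recognition process.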
[0036]
FIG. 2A shows image sets 610 to 613 (at least one second image set) input in step S1001a (FIG. 1). Each of the image sets 610 to 613 includes two images: a visible light image of a human and a far-infrared light image of the same human. In the example shown in FIG. 2A, the image set 610 includes a visible light image 601 (third image) and a far-infrared light image 602 (fourth image). A visible light image is an image representing the intensity of visible light (light in the wavelength band of 380 to 800 nm) emitted or reflected from the object in the image, and a far-infrared light image is an image representing the intensity of far-infrared light (light in the wavelength band of 8 to 10 μm) emitted or reflected from the object in the image. In other words, the visible light image represents the object using one attribute of the object, the intensity (luminance) of the visible light it emits or reflects, and the far-infrared light image represents the object using another attribute, the intensity of the far-infrared light it emits or reflects.
[0037]
The object 621 in the image 601 and the object 622 in the image 602 are the same object (the same person). The objects of the visible light image and the far-infrared light image included in the image set 611 are also the same object. However, the objects need not be the same among the image sets 610 to 611. The object only needs to belong to the same category (in this example, the category “human”) between the image sets 610 to 611.
[0038]
FIG. 2A shows four second image sets input in step S1001a (FIG. 1) (image sets 610 to 613), but the second image set input in step S1001a (FIG. 1). The number of sets is not limited to this.
[0039]
FIG. 2B shows image sets 660 to 663 (at least one third image set) input in step S1001a (FIG. 1). Each of the image sets 660 to 663 includes two images, a visible light image of a non-human object and a far-infrared light image of the same object. In the example shown in FIG. 2B, the image set 660 includes an image 651 (visible light image) and an image 652 (far infrared light image). The image 651 has the same size as the image 601 (FIG. 2A), and the image 652 has the same size as the image 602 (FIG. 2A).
[0040]
With reference to FIG. 2A again, the processing for obtaining a feature vector for an image set in step S1001b (FIG. 1) will be described.
[0041]
Assume that two types of image filters (image filter A and image filter B, not shown) are applied to each of the two positions 631 and 632 of the image 601. The image filter A and the image filter B are image filters having different characteristics, for example. A specific example of the image filter will be described later with reference to FIG. In FIG. 2A, locations 631 and 632 are shown as rectangles. This rectangle represents the size of the image filter A and the image filter B. Here, it is assumed that the image filter A and the image filter B have the same size.
[0042]
By applying one image filter to one position of the image 601, one scalar value (filter output value) is generated. In the example shown in FIG. 2A, four filter output values (10, 3, 11 and 5) are generated by applying the image filters A and B to each of the two positions 631 and 632 of the image 601. Specifically, applying the image filter A to the positions 631 and 632 generates the filter output values “10” and “3”, respectively, and applying the image filter B to the positions 631 and 632 generates the filter output values “11” and “5”, respectively.
[0043]
Similarly, four filter output values (1, 7, 11, and 4) are generated by applying the above-described image filters A and B to the two positions 633 and 634 of the image 602, respectively.
[0044]
By combining the four filter output values for image 601 (10, 3, 11 and 5) and the four filter output values for image 602 (1, 7, 11 and 4), a feature vector (10, 3, 11, 5, 1, 7, 11, 4) is calculated. In this way, the information of the visible light image and the information of the far-infrared light image are integrated using the filter output values.
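The concatenation can be sketched with the filter output values quoted above. The lookup-table representation and the name `concat_features` are assumptions; for image 602 the text does not say which filter produced which value, so the same ordering as for image 601 (filter A at both positions, then filter B) is assumed here.

```python
def concat_features(per_image_outputs, filters, positions_per_image):
    """Concatenate filter outputs image by image, filter by filter, position by position."""
    vector = []
    for outputs, positions in zip(per_image_outputs, positions_per_image):
        for f in filters:
            for p in positions:
                vector.append(outputs[(f, p)])
    return vector


# Filter outputs keyed by (filter, position); values are those quoted in the text.
visible_601 = {("A", 631): 10, ("A", 632): 3, ("B", 631): 11, ("B", 632): 5}
# Assignment of values to filters for image 602 is assumed (see lead-in).
far_infrared_602 = {("A", 633): 1, ("A", 634): 7, ("B", 633): 11, ("B", 634): 4}

vector = concat_features(
    [visible_601, far_infrared_602], ["A", "B"], [[631, 632], [633, 634]]
)
print(vector)  # [10, 3, 11, 5, 1, 7, 11, 4]
```

No pixel-to-pixel correspondence between the two images is needed: each image contributes its own block of components, and the only requirement is that the same ordering is used for every image set.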
[0045]
A feature quantity vector is similarly calculated for the image sets 611 to 613 shown in FIG. 2A. The feature vector has a filter output value as a component. This feature quantity vector can be regarded as one point in the 8-dimensional feature quantity space.
[0046]
For the image set 660 shown in FIG. 2B, the feature vector is calculated in the same manner. Specifically, four filter output values (8, 9, 0, and 2) are generated by applying the above-described image filter A and image filter B to the two positions 681 and 682 of the image 651, respectively. By applying the image filter A and image filter B described above to the two positions 683 and 684 of the image 652, four filter output values (9, 12, 10 and 4) are generated. By combining the four filter output values for image 651 (8, 9, 0 and 2) and the four filter output values for image 652 (9, 12, 10 and 4), a feature vector (8, 9, 0, 2, 9, 12, 10, 4) is calculated. A feature vector is similarly calculated for the image sets 661 to 663 shown in FIG. 2B.
[0047]
The image filter to be applied and the position to which the image filter is applied are determined in advance. In one embodiment of the present invention, an image filter to be applied and a position to which the image filter is applied are determined through a feature amount dimension deletion process described later with reference to FIG. In the example shown in FIGS. 2A and 2B, the same image filters as image filters A and B applied to image set 610 (FIG. 2A) are also applied to image set 660 (FIG. 2B). The positional relationship between the positions 631 and 632 with respect to the image 601 is equal to the positional relationship between the positions 681 and 682 with respect to the image 651, respectively. Note that the number of positions to which the image filter is applied in one image is not limited to two. Further, the number of image filters applied to one position in one image is not limited to two.
[0048]
Thus, in step 1001b (FIG. 1), for each of a plurality of image sets (image sets 610 to 613 shown in FIG. 2A and image sets 660 to 663 shown in FIG. 2B), a feature quantity vector in the feature quantity space is obtained whose components are at least one filter output value obtained by applying at least one predetermined image filter at at least one predetermined position of the two images.
[0049]
FIG. 3 shows an example in which the image filter 354 is applied to the image 351. In the example shown in FIG. 3, the image filter 354 is applied to the position 353 of the image 351. FIG. 3 shows an enlarged view of the portion 352 of the image 351. In this enlarged view, the value inside the rectangle representing the outline of the portion 352 indicates the value of the pixel included in the image 351.
[0050]
In the example shown in FIG. 3, the image filter 354 has a size of 3 × 3. The values inside the rectangle representing the image filter 354 indicate the nine filter coefficients. The filter output value obtained by applying the image filter 354 to the position 353 of the image 351 is the sum, over the nine filter coefficients, of the product of each filter coefficient of the image filter 354 and the pixel value corresponding to that filter coefficient. In this example, the filter output value is 765. The operation for obtaining the filter output value is called a filter operation. By the filter operation, local characteristic information of the image is extracted as a filter output value.
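The filter operation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming that the filter is centered on the target position and that the position is far enough from the image border; the function name `filter_output`, the pixel values, and the filter coefficients are hypothetical and are not the values shown in FIG. 3.

```python
def filter_output(image, kernel, row, col):
    """Sum of (filter coefficient x corresponding pixel value).

    Assumes the kernel is centered on (row, col) and fits inside the image.
    """
    half_h = len(kernel) // 2
    half_w = len(kernel[0]) // 2
    total = 0
    for i, kernel_row in enumerate(kernel):
        for j, coefficient in enumerate(kernel_row):
            pixel = image[row - half_h + i][col - half_w + j]
            total += coefficient * pixel
    return total


# Hypothetical 3 x 3 neighbourhood containing a vertical edge.
image = [
    [10, 10, 0],
    [10, 10, 0],
    [10, 10, 0],
]
# Hypothetical 3 x 3 filter that responds to vertical edges.
kernel = [
    [1, 0, -1],
    [2, 0, -2],
    [1, 0, -1],
]
value = filter_output(image, kernel, 1, 1)
```

Each call yields one filter output value, that is, one component of a feature quantity vector.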
[0051]
FIG. 4 shows a state in which the feature amount vectors obtained for the image sets in step 1001b (FIG. 1) are plotted in the feature amount space 701. In the example shown in FIG. 4, however, the feature amount space 701 is represented as a two-dimensional space (that is, a plane) for the sake of explanation. The feature amount space 701 is defined by two dimensions, that is, a feature amount dimension 1 and a feature amount dimension 2. Each feature amount vector is represented as one point in the feature amount space 701. In FIG. 4, each of the ◯ marks represents a feature amount vector for an image set (second image set) representing a human, and each of the x marks represents a feature amount vector for an image set (third image set) representing an object other than a human. Hereinafter, a feature vector for an image set representing a human may be simply referred to as a “feature vector representing a human”, and a feature vector for an image set representing a non-human object may be simply referred to as a “feature vector representing a non-human object”.
[0052]
The identification straight line 702 (identification plane) is determined in step 1001c (FIG. 1) so as to identify (separate) the feature quantity vectors represented by the ◯ marks and the feature quantity vectors represented by the x marks. In the example shown in FIG. 4, all feature quantity vectors represented by ◯ marks are on the upper side of the identification line 702 (arrow 703 side, first side), and all feature quantity vectors represented by x marks are on the lower side of the identification line 702 (arrow 704 side, second side). The identification line 702 can be determined using, for example, a support vector machine technique. For the support vector machine technique, see, for example, V. N. Vapnik, “The Nature of Statistical Learning Theory”, Springer-Verlag, 1995. Alternatively, the identification line 702 may be determined using a technique such as learning of a linear perceptron or a discriminant analysis method. As the learning algorithm, a non-parametric learning algorithm such as a statistical parameter estimation method or a neural network may also be employed.
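As an illustration of one of the alternatives named above (learning of a linear perceptron, not the support vector machine technique), the following minimal sketch learns the coefficients of a separating line from labeled feature vectors. The function name `train_perceptron`, the epoch limit, and the two-dimensional sample points are assumptions made for this example only.

```python
def train_perceptron(humans, others, epochs=100):
    """Learn coefficients a and offset d so that a.x + d > 0 for the
    human vectors and a.x + d < 0 for the non-human vectors.

    Classic perceptron update rule; it terminates early only when the
    two classes are linearly separable.
    """
    n = len(humans[0])
    a = [0.0] * n
    d = 0.0
    samples = [(v, 1) for v in humans] + [(v, -1) for v in others]
    for _ in range(epochs):
        errors = 0
        for v, label in samples:
            score = sum(ai * xi for ai, xi in zip(a, v)) + d
            if label * score <= 0:  # misclassified (or on the line)
                a = [ai + label * xi for ai, xi in zip(a, v)]
                d += label
                errors += 1
        if errors == 0:  # every sample is on its correct side
            break
    return a, d


# Hypothetical 2-dimensional feature vectors (not taken from FIG. 4).
humans = [(2, 10), (3, 9), (1, 8)]
others = [(8, 2), (9, 3), (7, 1)]
a, d = train_perceptron(humans, others)
```

Unlike a support vector machine, the perceptron does not maximize the margin between the two groups; it only finds some line that separates them.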
[0053]
In the example shown in FIG. 4, in the two-dimensional feature amount space 701, the identification line 702 identifies (separates) the feature amount vectors for image sets representing a human (second image sets) and the feature amount vectors for image sets representing an object other than a human (third image sets). When the feature amount vectors for image sets representing a human (second image sets) and the feature amount vectors for image sets representing a non-human object (third image sets) are represented in an n-dimensional (n ≧ 2) feature amount space, these feature amount vectors are identified by an identification plane. When the n-dimensional feature amount space is defined by dimension $x_1$, dimension $x_2$, ..., and dimension $x_n$, the identification plane is expressed by (Equation 1).
[0054]
[Expression 1]
$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n + d = 0$$
Hereinafter, in this specification, “plane” means the set of points $(x_1, x_2, \ldots, x_n)$ satisfying the relationship of (Equation 1) in an n-dimensional (n ≧ 2) feature amount space. In the case of n = 2, (Equation 1) represents a straight line, and this straight line is included in the definition of “plane” described above.
[0055]
FIG. 5 shows an example of a recognition image set (first image set) 510 input in step 1002a (FIG. 1). The image set 510 includes a visible light image 501 that represents the object 521 and a far-infrared light image 502 that represents the object 522. The target object 521 and the target object 522 are the same target object (first target object).
[0056]
In step S1002b (FIG. 1), a feature quantity vector (first feature quantity vector) in the feature quantity space for the image set 510 is calculated. In the example shown in FIG. 5, the feature vector for the image set 510 is calculated as (9, 4, 12, 6, 1, 6, 14, 3). The calculation of the feature amount vector is performed in the same manner as the calculation of the feature amount vector for the image set 610 described above with reference to FIG. 2A. That is, in step S1002b, a feature quantity vector (first feature quantity vector) in the feature quantity space is obtained for the image set 510, whose components are at least one filter output value obtained by applying at least one predetermined image filter (image filters A and B) at at least one predetermined position of the two images (images 501 and 502).
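The construction of a feature quantity vector from an image set can be sketched as follows. This is a sketch under stated assumptions: the names `filter_output` and `feature_vector` are hypothetical, the two 3 × 3 filters merely stand in for image filters A and B, the 5 × 5 images are synthetic, and the ordering of the components (image, then position, then filter) is one possible convention that the text does not specify.

```python
def filter_output(image, kernel, row, col):
    """Sum of products of kernel coefficients and the pixels under them."""
    half_h = len(kernel) // 2
    half_w = len(kernel[0]) // 2
    return sum(
        coefficient * image[row - half_h + i][col - half_w + j]
        for i, kernel_row in enumerate(kernel)
        for j, coefficient in enumerate(kernel_row)
    )


def feature_vector(image_set, filters, positions):
    """Concatenate filter outputs of all images in the set.

    image_set: (visible_image, far_infrared_image)
    positions: one list of (row, col) positions per image
    """
    components = []
    for image, image_positions in zip(image_set, positions):
        for row, col in image_positions:
            for kernel in filters:
                components.append(filter_output(image, kernel, row, col))
    return components


# Synthetic 5 x 5 images standing in for a visible / far-infrared pair.
visible = [[r * 5 + c for c in range(5)] for r in range(5)]
far_infrared = [[(r + c) % 3 for c in range(5)] for r in range(5)]

# Hypothetical stand-ins for image filters A and B.
filter_a = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
filter_b = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

# Two positions per image, fixed in advance for every image set.
positions = [[(1, 1), (3, 3)], [(1, 1), (3, 3)]]
vector = feature_vector((visible, far_infrared), [filter_a, filter_b], positions)
```

Two images, two positions per image, and two filters per position give an 8-dimensional vector. The positions are specified per image, so no pixel-level correspondence between the two images is needed.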
[0057]
The mark ● shown in FIG. 4 indicates the first feature vector plotted in the feature space 701. However, in FIG. 4, for the sake of explanation, the first feature vector is represented as a two-dimensional feature vector (2, 10), not as an eight-dimensional feature vector.
[0058]
In step S1002c (FIG. 1), it is determined whether or not the object represented by the image set 510 (FIG. 5) is a human, based on the positional relationship in the feature amount space 701 between the first feature amount vector indicated by the mark ● and the identification line 702. In the example shown in FIG. 4, the feature amount vector indicated by the mark ● is on the upper side of the identification line 702 (the arrow 703 side). The arrow 703 side of the identification line 702 is the side on which the feature vectors for image sets representing a human are located, and therefore it is determined that the object represented by the image set 510 (FIG. 5) is a human.
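The decision in step S1002c amounts to checking on which side of the identification line the feature vector lies. The sketch below assumes a hypothetical line, $-x_1 + x_2 - 1 = 0$, chosen only so that the point (2, 10) falls on its positive side; the actual coefficients of the identification line 702 are not given in the text, and the convention that the positive side is the “human” side is likewise an assumption.

```python
def is_human(feature_vector, coefficients, offset):
    """Return True when the vector lies on the positive side of the plane
    a_1*x_1 + ... + a_n*x_n + d = 0 (assumed here to be the human side)."""
    score = sum(a * x for a, x in zip(coefficients, feature_vector)) + offset
    return score > 0


# Hypothetical identification line: -x1 + x2 - 1 = 0.
coefficients = (-1.0, 1.0)
offset = -1.0

first_feature_vector = (2, 10)  # the point marked in FIG. 4
result = is_human(first_feature_vector, coefficients, offset)
```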
[0059]
Thus, according to the method of the present invention shown in FIG. 1, the first object represented by the image set 510 (FIG. 5) (the object common to the visible light image 501 and the far-infrared light image 502 of the image set 510) is recognized as being a human. The determination as to whether or not the first object belongs to the category “human” is made based on the intensity of the visible light reflected or emitted by the object (first attribute) and the intensity of the far-infrared light reflected or emitted by the object (second attribute), so that the reliability of recognition of the object is increased. Furthermore, a feature quantity vector in the feature quantity space is obtained whose components are the filter output values obtained by applying predetermined image filters at predetermined positions of the image 501 and the image 502, and the image set 510 is represented by this feature vector. Since this does not require processing that associates regions of the image 501 with regions of the image 502, the initial setting for recognizing the first object is facilitated, and the recognition result is less susceptible to environmental conditions.
[0060]
Embodiments of the present invention will be described below with reference to the drawings. The same components are denoted by the same reference numerals, and duplicate descriptions may be omitted.
[0061]
FIG. 6 shows a configuration of the object recognition apparatus 1 according to the embodiment of the present invention.
[0062]
The object recognition apparatus 1 includes a far-infrared light camera 100, a visible light camera 110, a storage device 120 that stores learning image data (learning image sets), a filter processing unit 125 that applies an image filter to an image, a learning processing unit 130, a recognition processing unit 140 that determines whether the object represented by the image set acquired by the far-infrared light camera 100 and the visible light camera 110 belongs to a specific category (for example, whether the object is a human), an identification parameter storage unit 150 that stores an identification parameter used as a determination criterion in the determination, a work memory 160, and a display unit 170 that displays a recognition result. The components of the object recognition apparatus 1 may be connected to each other via an internal bus or via a network. Such a network may be any network, such as a wireless network, a wired network, or a telephone line network, and may include the Internet.
[0063]
The far infrared light camera 100 captures a far infrared light image, and the visible light camera 110 captures a visible light image. In the embodiment of the present invention, a luminance image is used as the visible light image.
[0064]
The object recognition apparatus 1 can be applied to, for example, an outdoor intruder monitoring system, a pedestrian detection system mounted on a moving body such as an automobile, and a visual system mounted on a mobile robot.
[0065]
As described above, the object recognition apparatus 1 performs the learning process and the recognition process shown in FIG. 1 as a whole.
[0066]
The visible light camera 110 and the far-infrared light camera 100 perform the processing of inputting the learning image sets in step S1001a (FIG. 1) and the processing of inputting the recognition image set (first image set) in step S1002a (FIG. 1). That is, the visible light camera 110 and the far-infrared light camera 100 function as an input unit 190 that inputs the learning image sets and the recognition image set to the object recognition apparatus 1. Of course, the visible light camera and far-infrared light camera that input the learning image sets to the object recognition apparatus and the visible light camera and far-infrared light camera that input the recognition image set to the object recognition apparatus may be provided separately.
[0067]
The learning image sets are divided into image sets (second image sets) representing a human object and image sets (third image sets) representing a non-human object. The image data is temporarily stored in the storage device 120. The storage device 120 can be, for example, a hard disk. Alternatively, the storage device 120 can be any memory.
[0068]
The filter processing unit 125 calculates feature vectors in step S1001b (FIG. 1) and step S1002b (FIG. 1). The filter processing unit 125 (feature quantity vector calculation unit) can be, for example, a digital signal processor.
[0069]
The learning processing unit 130 (learning unit) performs processing for obtaining an identification plane in step S1001c (FIG. 1).
[0070]
The recognition processing unit 140 (determination unit) performs a process of determining whether or not the object of the recognition image set belongs to a specific category “human” in step S1002c (FIG. 1).
[0071]
The display unit 170 displays the recognition result in the recognition processing unit 140. An arbitrary display device can be used as the display unit 170. The display unit 170 may be omitted.
[0072]
FIG. 7A shows an example of the arrangement of the far-infrared light camera 100 and the visible light camera 110. In the example shown in FIG. 7A, the far-infrared light camera 100 and the visible light camera 110 are arranged in parallel.
[0073]
FIG. 7B shows another example of the arrangement of the far-infrared light camera 100 and the visible light camera 110. In the example shown in FIG. 7B, the far-infrared light camera 100 and the visible light camera 110 are arranged so that the optical axis of the visible light camera 110, as reflected by the cold mirror 802, is aligned with the optical axis of the far-infrared light camera 100. A cold mirror is a mirror that reflects visible light and transmits far-infrared light.
[0074]
The function of the far-infrared light camera 100 and the function of the visible light camera 110 may be realized by one camera.
[0075]
FIG. 7C shows an example in which a visible / far-infrared light camera 210 having both functions is used in place of the far-infrared light camera 100 and the visible light camera 110. The visible / far-infrared light camera 210 realizes the functions of a far-infrared light camera and the function of a visible light camera using an area sensor.
[0076]
FIG. 8A shows an example of a visible light image 803 that is captured by the visible light camera 110 and represents a human object.
[0077]
FIG. 8B shows an example of a far-infrared light image 804 that is captured by the far-infrared light camera 100 and represents a human object. The far-infrared light image 804 is obtained by photographing the same object as the visible light image 803 shown in FIG. 8A at substantially the same time.
[0078]
FIG. 9A shows an example of a visible light image 805 that is captured by the visible light camera 110 and represents a non-human object (tree).
[0079]
FIG. 9B shows an example of a far-infrared light image 806 that is captured by the far-infrared light camera 100 and represents a non-human object (tree). The far-infrared light image 806 is obtained by photographing the same object as the visible light image 805 shown in FIG. 9A at substantially the same time.
[0080]
The visible light images and the far-infrared light images shown in FIGS. 8A, 8B, 9A, and 9B constitute image sets (image sets for learning or image sets for recognition). The same object must appear in the visible light image and the far-infrared light image, but the visible light image and the far-infrared light image need not be accurately aligned in pixel units. For example, in the visible light image 803 (FIG. 8A), the object is shifted to the left from the center of the image, and in the far-infrared light image 804 (FIG. 8B), the object is shifted to the right from the center of the image. However, since processing that associates regions of the visible light image 803 with regions of the far-infrared light image 804 is not necessary in the learning process and the recognition process of the present invention, such a shift is not a problem. Therefore, alignment of the far-infrared light camera 100 and the visible light camera 110 is easy, and the initial setting of the object recognition apparatus 1 is easy. However, the scale ratio (aspect ratio) of all visible light images included in the learning image sets and the recognition image set must be the same, and the object must appear at the same position in each of them. This may be realized by cutting out a predetermined area from a visible light image captured by the visible light camera 110. The same applies to the far-infrared light images included in the learning image sets and the recognition image set.
[0081]
FIG. 10A schematically shows characteristics of an image filter used in the filter processing unit 125 (FIG. 6). The image filter shown in FIG. 10A, when applied to a specific position in the image, selectively detects, at that specific position, an edge in a specific direction (vertical direction) having a specific spatial frequency (an edge whose pixel values change within the range of the horizontal width (short axis) of the ellipse shown in FIG. 10A).
[0082]
FIG. 10B schematically shows the characteristics of an image filter that selectively detects horizontal edges.
[0083]
FIG. 10C schematically shows characteristics of an image filter that selectively detects an edge extending from the lower left to the upper right.
[0084]
FIG. 10D schematically illustrates the characteristics of an image filter that selectively detects an edge extending from the lower right to the upper left.
[0085]
The image filters shown in FIGS. 10A to 10D have orientation selectivity (they detect only edges in a specific direction), position selectivity (they detect edges at a specific position), and spatial frequency selectivity (they detect edges whose pixel values change at a specific spatial frequency). Here, the spatial frequency refers to the degree of change of a pixel value (for example, luminance) with respect to a change of position in an image. An example of an image filter having such characteristics is a Gabor filter. By using multiple types of image filters having orientation selectivity, position selectivity, and spatial frequency selectivity (a plurality of image filters with different selectivity), the waste of different image filters redundantly detecting information about the same edge can be reduced, and the number of required image filters can be reduced. Thereby, the amount of calculation required to execute the learning process 1001 and the recognition process 1002 can be reduced. As a result, the increase in the amount of calculation caused by inputting a plurality of images having different attributes (a visible light image and a far-infrared light image) can be minimized.
[0086]
One filter output value represents information about an edge having a specific spatial frequency in a specific direction at a specific position of the image. Representing one image set by a feature vector having filter output values as components is equivalent to expressing, in simplified form, the shape of the object common to the visible light image and the far-infrared light image as a collection of edges.
[0087]
FIGS. 11A to 11D show examples of filter coefficients of the Gabor filter. In the example shown in FIGS. 11A to 11D, each grid point corresponds to one filter coefficient of a Gabor filter having a size of 13 pixels × 13 pixels, and the value (real part) of the filter coefficient is shown as the height of each grid point. FIGS. 11A to 11D correspond to the image filters shown in FIGS. 10A to 10D, respectively.
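For illustration, the real part of a 13 × 13 Gabor filter can be generated as a Gaussian envelope multiplied by a cosine carrier. The sketch below is not the filter of FIGS. 11A to 11D: the function name `gabor_kernel`, the wavelength, the envelope width `sigma`, the isotropic envelope, and the four orientation angles are all assumptions chosen for the example.

```python
import math


def gabor_kernel(size=13, theta=0.0, wavelength=6.0, sigma=3.0):
    """Real part of a Gabor filter as a size x size list of coefficients.

    theta is the direction (in radians) along which the carrier varies;
    theta = 0 varies horizontally and therefore responds to vertical edges.
    """
    half = size // 2
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Coordinate along the direction in which the carrier oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            line.append(envelope * carrier)
        kernel.append(line)
    return kernel


# Four orientations, keyed by carrier direction in degrees (assumed values).
kernels = {deg: gabor_kernel(theta=math.radians(deg)) for deg in (0, 45, 90, 135)}
```

The envelope localizes the response (position selectivity), `theta` sets the orientation selectivity, and `wavelength` sets the spatial frequency selectivity.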
[0088]
Hereinafter, detailed processing procedures of the learning process and the recognition process executed by the object recognition apparatus 1 will be described.
[0089]
<Learning process>
FIG. 12 shows the detailed procedure of the learning process executed by the object recognition device 1. Step S91 corresponds to step S1001a and step S1001b (FIG. 1), and steps S92 to S97 correspond to step S1001c (FIG. 1).
[0090]
Step S91: A plurality of image sets representing humans (second image sets) and a plurality of image sets representing objects other than humans (third image sets) are prepared, and a feature quantity vector is obtained for each image set. The set of feature quantity vectors obtained in step S91 is referred to as feature quantity data F. A more detailed processing procedure of step S91 will be described later with reference to FIGS. In the following description, it is assumed that a 1032-dimensional feature vector is calculated from each image set in step S91 (the present invention is not limited to this). This feature quantity vector is expressed as a point in the space defined by the 1032 feature quantity dimensions, namely feature quantity dimension $x_1$, feature quantity dimension $x_2$, feature quantity dimension $x_3$, ..., and feature quantity dimension $x_{1032}$.
[0091]
Step S92: Of the feature quantity dimensions, the feature quantity dimensions used for learning are designated. On the first pass, all dimensions are designated. In this example, on the first pass, feature quantity dimension $x_1$, feature quantity dimension $x_2$, feature quantity dimension $x_3$, ..., and feature quantity dimension $x_{1032}$ are designated as the feature quantity dimensions used for learning. On the second and subsequent passes, the dimensions remaining after the removal in step S94, which will be described later, are designated as the feature quantity dimensions used for learning. By repeating the processing from step S92 to step S96, the number of dimensions of the feature quantity space decreases. This is referred to as “feature quantity dimension deletion processing”.
[0092]
Step S93: A lower-dimensional feature quantity space is defined using the designated feature quantity dimensions (on the first pass, however, the full 1032-dimensional feature quantity space is defined), and each feature quantity vector included in the feature quantity data F is represented as a feature quantity vector in this lower-dimensional feature quantity space. That is, each 1032-dimensional feature quantity vector included in the feature quantity data F is represented as a lower-dimensional feature quantity vector consisting only of the components that correspond to the feature quantity dimensions designated in step S92. Since one component of a feature quantity vector corresponds to one filter output value at one position in an image, this lower-dimensional feature quantity vector also has, as a component, at least one filter output value obtained by applying at least one predetermined image filter at at least one predetermined position of the two images in the image set.
[0093]
Next, in this lower-dimensional feature quantity space, an identification plane that identifies (separates) the feature quantity vectors for image sets representing a human and the feature quantity vectors for image sets representing a non-human object is determined as a temporary identification plane.
[0094]
The position of a feature quantity vector with respect to the identification plane can be expressed by the value (weighted sum) obtained by weighting each component of the feature quantity vector with the coefficient of the identification plane for the corresponding feature quantity dimension and adding the results. For example, consider a case where the identification plane in a three-dimensional space is represented as x + 2y + 3z = 0 and the feature vector is (x, y, z) = (−1, 0, 4). When the components of the feature vector (−1, 0, 4) are weighted by the coefficients (1, 2, 3) of the identification plane and added, the value 1 × (−1) + 2 × 0 + 3 × 4 = 11 is obtained. This value is proportional to the distance between the feature vector (−1, 0, 4) and the identification plane (dividing it by the norm of the coefficient vector gives the distance itself). The positional relationship of the feature vector with respect to the identification plane can be expressed by the sign and magnitude of this value.
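As a worked check of the example above, the weighted sum and the corresponding geometric distance can be written out as follows. The symbols $\mathbf{a}$ (coefficient vector of the plane), $\mathbf{x}$ (feature vector), and $s$ (weighted sum) are introduced here for illustration only. The weighted sum equals the signed distance scaled by the norm of $\mathbf{a}$, so its sign indicates the side of the plane and its magnitude grows with the distance.

```latex
\[
s = \mathbf{a}\cdot\mathbf{x} = 1\cdot(-1) + 2\cdot 0 + 3\cdot 4 = 11,
\qquad
\frac{s}{\lVert\mathbf{a}\rVert}
  = \frac{11}{\sqrt{1^2 + 2^2 + 3^2}}
  = \frac{11}{\sqrt{14}} \approx 2.94
\]
```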
[0095]
As described above, a support vector machine technique or the like may be used as a method for determining an identification plane (temporary identification plane) that separates two categories when points belonging to the two categories are distributed in the feature quantity space.
[0096]
Using such a technique, an identification plane that separates, in the feature quantity space, the feature quantity vectors for image sets representing a human from the feature quantity vectors for image sets representing an object other than a human is determined as the temporary identification plane. The separation does not necessarily have to be complete: it does not matter if some feature quantity vectors (for example, feature quantity vectors for image sets representing a human) lie across the identification plane, on the side where the feature quantity vectors for image sets representing non-human objects are distributed (misidentification). However, the smaller the number of misidentified feature quantity vectors, the better. The method for determining the identification plane may be selected from a plurality of methods so that the number of misidentified feature quantity vectors is reduced.
[0097]
Step S94: Of the coordinate axes corresponding to the feature quantity dimensions designated in step S92, d coordinate axes are removed from the feature quantity dimensions designated in step S92 (the feature quantity dimensions used for learning), in ascending order of the absolute value of the angle that each coordinate axis forms with the temporary identification plane determined in step S93. The value of d is a predetermined integer of 1 or more.
[0098]
For example, in a three-dimensional feature amount space (with coordinate axes x, y, and z), when the temporary identification plane is represented as x + 2y + 3z = 0, the absolute value of the angle formed by this temporary identification plane with each axis increases in the order of the x-axis, the y-axis, and the z-axis. In this case, if d = 1, the feature quantity dimension corresponding to the x-axis is removed from the feature quantity dimensions used for learning. Instead of the angle formed between each coordinate axis and the identification plane, attention may be paid to the angle δ formed between each coordinate axis and the normal of the identification plane, and d coordinate axes may be removed in descending order of the absolute value of δ; the same result is obtained. The direction of the normal of the identification plane is represented by a normal vector whose components are the coefficients of the expression representing the temporary identification plane. For example, the direction of the normal of the temporary identification plane x + 2y + 3z = 0 is represented by the normal vector (1, 2, 3).
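The selection of the axes to remove in step S94 can be sketched as follows, using the plane x + 2y + 3z = 0 from the example. The angle between coordinate axis i and the plane is arcsin(|a_i| / ‖a‖), so ranking the axes by this angle is the same as ranking them by the absolute value of the corresponding coefficient. The function names are hypothetical.

```python
import math


def axis_plane_angles(coefficients):
    """Angle (radians) between each coordinate axis and the plane whose
    normal vector has the given coefficients."""
    norm = math.sqrt(sum(a * a for a in coefficients.values()))
    return {
        axis: math.asin(min(1.0, abs(a) / norm))
        for axis, a in coefficients.items()
    }


def remove_least_important(coefficients, d=1):
    """Remove the d axes forming the smallest angle with the plane.

    Returns (kept_axes, removed_axes); kept_axes preserves input order.
    """
    angles = axis_plane_angles(coefficients)
    ranked = sorted(coefficients, key=lambda axis: angles[axis])
    removed = ranked[:d]
    kept = [axis for axis in coefficients if axis not in removed]
    return kept, removed


# Temporary identification plane x + 2y + 3z = 0 from the example.
plane = {"x": 1.0, "y": 2.0, "z": 3.0}
kept, removed = remove_least_important(plane, d=1)
```

With d = 1, the x-axis is removed and the y-axis and z-axis are kept, in agreement with the example in the text.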
[0099]
With reference to FIG. 4 again, the meaning of removing the feature value dimension from the feature value dimension used for learning will be described.
[0100]
In FIG. 4, the angle formed by the axis (horizontal axis) corresponding to the feature quantity dimension 1 with the identification line (identification plane) 702 is smaller than the angle formed by the axis (vertical axis) corresponding to the feature quantity dimension 2 with the identification line (identification plane) 702. This means that the feature quantity dimension 1 is less important than the feature quantity dimension 2 for identifying (separating) the feature quantity vectors for image sets representing a human and the feature quantity vectors for image sets representing a non-human object. That is, the value of the feature quantity dimension 2 (filter output value) greatly affects the determination of whether or not the object is a human. In the example shown in FIG. 4, in step S94 (FIG. 12), of the feature quantity dimension 1 and the feature quantity dimension 2, the feature quantity dimension 1 is deleted.
[0101]
The value of one feature amount dimension corresponds to one filter output value at one position of one visible light image or far-infrared light image included in the image set. Thus, by obtaining the identification line (or identification plane), it is possible to determine which feature quantity dimensions (filter outputs), among the plurality of filter outputs obtained from the visible light image and the plurality of filter outputs obtained from the far-infrared light image, are important for the identification. Step S94 (FIG. 12) means that unimportant feature amount dimensions (feature amount dimensions that do not contribute to identification) are deleted from the feature amount dimensions used for learning. Returning to FIG. 12, the description of the detailed processing procedure of the learning process is continued.
[0102]
Step S95: A feature amount space with d fewer coordinate axes is set, and the identification performance is evaluated in the newly set feature amount space. This evaluation is performed by examining how accurately the feature vectors representing a human and the feature vectors representing an object other than a human, both included in the feature data F, can be identified (separated) in the lower-dimensional feature quantity space newly defined by removing the d coordinate axes (feature quantity dimensions) in step S94. A more detailed processing procedure of step S95 will be described later with reference to FIG.
[0103]
Step S96: It is determined whether or not the identification performance satisfies a reference value. If the result of the determination in step S96 is “No”, the process proceeds to step S97. If the result of the determination in step S96 is “Yes”, the process returns to step S92 (further, the feature quantity dimension is deleted).
[0104]
Step S97: The feature quantity dimensions used in step S93 are designated as the selection dimensions. Further, the temporary identification plane obtained in step S93 is designated as the identification plane. This identification plane is used as a criterion in the recognition process performed later. The identification plane is a plane in the space (feature quantity space) defined by the feature quantity dimensions designated as the selection dimensions in step S97. On the first pass, the process may move from step S96 to step S92 unconditionally.
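The loop of steps S92 to S97 can be sketched as follows. This is a simplified sketch under several assumptions: `fit_plane` is a stand-in that uses the difference of the class means as the normal vector (it is not a support vector machine), the evaluation of step S95 is approximated by re-fitting a plane in the reduced space and measuring accuracy on the same data, and all names, the reference value, and the sample vectors are hypothetical.

```python
def fit_plane(humans, others, dims):
    """Stand-in for step S93 (NOT an SVM): the normal is the difference of
    the class means, and the plane passes through their midpoint."""
    coefficients = {}
    offset = 0.0
    for k in dims:
        mean_h = sum(v[k] for v in humans) / len(humans)
        mean_o = sum(v[k] for v in others) / len(others)
        coefficients[k] = mean_h - mean_o
        offset -= coefficients[k] * (mean_h + mean_o) / 2.0
    return coefficients, offset


def accuracy(humans, others, coefficients, offset):
    """Fraction of vectors lying on their correct side of the plane."""
    def score(v):
        return sum(a * v[k] for k, a in coefficients.items()) + offset
    correct = sum(score(v) > 0 for v in humans) + sum(score(v) < 0 for v in others)
    return correct / (len(humans) + len(others))


def select_dimensions(humans, others, d=1, reference=1.0):
    dims = list(range(len(humans[0])))                       # S92: all dimensions
    coefficients, offset = fit_plane(humans, others, dims)   # S93
    while len(dims) > d:
        # S94: drop the d dimensions with the smallest |coefficient|.
        ranked = sorted(dims, key=lambda k: abs(coefficients[k]))
        trial = sorted(ranked[d:])
        # S95: evaluate identification performance in the reduced space.
        trial_coefficients, trial_offset = fit_plane(humans, others, trial)
        if accuracy(humans, others, trial_coefficients, trial_offset) < reference:
            break  # S96 "No": keep the dimensions and plane of the last S93
        # S96 "Yes": continue with the reduced set (S92 and S93 again).
        dims, coefficients, offset = trial, trial_coefficients, trial_offset
    return dims, coefficients, offset                        # S97


# Hypothetical data: dimension 0 carries no information, while
# dimensions 1 and 2 are both needed to separate the two groups.
humans = [(1.0, 8.0, 4.0), (2.0, 4.0, 8.0), (1.5, 7.0, 6.0)]
others = [(1.5, 6.0, 2.0), (1.5, 2.0, 6.0), (1.5, 4.0, 4.0)]
selected, plane, plane_offset = select_dimensions(humans, others)
```

In this data set the uninformative dimension is deleted first, and the loop stops when deleting a further dimension would lower the identification performance below the reference value.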
[0105]
As described above, in steps S92 to S96, the learning processing unit 130 (FIG. 6) defines the feature quantity space used in the recognition process by deleting at least one dimension from a temporary feature quantity space that has a larger number of dimensions than the feature quantity space used in the recognition process. The deletion is based on the orientation of the normal of the plane (temporary identification plane) that, in the temporary feature quantity space, identifies the feature quantity vectors for image sets representing a human (second image sets) and the feature quantity vectors for image sets representing a non-human object (third image sets).
[0106]
Suppose that, in step S97, m feature quantity dimensions $x_{a_1}, x_{a_2}, x_{a_3}, \ldots, x_{a_m}$ (where m is an integer of 1032 or less and the subscripts $a_1, a_2, a_3, \ldots, a_m$ are integers from 1 to 1032) are designated as the selection dimensions from among the 1032 feature quantity dimensions $x_1, x_2, x_3, \ldots, x_{1032}$. The list of selection dimensions ($x_{a_1}, x_{a_2}, x_{a_3}, \ldots, x_{a_m}$) indicates which of the filter outputs applied to the visible light image and the far-infrared light image included in an image set are used in the recognition process performed later. That is, it can be said that the list of selection dimensions defines how the information of the visible light image and the information of the far-infrared light image are combined.
[0107]
The identification plane determined in step S97 is represented by a list of selected dimensions and coefficients for the selected dimensions. Parameters representing these identification planes are stored in the identification parameter storage unit 150 (FIG. 6).
[0108]
In the processing procedure described above, step S95 may be omitted, and the determination in step S96 may be replaced with a determination as to whether or not the number of deleted feature quantity dimensions has reached a predetermined value. That is, when the number of feature dimension deletions reaches a predetermined number, the process may proceed to step S97, and when it does not reach the predetermined number, the process may proceed to step S92. By performing such processing, the feature quantity dimension number can be set to a predetermined value.
[0109]
If the value of d used in step S94 is increased, the number of repetitions of the procedure from step S92 to step S96 can be reduced, and the amount of calculation can be reduced. On the other hand, if the value of d is reduced, many feature quantity dimensions are not deleted at once, so that the number of feature quantity dimensions necessary and sufficient to realize the desired identification performance can be determined as the selection dimensions.
[0110]
In the recognition process performed later, an identification surface (not limited to a plane) different from the identification plane determined in step S97 may be used as the determination criterion (identification parameter). Such an identification surface is set so as to separate feature vectors representing humans from feature vectors representing objects other than humans in the space defined by the selected dimensions. Any identification method that separates feature vectors representing humans from feature vectors representing non-human objects in the space defined by the selected dimensions, together with the identification parameters used by that method, may be employed in the recognition process performed later to determine whether the object is a human.
[0111]
In the recognition process performed later, a linear identification method may be employed, or a non-linear identification method may be employed. A linear identification method determines, for example, whether the object is a human based on which side of the identification plane the feature vector representing the object lies on, the identification plane being represented by the identification parameters determined in step S97. Examples of non-linear identification methods include the k-NN method, a perceptron using non-linear elements, LVQ, and a non-linear SVM. Hereinafter, examples of non-linear identification methods will be described with reference to FIGS. 13A to 13C.
[0112]
FIG. 13A is a diagram for explaining an identification method using a curved identification surface. A space 1380 is a feature amount space defined by the selected dimension. In FIG. 13A, the space 1380 is shown as a plane defined by two selection dimensions of the feature quantity dimension 1 and the feature quantity dimension 2, and the identification surface 1361 is shown as a curve. In FIG. 13A and FIGS. 13B and 13C described later, each of the ◯ marks represents a feature vector that represents a person, and each of the x marks represents a feature vector that represents an object other than a human.
[0113]
The identification surface 1361 separates, in the feature quantity space 1380, feature vectors representing humans from feature vectors representing objects other than humans. In this example, the feature vectors representing humans (○ marks) are located on the first side of the identification surface 1361 (the side of the arrow 1362), and the feature vectors representing objects other than humans (× marks) are located on the second side of the identification surface 1361 (the side of the arrow 1363). The identification surface 1361 can be represented by, for example, an expression having the values of the feature quantity dimension 1 and the feature quantity dimension 2 as variables. The coefficients of such an expression are stored in the identification parameter storage unit 150 (FIG. 6) as identification parameters.
[0114]
A point 1364 indicates a feature vector for an image set (recognition image set) input in the recognition process performed later. In the example shown in FIG. 13A, the feature vector 1364 is located on the first side of the identification surface 1361, so it is determined that the object of the recognition image set is a human.
[0115]
FIG. 13B is a diagram for explaining an identification method using a distance in the feature amount space 1380. Examples of such identification methods include the k-NN method and LVQ. The representative point 1366 is a point representing a feature vector representing a person (a representative point indicating a category of “human”). The representative point 1366 is obtained, for example, as the centroid of all feature vectors representing humans. Similarly, the representative point 1367 is a point representing a feature vector representing a non-human object (a representative point indicating a category “non-human”). The representative point 1366 and the representative point 1367 are represented by coordinates representing the points in the feature amount space. Such coordinates are stored as identification parameters in the identification parameter storage unit 150 (FIG. 6).
[0116]
A point 1365 indicates a feature vector for an image set (recognition image set) input in a recognition process performed later. In this identification method, the category to which the representative point closest to the feature vector belongs is the recognition result for the feature vector. In the example shown in FIG. 13B, since the category indicated by the representative point (representative point 1366) closest to the feature quantity vector 1365 is the category “human”, it is determined that the object of the recognition image set is a human.
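The representative-point method of FIG. 13B can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the representative points are centroids (one of the options mentioned above), assumes Euclidean distance, and uses made-up two-dimensional feature vectors. The function names `centroid` and `nearest_category` are hypothetical.

```python
import math


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def nearest_category(feature, representatives):
    """Return the category whose representative point is closest to `feature`.

    Euclidean distance is an assumption; the text only says "closest".
    """
    return min(representatives,
               key=lambda cat: math.dist(feature, representatives[cat]))


# Made-up learning vectors in a space of two selected dimensions.
human = [[1.0, 2.0], [2.0, 3.0], [1.5, 2.5]]
non_human = [[6.0, 1.0], [7.0, 0.0], [6.5, 0.5]]

# The coordinates of the representative points play the role of identification parameters.
representatives = {"human": centroid(human), "non-human": centroid(non_human)}

print(nearest_category([2.0, 2.0], representatives))
```

Only the coordinates of the two representative points need to be stored for recognition, which matches the description of the stored identification parameters above.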
[0117]
FIG. 13C is a diagram for explaining an identification method using the distribution of feature vectors in the feature quantity space 1382. An example of such an identification method is a neural network such as a perceptron using non-linear elements. In FIG. 13C, the feature quantity space 1382 is shown as a one-dimensional space (that is, a straight line) defined by one selected dimension (feature quantity dimension 1). A curve 1369 and a curve 1370 respectively represent the distribution intensity of feature vectors representing humans in the feature quantity space 1382 (the distribution intensity indicating the category “human”) and the distribution intensity of feature vectors representing objects other than humans (the distribution intensity indicating the category “non-human”). The curve 1369 and the curve 1370 can each be represented by an expression using the value of the feature quantity dimension 1 as a variable. The coefficients of such expressions are stored in the identification parameter storage unit 150 (FIG. 6) as identification parameters.
[0118]
A point 1368 indicates a feature vector for an image set (recognition image set) input in the recognition process performed later. In this identification method, the intensities of the plurality of distributions are compared at the position of the feature vector, and the category indicated by the largest distribution intensity is the recognition result for the feature vector. In the example shown in FIG. 13C, the distribution intensity 1369 indicating the category “human” is greater than the distribution intensity 1370 indicating the category “non-human” at the position of the feature vector 1368, so it is determined that the object of the recognition image set is a human.
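The comparison of distribution intensities in FIG. 13C can be illustrated with a short sketch. The text does not specify the form of the curves 1369 and 1370; here each curve is assumed, purely for illustration, to be a Gaussian whose mean and standard deviation serve as the stored coefficients. The names and numeric values are hypothetical.

```python
import math


def gaussian_intensity(x, mean, std):
    """Distribution intensity at x, modeled (as an assumption) by a Gaussian curve."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def classify_by_intensity(x, distributions):
    """Return the category whose distribution intensity is largest at position x."""
    intensities = {cat: gaussian_intensity(x, *params)
                   for cat, params in distributions.items()}
    return max(intensities, key=intensities.get)


# Hypothetical identification parameters: (mean, standard deviation) per category
# along the single selected dimension (feature quantity dimension 1).
distributions = {"human": (2.0, 1.0), "non-human": (6.0, 1.5)}

print(classify_by_intensity(3.0, distributions))
```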
[0119]
Thus, the identification parameter represents not only the identification plane but also a parameter used in an arbitrary identification method for identifying feature quantity vectors belonging to different categories in the feature quantity space.
[0120]
FIG. 14 schematically illustrates a change in identification performance associated with performing a feature quantity dimension deletion process. As shown in FIG. 14, initially, the identification performance is improved by deleting the feature quantity dimension. This is because extra information (noise) that adversely affects identification can be reduced by reducing feature quantity dimensions that do not contribute to identification.
[0121]
In general, simply combining information from multiple images with different attributes, such as a visible light image and a far-infrared light image, increases the amount of information, which increases the amount of identification processing and the number of samples required for learning (the number of learning image sets), so that sample collection becomes difficult. If the number of learning image sets is insufficient, the identification performance may deteriorate. In the embodiment of the present invention, however, after the information of the visible light image and the information of the far-infrared light image are combined, feature quantity dimensions are deleted in steps S92 to S96. By deleting feature quantity dimensions and selecting the information effective for identification, it is possible to improve (or maintain) the identification performance while reducing the amount of calculation in the recognition process performed later.
[0122]
In a simulation performed by the present inventors using 858 image sets representing humans and 11052 image sets representing non-human objects, the processing in steps S92 to S96 reduced the number of feature quantity dimensions by 88%, and the probability of misidentification was reduced to 1/9.
[0123]
The determination in step S96 (FIG. 12) is performed as follows, for example. The change in the identification performance obtained in step S95 is monitored, and the identification performance obtained in the previous execution of step S95 is compared with that obtained in the current execution. If the identification performance has improved or been maintained, it is determined that the criterion is satisfied; if the identification performance has degraded, it is determined that the criterion is not satisfied. When such a determination is made, the maximum value of the identification performance indicated by the point 1302 in FIG. 14 can be realized.
[0124]
The determination in step S96 (FIG. 12) may be performed in other manners. For example, an absolute identification performance value (reference number 1301 shown in FIG. 14) may be designated in advance, and feature quantity dimensions may be deleted as far as possible under the condition that the designated identification performance can be realized. In this case, the maximum number of feature quantity dimensions is deleted while the designated identification performance value is still satisfied (point 1302).
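The two ways of making the step S96 determination can be compared with a small sketch. It assumes that the identification performance measured in each round of step S95 is available as a list, ordered by round; the function names and the performance values are hypothetical and are not taken from FIG. 14.

```python
def stop_index_by_trend(performances):
    """Last accepted round when deletion continues only while the performance
    is improved or maintained relative to the previous round."""
    for i in range(1, len(performances)):
        if performances[i] < performances[i - 1]:
            return i - 1
    return len(performances) - 1


def stop_index_by_target(performances, target):
    """Last accepted round when deletion continues as long as a designated
    absolute performance value is still met (None if it is never met)."""
    last = None
    for i, p in enumerate(performances):
        if p < target:
            break
        last = i
    return last


# Made-up performance after each round of feature quantity dimension deletion.
performances = [0.93, 0.95, 0.97, 0.96, 0.93, 0.85]

print(stop_index_by_trend(performances))         # round with the best performance
print(stop_index_by_target(performances, 0.92))  # latest round still meeting 0.92
```

With these made-up values the first rule stops at round 2 and the second at round 4, which illustrates the trade-off: the absolute-value rule removes more dimensions at the cost of some identification performance.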
[0125]
FIG. 15 shows a more detailed processing procedure of step S91 (FIG. 12). Step S101 corresponds to step S1001a (FIG. 1), and steps S102 to S104 correspond to step S1001b (FIG. 1).
[0126]
Step S101: A visible light image and a far-infrared light image are input. The visible light image and the far-infrared light image constitute an image set. In step S101, image sets representing humans (image sets in which the visible light image and the far-infrared light image represent the same human) and image sets representing objects other than humans (image sets in which the visible light image and the far-infrared light image represent the same non-human object) are input. Examples of such image sets have been described above with reference to FIGS. 2A and 2B.
[0127]
The visible light image and the far-infrared light image input in step S101 are captured using the visible light camera 110 and the far-infrared light camera 100, respectively, and are temporarily stored in the storage device 120 as a learning image set. Alternatively, the visible light image and the far-infrared light image may be input to the object recognition apparatus 1 by being read from a recording medium (not shown).
[0128]
Step S102: Normalization of pixel values is performed for each image (visible light image or far infrared light image). Normalization of pixel values is performed according to (Equation 2).
[0129]
[Expression 2]
I'(x, y) = (I(x, y) - m) / σ
where
I(x, y): pixel value at coordinates (x, y) in the image before normalization
m: average pixel value of the entire image
σ: standard deviation of the pixel values about the average over the entire image
I'(x, y): normalized pixel value at coordinates (x, y) in the image
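The normalization of (Expression 2) can be sketched in a few lines. The sketch assumes the image is given as a list of rows of numeric pixel values and uses the population standard deviation; the text does not say whether the population or sample form is intended, and the handling of a constant image (σ = 0) is also an assumption. The function name `normalize_pixels` is hypothetical.

```python
import math


def normalize_pixels(image):
    """Apply I'(x, y) = (I(x, y) - m) / sigma to every pixel of `image`."""
    pixels = [p for row in image for p in row]
    m = sum(pixels) / len(pixels)
    # Population standard deviation (assumption; the source does not specify).
    sigma = math.sqrt(sum((p - m) ** 2 for p in pixels) / len(pixels))
    if sigma == 0.0:
        # Assumption: a constant image is mapped to all zeros to avoid division by zero.
        return [[0.0 for _ in row] for row in image]
    return [[(p - m) / sigma for p in row] for row in image]


normalized = normalize_pixels([[1, 2], [3, 4]])
print(normalized)
```

After normalization the pixel values have zero mean and unit standard deviation, so images captured at different overall brightness levels or temperature offsets become comparable before the filters are applied.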
[0130]
Step S103: For each image whose pixel values have been normalized, a plurality of Gabor filters having different characteristics are applied to a plurality of regions in the image.
[0131]
Step S104: A feature vector is obtained from the filter output value corresponding to each region (specific region) to which the Gabor filter is applied.
[0132]
FIG. 16 shows the relationship between the image 431 and the region 432 to which the image filter in the image 431 is applied. The image 431 is one of a visible light image or a far-infrared light image input in step S101. A region 432 indicates a region to which the image filter is applied. Hereinafter, the region 432 to which the image filter in the image 431 is applied is referred to as a “specific region”. Details of the processing in steps S103 and S104 (FIG. 15) will be described with reference to FIG.
[0133]
In the example illustrated in FIG. 16, the specific region 432 has a square shape with one side having a length L. The image 431 has a rectangular shape having a height H and a width W. A plurality of specific regions 432 are set in the image 431. By applying a Gabor filter that matches the size of the specific region 432 to each of the specific regions 432, a filter output value is generated in each of the specific regions 432. The method by which the filter output value is generated was described above with reference to FIG.
[0134]
For example, the filter processing unit 125 (FIG. 6) sets a plurality of specific regions 432 in the image 431 so that the specific regions 432 cover the entire image 431 while being allowed to overlap one another. For example, in FIG. 16, when the specific region 432 has a size of L = H/8 (size 1) with W = H/2, and the specific regions 432 are arranged over the entire surface of the image 431 so as to overlap by L/2 in the vertical and horizontal directions, the number of specific regions in the image 431 is (H/L × 2 - 1) × (W/L × 2 - 1) = 15 × 7 = 105. Four image filters having different characteristics (different orientation selectivity; the four image filters having the four types of orientation selectivity shown in FIGS. 11A to 11D) are applied to each specific region.
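The arrangement of overlapping specific regions can be made concrete by enumerating their top-left corners. The sketch below assumes an image of 64 pixels vertically and 32 pixels horizontally (so that W = H/2) and a region side of L = H/8 = 8 pixels; the pixel dimensions and the function name `specific_regions` are illustrative assumptions.

```python
def specific_regions(height, width, side):
    """Top-left corners (top, left) of square regions of the given side that
    cover the image while overlapping by side/2 vertically and horizontally."""
    step = side // 2
    return [(top, left)
            for top in range(0, height - side + 1, step)
            for left in range(0, width - side + 1, step)]


regions = specific_regions(64, 32, 8)
print(len(regions))   # 15 rows x 7 columns = 105
print(regions[:3])    # [(0, 0), (0, 4), (0, 8)]
```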
[0135]
Furthermore, specific regions having different sizes are arranged over the entire surface of the image 431, and image filters of correspondingly different sizes are applied to them. If the specific region is a square with L = H/4 (size 2) and the specific regions are arranged over the entire surface of the image 431 so as to overlap by L/2 in the vertical and horizontal directions, then, under the assumption W = H/2, the number of specific regions is (H/L × 2 - 1) × (W/L × 2 - 1) = 7 × 3 = 21. Similarly, when the specific region is a square with L = H/2 (size 3), the number of specific regions is (H/L × 2 - 1) × (W/L × 2 - 1) = 3 × 1 = 3.
[0136]
The total number of specific regions of the three different sizes (size 1, size 2 and size 3) in the image 431 is 105 + 21 + 3 = 129. When four types of image filters having different orientation selectivity are applied to each of the specific regions, 129 × 4 = 516 filter outputs are obtained from the image 431. One image set (a learning image set or a recognition image set) includes a visible light image (luminance image) and a far-infrared light image. When the visible light image and the far-infrared light image have the same size (height H, width W) and the specific regions are set at the same positions in both images, the number of filter outputs obtained from the two images in one image set is 516 × 2 = 1032. Therefore, one image set is represented by a 1032-dimensional feature vector. Alternatively, a higher-dimensional feature quantity space may be set, and the 1032-dimensional feature vector may be mapped to that higher-dimensional space. Mapping the feature vectors to a higher-dimensional feature quantity space increases the distance between the feature vectors corresponding to the respective image sets, which has the advantage of making it easier to obtain the identification plane in step S93 (FIG. 12) performed later.
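The dimension count of 1032 can be checked with a short calculation that applies the formula (H/L × 2 - 1) × (W/L × 2 - 1) to the three region sizes. The concrete pixel size of 64 × 32 is an assumption chosen so that W = H/2; the function name `region_count` is hypothetical.

```python
def region_count(height, width, side):
    """Number of square regions of the given side overlapping by side/2."""
    return (height // side * 2 - 1) * (width // side * 2 - 1)


H, W = 64, 32                      # illustrative size with W = H/2
sides = [H // 8, H // 4, H // 2]   # size 1, size 2, size 3
counts = [region_count(H, W, s) for s in sides]

regions_per_image = sum(counts)              # 105 + 21 + 3
outputs_per_image = regions_per_image * 4    # four filter orientations
feature_dimensions = outputs_per_image * 2   # visible + far-infrared image

print(counts, regions_per_image, outputs_per_image, feature_dimensions)
# [105, 21, 3] 129 516 1032
```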
[0137]
The number of orientations of the Gabor filter is not limited to four. The number of orientations may be changed depending on the size of the Gabor filter and/or the position of the specific region. By doing so, more information can be acquired efficiently from a specific position in the image (for example, a position where orientations are to be distinguished in detail) and/or from a specific spatial frequency region.
[0138]
Further, the size of the image filter is not limited to three types. The size of the image filter may be one or more types. The visible light image and the far-infrared light image have different spatial frequency characteristics of changes in pixel values (luminance). Therefore, it is possible to efficiently acquire a large amount of information from an image by changing the size of an applied image filter between a visible light image and a far-infrared light image.
[0139]
For the visible light image and the far-infrared light image, the size and position of the specific regions need not be equal. Performance improvement can be expected by setting a size and position suitable for each of the visible light image and the far-infrared light image. However, making the size and arrangement of the specific regions equal for both images has the advantage that the image filters can be applied to both images by the same procedure, which makes it possible to reduce the scale of the hardware circuits and software.
[0140]
As the image filter, a filter having a shape similar to the Gabor filter, or another image filter for obtaining edges, may be used. Furthermore, an image filter other than a filter for obtaining edges may be used. However, by using a Gabor filter or an image filter having a similar shape, it is possible to efficiently acquire information on luminance changes localized in both the position space and the frequency space. Therefore, when a Gabor filter or an image filter having a similar shape is used, information on spatial changes at a specific spatial frequency can be acquired more efficiently than when an edge filter such as a Sobel filter is used. As a result, information effective for recognition can be acquired efficiently from the amount of information increased by combining images with different properties, such as a visible light image and a far-infrared light image. The information of the visible light image and the information of the far-infrared light image can thus be effectively combined and used for recognition without using the temperature information obtained from the far-infrared light image.
[0141]
In the example described with reference to FIG. 16, a plurality of specific areas (areas to which the Gabor filter is applied) are set in one size image 431. However, the same result can be obtained by setting specific regions (regions to which the Gabor filter is applied) of the same size for each of a plurality of images of different sizes obtained by photographing the same object at different resolutions.
[0142]
In the far-infrared light image and the visible light image, the object does not necessarily appear at the same position (they may be shifted in the vertical direction and / or the horizontal direction between the two images). As long as the same positional relationship is maintained even when other objects are photographed, the positions of the objects in the visible light image and the far-infrared light image do not need to match. This is because in the learning process and the recognition process of the present invention, it is not necessary to associate a region in the far-infrared light image with a region in the visible light image.
[0143]
FIG. 17 shows the detailed procedure of the process for evaluating the discrimination performance in step S95 (FIG. 12).
[0144]
Step S111: Of the components of the feature vector, only the component corresponding to the feature dimension specified in step S92 is validated. As the feature amount vector, all feature amount vectors included in the feature amount data F obtained in step S91 (FIG. 12) or some feature amount vectors included in the feature amount data F are used.
[0145]
Step S112: Learning is performed to identify a feature vector expressing a person and a feature vector expressing an object other than a human using a predetermined identification method. The predetermined identification method is a method used for determining whether or not the object is a human in a recognition process performed later.
[0146]
Step S113: The discrimination performance is calculated using feature quantity data for evaluation (a set of feature vectors). As the feature quantity data for evaluation, the feature quantity data F obtained in step S91 (FIG. 12) may be used, or feature quantity data not used in the learning process shown in FIG. 12 may be used. Alternatively, a set of feature vectors for evaluation may be created separately in advance by the same procedure as the feature quantity data F. The discrimination performance is expressed, for example, as the ratio of feature vectors in which the dimensions set in step S94 (FIG. 12) are validated (including feature vectors representing humans and feature vectors representing non-human objects) that can be correctly identified using the identification method after the learning in step S112 is completed.
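The evaluation of steps S111 to S113 can be sketched as an accuracy computation over labeled feature vectors in which only the validated dimensions are used. The classifier passed in stands for whatever identification method has been trained in step S112; here it is a trivial hypothetical rule, and the evaluation data are made up.

```python
def discrimination_performance(samples, selected, classify):
    """Ratio of evaluation vectors identified correctly.

    samples:  list of (feature_vector, is_human) pairs
    selected: indices of the validated feature quantity dimensions
    classify: callable taking the reduced vector and returning True for "human"
    """
    correct = 0
    for vector, is_human in samples:
        reduced = [vector[i] for i in selected]  # keep only validated dimensions
        if classify(reduced) == is_human:
            correct += 1
    return correct / len(samples)


# Made-up evaluation data with three feature quantity dimensions.
samples = [
    ([1.0, 5.0, 0.2], True),
    ([0.9, -3.0, 0.1], True),
    ([-1.0, 4.0, 0.3], False),
    ([-0.5, -2.0, 0.0], False),
]

# Hypothetical trained classifier: positive first component means "human".
def classify(v):
    return v[0] > 0

print(discrimination_performance(samples, [0], classify))  # 1.0
print(discrimination_performance(samples, [1], classify))  # 0.5
```

In this toy data the first dimension separates the two categories and the second does not, so validating only the second dimension lowers the measured performance.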
[0147]
<Recognition process>
FIG. 18 shows a detailed procedure of the recognition process executed by the object recognition apparatus 1. Step S121 corresponds to step S1002a (FIG. 1), steps S122 to S123 correspond to step S1002b (FIG. 1), and steps S124 to S125 correspond to step S1002c (FIG. 1).
[0148]
Step S121: A visible light image and a far-infrared light image are input. This image input is performed by the visible light camera 110 and the far-infrared light camera 100 (FIG. 6), similarly to step S101 (FIG. 15) in the learning process.
[0149]
Step S122: A recognition target region is cut out from each image (the visible light image and the far-infrared light image). The recognition target region is cut out so that its shape matches the shape of the image used in the learning process. The recognition target region may be fixed in the image, or a plurality of recognition target regions may be cut out from one image. In the example described with reference to FIG. 16, the shape of the recognition target region to be cut out is a rectangle having an aspect ratio of H to W. As long as the shape of the visible light image in the learning process is the same as the shape of the recognition target region cut out from the visible light image in the recognition process, and the shape of the far-infrared light image in the learning process is the same as the shape of the recognition target region cut out from the far-infrared light image in the recognition process, the shape of the recognition target region cut out from the visible light image may differ from the shape of the recognition target region cut out from the far-infrared light image.
[0150]
The cutting out in step S122 is performed in consideration of the shooting positions of the visible light image and the far-infrared light image input in step S101 (FIG. 15) in the learning process. Specifically, the cutting out in step S122 is performed so that the object appears at the same position in the visible light image in the learning process and in the recognition target region cut out from the visible light image in the recognition process. Of course, the visible light camera 110 that captures the visible light image in the recognition process may be installed so that no cutting out is required. The same applies to the far-infrared light image. The same also applies to the magnification of the visible light image and the far-infrared light image: the magnifications (numbers of pixels) of the visible light image and the far-infrared light image may differ from each other, but the relationship between the magnification (number of pixels) of the visible light image and that of the far-infrared light image is adjusted to be the same in the learning process and the recognition process.
[0151]
Next, the size of the cut-out visible light image and far-infrared light image is normalized as necessary. In the case where the shape of both the visible light image and the far-infrared light image is a rectangle with an aspect ratio of 2: 1, the size is normalized to, for example, a rectangle of 64 pixels vertically and 32 pixels horizontally. By normalizing the size of the image, the size of the Gabor filter applied to the image in the next step S123 (the size of the specific area on which the filter operates) can be fixed. The visible light image and the far-infrared light image that are cut out and normalized in size constitute an image set for recognition.
[0152]
Step S123: A feature vector corresponding to the selected dimension determined in step S97 (FIG. 12) is obtained from the cut out image. The feature amount vector is calculated using the same Gabor filter as that used in the learning process. If the image size is not normalized in step S122 described above, a Gabor filter having a size corresponding to the image size is applied.
[0153]
When a part of the feature quantity dimension is deleted in the learning process, the calculation process of the Gabor filter corresponding to the deleted feature quantity dimension is unnecessary, and is excluded from the feature quantity vector calculation process in advance.
[0154]
Step S124: The similarity is obtained using the weighted sum of the feature vector and the coefficients of the identification plane. The similarity indicates the degree to which the object of the recognition image set resembles a human. As already described, the weighted sum of the feature vector and the coefficients of the identification plane represents the distance (positional relationship) between the feature vector and the identification plane. This distance can be expressed as a positive value when the feature vector is on one side of the space separated by the identification plane (for example, the first side, on which feature vectors representing humans are located), and as a negative value when the feature vector is on the opposite side (for example, the second side, on which feature vectors representing non-human objects are located). The absolute value of the distance increases as the feature vector moves away from the identification plane.
[0155]
Step S125: A human is recognized based on the similarity (that is, it is determined whether the object is a human). For example, whether the object is a human is determined based on whether the similarity (the distance between the feature vector and the identification plane) is positive or negative (that is, based on which side of the identification plane the feature vector is located). Alternatively, it may be determined that the object is a human when the similarity is positive (that is, the feature vector is on the first side of the identification plane) and the similarity is greater than or equal to a predetermined threshold. Such a threshold is set according to the requirements for recognition accuracy (for example, whether it is desired to reduce the possibility of misrecognizing a non-human object as a human, or to reduce the possibility of misrecognizing a human as not being a human). A numerical value indicating the similarity may be displayed on the display unit 170.
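Steps S124 and S125 can be summarized in a short sketch. It assumes the feature vector already contains only the selected dimensions, and it includes a constant offset term `bias` for the identification plane, which is an assumption not stated in the text above. The coefficient values and function names are hypothetical.

```python
def similarity(feature, coefficients, bias=0.0):
    """Weighted sum of the feature vector and the identification plane coefficients.

    `bias` (a constant offset of the plane) is an assumption for illustration.
    """
    return sum(w * x for w, x in zip(coefficients, feature)) + bias


def is_human(sim, threshold=0.0):
    """Recognize a human when the similarity is positive and at least `threshold`."""
    return sim > 0 and sim >= threshold


coefficients = [0.5, -1.0, 2.0]        # hypothetical identification parameters
sim = similarity([1.0, 0.5, 0.25], coefficients)

print(sim)                             # 0.5
print(is_human(sim))                   # True
print(is_human(sim, threshold=1.0))    # False: a stricter threshold rejects it
```

Raising the threshold reduces the chance of recognizing a non-human object as a human at the cost of missing some humans; lowering it has the opposite effect.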
[0156]
As described above, the object recognition apparatus 1 (FIG. 6) of the present invention obtains identification parameters (for example, parameters representing an identification plane) using the far-infrared light images and the visible light images of the learning image sets, and uses the identification parameters as the criterion for recognizing the object of the far-infrared light image and the visible light image of a recognition image set (that is, for determining whether the object belongs to a specific category). Because the object is recognized based on both the intensity of the visible light emitted or reflected from the object (first attribute) and the intensity of the far-infrared light emitted or reflected from the object (second attribute), the reliability of recognition of the object is increased.
[0157]
The present inventors performed a simulation of the learning process and the recognition process described above, using as learning image sets pairs of visible light images and far-infrared light images taken outdoors by day and by night (858 image sets representing humans and 11052 image sets representing objects other than humans). As a result of the simulation, the misrecognition rate was 0.2%. This misrecognition rate is very low (1/10 or less) compared with the misrecognition rate (2.7%) in a comparative example in which the learning process and the recognition process were performed using only visible light images, and the misrecognition rate (3.5%) in a comparative example in which they were performed using only far-infrared light images. In this way, highly reliable object recognition is realized.
[0158]
The object recognition apparatus 1 of the present invention can learn a correlation between a visible light image and a far infrared light image by performing learning processing using a visible light image and a far infrared light image. The correlation is reflected in the recognition process. For example, consider a visible light image and a far-infrared light image obtained by photographing an object (human) outdoors in the daytime. Under an environmental condition where the object is exposed to direct sunlight, the brightness of the object in the visible light image increases, and the far-infrared light image indicates that the temperature of the object is high. On the other hand, under an environmental condition in which the object is not exposed to direct sunlight, the brightness of the object in the visible light image is low, and the far-infrared light image indicates that the temperature of the object is low.
[0159]
By using visible light images and far-infrared light images captured under such various environmental conditions, the object recognition apparatus 1 of the present invention can learn the correlation between the visible light image and the far-infrared light image. As a result, for example, when the brightness of the object in the visible light image of a recognition image set is high while the far-infrared light image indicates that the temperature of the object is low (an event that cannot occur when the object is a human), the possibility of misrecognizing the object as a human is low.
[0160]
In a recognition system that performs the learning process and the recognition process using only visible light images, performing the learning process with visible light images taken under various environmental conditions widens the range of objects that are recognized as humans. As a result, there is a high possibility that a non-human object is erroneously recognized as a human. The same applies to a recognition system that performs the learning process and the recognition process using only far-infrared light images.
[0161]
According to the object recognition apparatus 1 of the present invention, by performing the learning process using learning image sets photographed under various environmental conditions (for example, illumination conditions and temperature conditions), it becomes possible to correctly recognize the object of a recognition image set photographed under those various environmental conditions. This feature is particularly suitable for applications that require correct recognition of objects under changing environmental conditions, such as outdoor intruder monitoring systems, pedestrian detection systems mounted on moving bodies such as automobiles, and vision systems mounted on mobile robots.
[0162]
Furthermore, the learning process and the recognition process of the present invention described above are not specialized for the attributes of a particular object. Therefore, the object recognition apparatus 1 of the present invention can also be applied to purposes other than human recognition (for example, recognizing animals or recognizing vehicles) without changing the learning process and the recognition process. As described above, with the object recognition apparatus 1 according to the present invention, initial setting is easy both when the environmental conditions for recognition change and when the recognition target changes.
[0163]
In the object recognition apparatus 1 of the present invention, a process for extracting a specific temperature region from the far-infrared light image is not necessary. Therefore, it is not necessary to perform a calibration process for canceling the influence of the temperature of the optical system, circuits, elements, and the like of the far-infrared light camera, which changes over time, and the advantage is obtained that the configuration and maintenance of the object recognition apparatus 1 can be simplified.
[0164]
In the embodiment described above, one set of selected dimensions is determined in step S97 (FIG. 12) of the learning process. However, a plurality of sets of selected dimensions may be prepared, and an identification parameter may be determined for each set. In this case, in the recognition process, human recognition may be performed based on the similarity obtained using any one of the sets of identification parameters, or based on the sum (or average) of the similarities obtained using each of the plurality of sets of identification parameters. Alternatively, human recognition may be performed separately based on the similarity obtained using each of the plurality of sets of identification parameters, and the final result may be decided by a majority vote of those recognition results.
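The two combination rules described above (average of similarities, and majority vote) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes that each identification parameter set describes a linear identification plane (`weights`, `bias`) over its own selected dimensions `dims`, and that the similarity is the signed distance from that plane. The function names and the tuple layout are hypothetical.

```python
from math import sqrt


def similarity(features, dims, weights, bias):
    """Signed distance of the selected feature dimensions from a linear plane.

    Assumption: the identification parameter is a plane w.x + b = 0 defined
    only over the selected dimensions `dims` of the full feature vector.
    """
    score = sum(w * features[d] for w, d in zip(weights, dims)) + bias
    norm = sqrt(sum(w * w for w in weights))
    return score / norm


def recognize_by_mean(features, parameter_sets, threshold=0.0):
    """Decide 'human' when the average similarity reaches the threshold."""
    sims = [similarity(features, *params) for params in parameter_sets]
    return sum(sims) / len(sims) >= threshold


def recognize_by_majority(features, parameter_sets, threshold=0.0):
    """Decide 'human' when more than half of the parameter sets say so."""
    votes = sum(1 for params in parameter_sets
                if similarity(features, *params) >= threshold)
    return votes * 2 > len(parameter_sets)  # a tie counts as 'not human'


# Hypothetical example: three parameter sets over a 4-dimensional feature vector.
features = [0.9, -0.2, 0.4, 0.1]
parameter_sets = [
    ((0, 2), (1.0, 1.0), -0.5),
    ((1, 3), (1.0, -1.0), 0.0),
    ((0, 3), (2.0, 0.0), -1.0),
]
```

Averaging lets a confident parameter set outweigh a marginal one, whereas majority voting treats every set equally regardless of how far the vector lies from each plane.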
[0165]
In the embodiment described above, the feature quantity vector is obtained in step S123 by applying the Gabor filter to the recognition target region cut out in step S122. However, the Gabor filter may instead be applied to the image before the region is cut out. In this case, the specific areas to which the filter is applied are set in advance over the entire image, and the Gabor filter is applied beforehand to obtain the filter output at each position in the entire image. The feature quantity vector is then calculated using only the filter outputs at the locations that fall within the detection target region. Obtaining the filter outputs in advance avoids the waste of applying the same Gabor filter to the same location in the image multiple times, for example when the cut-out and recognition procedure is repeated sequentially while scanning a wide range of the image. Note that by repeating the cut-out and recognition procedure sequentially while scanning a wide range of the image, a human can be detected in an image in which the position of the object is unknown. When such processing is performed, the object recognition apparatus 1 (FIG. 6) can function as an object detection apparatus.
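The idea of filtering the whole image once and then reading the outputs per window can be sketched as below. This is an illustrative sketch under assumptions: the kernel is the real part of a generic Gabor function (the actual coefficients of FIG. 11 are not reproduced), windows are expressed in the coordinates of the precomputed response map, and all function names are hypothetical.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor filter as a size x size list of coefficients."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_whole_image(image, kernel):
    """Filter output at every position where the kernel fits (computed once)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(kernel[j][i] * image[y + j][x + i]
                 for j in range(k) for i in range(k))
             for x in range(w - k + 1)]
            for y in range(h - k + 1)]


def window_features(response, top, left, offsets):
    """Feature quantity vector of one window: only look-ups, no filtering."""
    return [response[top + dy][left + dx] for dy, dx in offsets]


def scan(response, offsets, win_h, win_w, step=1):
    """Yield (top, left, features) for every window position in the map."""
    for top in range(0, len(response) - win_h + 1, step):
        for left in range(0, len(response[0]) - win_w + 1, step):
            yield top, left, window_features(response, top, left, offsets)
```

With overlapping windows, each filter output is computed once in `filter_whole_image` and then shared by every window that contains it, which is the saving the paragraph above describes.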
[0166]
An image set (a set of a visible light image and a far-infrared light image) representing a non-human object for use in the learning process may be input to the object recognition apparatus 1 by photographing an actual non-human object such as a tree or a dog. Alternatively, a set of a visible light image and a far-infrared light image generated by applying conversion processing to the visible light image and the far-infrared light image of an image set representing a human may be used in the learning process as an image set representing a non-human object. Examples of such conversion processing include applying an affine transformation to an image and/or adding noise to an image. An image after the conversion processing is relatively similar to an image representing a human. By using such converted images in the learning process, it is possible to learn a determination criterion that does not recognize as human an object whose shape differs even slightly from a human shape.
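The conversion processing mentioned above (affine transformation and/or noise) can be sketched as follows. This is a minimal sketch under assumptions that the text does not state: both images have the same size, the same transformation is applied to the visible light image and the far-infrared light image, resampling is nearest-neighbour, and the noise is Gaussian. The function names and parameters are hypothetical.

```python
import math
import random


def affine_transform(image, matrix, offset, fill=0.0):
    """Nearest-neighbour resampling: out[y][x] = image[M.(y, x) + offset]."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy = int(round(matrix[0][0] * y + matrix[0][1] * x + offset[0]))
            sx = int(round(matrix[1][0] * y + matrix[1][1] * x + offset[1]))
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]


def make_negative_set(visible, far_infrared, angle_deg, shear, sigma, seed=0):
    """Derive a 'non-human' image set from a human image set.

    Assumption: the same rotation-plus-shear about the image centre is
    applied to both images, followed by independent noise.
    """
    rng = random.Random(seed)
    a = math.radians(angle_deg)
    matrix = [[math.cos(a), -math.sin(a) + shear],
              [math.sin(a), math.cos(a)]]
    cy = (len(visible) - 1) / 2.0
    cx = (len(visible[0]) - 1) / 2.0
    offset = (cy - matrix[0][0] * cy - matrix[0][1] * cx,
              cx - matrix[1][0] * cy - matrix[1][1] * cx)
    return tuple(add_noise(affine_transform(img, matrix, offset), sigma, rng)
                 for img in (visible, far_infrared))
```

For example, `make_negative_set(visible, far_infrared, angle_deg=25, shear=0.2, sigma=0.05)` would yield a distorted pair that still resembles the original human image set, which is the kind of near-miss example the paragraph above describes.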
[0167]
In the embodiment described above, the learning process and the recognition process are performed by combining two images, namely a visible light image and a far-infrared light image captured at substantially the same time. However, the number of images to be combined is not limited to two. Further, a color image may be used as the visible light image instead of a luminance image. In this case, if the color image is represented by three RGB images (three types of images representing the intensity of light emitted or reflected from the object in three different wavelength bands), a total of four images, namely the R image, the G image, the B image, and the far-infrared light image, are input to the object recognition apparatus 1 as one image set (a learning image set or a recognition image set). The learning process and the recognition process when four images are input are the same as when two images, a visible light image and a far-infrared light image, are input.
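The reason the processing does not change with the number of images can be illustrated with a short sketch: the feature quantity vector is simply the concatenation of filter outputs over all images in the set, so adding images only lengthens the vector. The simple 3x3 edge kernels below are stand-ins chosen for readability, not the Gabor filters of the embodiment, and the function names are hypothetical.

```python
def apply_at(image, kernel, top, left):
    """Output of one image filter at one predetermined position."""
    return sum(kernel[j][i] * image[top + j][left + i]
               for j in range(len(kernel))
               for i in range(len(kernel[0])))


def feature_vector(image_set, filters, positions):
    """Concatenate filter outputs over every image, filter, and position.

    The same code handles 2 images (visible, far-infrared) or 4 images
    (R, G, B, far-infrared); only the vector length changes.
    """
    vec = []
    for image in image_set:
        for kernel in filters:
            for top, left in positions:
                vec.append(apply_at(image, kernel, top, left))
    return vec


# Stand-in kernels (assumption): respond to vertical and horizontal edges.
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
```

With F filters and P positions, a set of N images gives a vector of N x F x P components, and the later learning and recognition steps operate on that vector without knowing which image each component came from.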
[0168]
The image set input to the object recognition apparatus 1 may include images captured at different times. For example, the input unit 190 may be configured to input to the object recognition apparatus 1, as one recognition image set, four images: a visible light image (first image) and a far-infrared light image (second image) captured at time T (first time), and a visible light image (fifth image) and a far-infrared light image (sixth image) captured at time T + t (a predetermined time after the first time). In this case, each learning image set must of course likewise contain a visible light image and a far-infrared light image captured at the same time, together with a visible light image and a far-infrared light image captured a time t after that time. The learning process and the recognition process when four images are input are the same as when two images, a visible light image and a far-infrared light image, are input.
[0169]
In this way, by combining visible light images and far-infrared light images captured at different times, an object whose shape changes over time in a specific manner, such as a pedestrian, can be distinguished both from objects whose shape does not change over time and from objects whose shape changes over time in a different manner, and the recognition accuracy is improved.
[0170]
If the predetermined time t is shortened, fast-moving objects can be recognized efficiently, and if the predetermined time t is lengthened, slow-moving objects can be recognized efficiently. Normally, when recognizing a moving person, vehicle, animal, or the like outdoors, setting the predetermined time t to 1 second or less makes it possible to effectively recognize an object whose shape and/or position changes with time. In this way, the identification performance is improved by effectively combining information from images captured at a plurality of times.
[0171]
According to the object recognition apparatus 1 of the present invention, even when the number of images included in an image set increases, only the feature quantity dimensions (filter outputs) that contribute to identification are selected, so an increase in the amount of computation in the recognition process is suppressed.
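One way to picture the selection of contributing dimensions is sketched below. The claims state that dimensions are deleted based on the direction of the normal of the identification plane; this sketch assumes, as a simplification, that the dimensions with the smallest absolute normal components are the ones deleted. That ranking rule and the function names are assumptions for illustration, not the procedure of FIG. 12.

```python
def select_dimensions(normal, keep):
    """Indices of the `keep` dimensions with the largest |normal component|.

    Assumption: a small component of the plane normal means the
    corresponding filter output contributes little to identification.
    """
    ranked = sorted(range(len(normal)),
                    key=lambda i: abs(normal[i]), reverse=True)
    return sorted(ranked[:keep])


def reduce_vector(features, selected):
    """Keep only the selected feature quantity dimensions."""
    return [features[i] for i in selected]
```

Because only the selected filter outputs need to be computed at recognition time, the cost grows with the number of kept dimensions rather than with the number of images in the set.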
[0172]
The image set input to the object recognition apparatus 1 may include images with different viewpoints. For example, a visible light image and a far-infrared light image captured from the same position, and a visible light image and a far-infrared light image captured from a different position may constitute one image set.
[0173]
FIG. 19 shows still another example of the arrangement of the far-infrared light cameras and the visible light cameras. In the example shown in FIG. 19, the far-infrared light camera 100a and the visible light camera 110a are arranged at point A, and the far-infrared light camera 100b and the visible light camera 110b are arranged at point B. The four cameras are arranged so as to photograph the same object. As a whole, the far-infrared light cameras 100a and 100b and the visible light cameras 110a and 110b function as the input unit 190 (FIG. 6) that inputs learning image sets and recognition image sets to the object recognition apparatus 1 (FIG. 6).
[0174]
In the recognition process, the input unit 190 inputs to the object recognition apparatus 1, as one recognition image set, four images: the visible light image (first image) and the far-infrared light image (second image) captured from point A (first place), and in addition a visible light image (fifth image) and a far-infrared light image (sixth image) captured from point B (second place). The learning process and the recognition process when four images are input are the same as when two images, a visible light image and a far-infrared light image, are input.
[0175]
In this way, by combining visible light images and far-infrared light images captured from different places, the accuracy of recognizing an object whose apparent shape differs depending on the viewing direction, such as a person, is improved.
[0176]
According to the object recognition apparatus 1 of the present invention, even when the number of images included in an image set increases, only the feature quantity dimensions (filter outputs) that contribute to identification are selected, so an increase in the amount of computation in the recognition process is suppressed.
[0177]
Further, a visible light image and a far-infrared light image taken from different positions may constitute one image set.
[0178]
FIG. 20 shows still another example of the arrangement of the far-infrared light camera 100 and the visible light camera 110. In the example shown in FIG. 20, the visible light camera 110 is arranged at point C, and the far-infrared light camera 100 is arranged at point D. The two cameras are arranged so as to photograph the same object.
[0179]
In the example shown in FIG. 20, the visible light camera 110 captures a visible light image (first image) from point C (first place), and the far-infrared light camera 100 captures a far-infrared light image (second image) from point D (second place). With this configuration, the background of the object can be made different between the visible light image and the far-infrared light image. This reduces the possibility that extraneous background information common to the visible light image and the far-infrared light image adversely affects the recognition result, giving the advantage that the recognition result is less affected by the background. As in the example shown in FIG. 19, the use of images captured from a plurality of different viewpoints improves the recognition accuracy for an object whose apparent shape differs depending on the viewing direction, such as a person.
[0180]
In the embodiment described above, a visible light image and a far-infrared light image are given as examples of images that represent an object using different attributes, but the present invention is not limited to this. An image set may be constituted by a visible light image and a near-infrared light image, or by a visible light image and an ultraviolet light image. Alternatively, an image set may be constituted by a visible light image and a distance image. A pixel value in the distance image indicates the distance from the shooting point to the object. The distance image can thus be regarded as an image that represents the object using the attribute of distance from the shooting point to the object.
[0181]
Since the learning process and the recognition process of the present invention described above are not specialized for the attributes of the object, there is no need to change them even when types of images other than a visible light image and a far-infrared light image are used.
[0182]
The learning process and the recognition process of the present invention are typically realized by software on a computer. However, they may be realized by hardware, or by a combination of software and hardware. Furthermore, it is not essential for the object recognition apparatus 1 (FIG. 6) to execute the learning process. This is because the object recognition apparatus 1 can execute the recognition process (steps S1002a to S1002c shown in FIG. 1) as long as the result of the learning process (the identification parameter) is stored in the identification parameter storage unit 150. Such an identification parameter may be a predetermined parameter. Alternatively, such an identification parameter may be obtained by performing the learning process on an apparatus different from the object recognition apparatus 1. By storing the identification parameter obtained as a result of such a learning process in the identification parameter storage unit 150 of the object recognition apparatus 1, the object recognition apparatus 1 becomes able to perform the recognition process using the identification parameter as a determination criterion.
[0183]
When the object recognition device 1 does not perform the learning process, the learning processing unit 130 and the storage device 120 may be omitted.
[0184]
A program (learning program, recognition program) that expresses part or all of one or both of the learning process and the recognition process of the present invention may be stored, for example, in a memory (not shown) in the learning processing unit 130 or in a memory (not shown) in the recognition processing unit 140. Alternatively, such a program can be recorded on any type of computer-readable recording medium such as a flexible disk, a CD-ROM, or a DVD-ROM. A learning program or a recognition program recorded on such a recording medium is loaded into the memory of a computer via a disk drive (not shown). Alternatively, the learning program or the recognition program (or part thereof) may be downloaded to the memory of a computer through a communication network or by broadcast. When the CPU built into the computer executes the learning program or the recognition program, the computer functions as an object recognition apparatus.
[0185]
【Effects of the Invention】
According to the present invention, the image set input for recognition includes a first image that represents the object using a first attribute and a second image that represents the object using a second attribute different from the first attribute. Since whether or not the object belongs to a specific category is determined based on both the first attribute and the second attribute, the reliability of recognition of the object is increased. Further, a feature quantity vector in a feature quantity space is obtained whose components are filter output values obtained by applying predetermined image filters to predetermined positions of the predetermined number of images, and the image set is represented by this feature quantity vector. Since this processing does not require associating regions of the first image with regions of the second image, the initial setting for recognizing the object is easy, and the recognition result is not easily affected by environmental conditions.
[Brief description of the drawings]
FIG. 1 is a flowchart showing an overall processing procedure of an object recognition method of the present invention.
FIG. 2A is a diagram showing image sets 610 to 613 input in step S1001a (FIG. 1).
FIG. 2B is a view showing image sets 660 to 663 inputted in step S1001a (FIG. 1).
FIG. 3 is a diagram illustrating an example in which an image filter 354 is applied to an image 351.
FIG. 4 is a diagram showing a state in which feature quantity vectors obtained for each image set in step S1001b (FIG. 1) are plotted in a feature quantity space 701.
FIG. 5 is a diagram showing an example of a recognition image set 510 input in step S1002a (FIG. 1).
FIG. 6 is a block diagram showing the configuration of the object recognition apparatus 1 according to the embodiment of the present invention.
FIG. 7A is a diagram showing an example of the arrangement of the far-infrared light camera 100 and the visible light camera 110.
FIG. 7B is a diagram showing another example of the arrangement of the far-infrared light camera 100 and the visible light camera 110.
FIG. 7C is a diagram showing an example in which a visible / far-infrared light camera 210 having both functions is used instead of the far-infrared light camera 100 and the visible light camera 110;
FIG. 8A is a diagram showing an example of a visible light image 803 representing a human target image taken by the visible light camera 110;
FIG. 8B is a diagram showing an example of a far-infrared light image 804 representing a human target image taken by the far-infrared light camera 100.
FIG. 9A is a diagram showing an example of a visible light image 805 representing a non-human object (tree) taken by the visible light camera 110;
FIG. 9B is a diagram showing an example of a far-infrared light image 806 representing a non-human object (tree) taken by the far-infrared light camera 100.
FIG. 10A is a diagram schematically showing characteristics of an image filter used in the filter processing unit 125 (FIG. 6).
FIG. 10B is a diagram schematically showing the characteristics of an image filter that selectively detects edges in the horizontal direction.
FIG. 10C is a diagram schematically illustrating characteristics of an image filter that selectively detects an edge extending from the lower left to the upper right.
FIG. 10D is a diagram schematically illustrating the characteristics of an image filter that selectively detects an edge extending from the lower right to the upper left.
FIG. 11A is a diagram showing an example of filter coefficients of a Gabor filter.
FIG. 11B is a diagram showing an example of the filter coefficient of the Gabor filter.
FIG. 11C is a diagram showing an example of the filter coefficient of the Gabor filter.
FIG. 11D is a diagram showing an example of the filter coefficient of the Gabor filter.
FIG. 12 is a flowchart showing a detailed procedure of learning processing executed by the object recognition apparatus 1;
FIG. 13A is a diagram for explaining an identification method using a curved identification surface;
FIG. 13B is a diagram for explaining an identification method using a distance in the feature amount space 1380;
FIG. 13C is a view for explaining an identification method using a distribution of feature quantity vectors in a feature quantity space 1382;
FIG. 14 is a diagram schematically illustrating a change in identification performance due to a feature amount dimension deletion process;
FIG. 15 is a flowchart showing a more detailed processing procedure of step S91 (FIG. 12).
FIG. 16 is a diagram showing a relationship between an image 431 and a region 432 to which an image filter in the image 431 is applied.
FIG. 17 is a flowchart showing a detailed procedure of processing for evaluating identification performance in step S95 (FIG. 12).
FIG. 18 is a flowchart showing a detailed procedure of recognition processing executed by the object recognition apparatus 1;
FIG. 19 is a diagram showing still another example of the arrangement of the far-infrared light camera and the visible light camera.
FIG. 20 is a diagram showing still another example of the arrangement of the far-infrared light camera 100 and the visible light camera 110.
[Explanation of symbols]
1 Object recognition apparatus
100 Far-infrared light camera
110 Visible light camera
120 Storage device
125 Filter processing unit
130 Learning processing unit
140 Recognition processing unit
150 Identification parameter storage unit
160 Work memory
170 Display
190 Input unit

Claims

An object recognition apparatus comprising: an input unit that inputs a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object;
a feature quantity vector calculation unit that receives the first image data set from the input unit and obtains a first feature quantity vector in a feature quantity space, the first feature quantity vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying, to at least one predetermined position in each of the first image data and the second image data of the input first image data set, at least one image filter having at least one selectivity among orientation selectivity, position selectivity, and spatial frequency selectivity; and
a determination unit that determines whether the first object belongs to a specific category based on a relationship between the first feature quantity vector and a predetermined identification parameter.

The object recognition apparatus according to claim 1, wherein the first image data represents the first object by the intensity of visible light emitted or reflected from the first object, and the second image data represents the first object by the intensity of far-infrared light emitted or reflected from the first object.

The object recognition apparatus according to claim 1, wherein the input unit further inputs, to the feature quantity vector calculation unit, a second image data set and a third image data set each comprising a plurality of image data, each of the second image data set and the third image data set including third image data, which is image data of a visible light image of a second object belonging to the specific category, and fourth image data, which is image data of a far-infrared light image of the second object,
the feature quantity vector calculation unit further obtains, for each of the input second image data set and third image data set, a feature quantity vector in the feature quantity space having as components at least one filter output value obtained from each of the third image data and the fourth image data by applying, to at least one predetermined position in each of the third image data and the fourth image data, at least one image filter having the same selectivity as the image filter, and
the object recognition apparatus further comprises a learning unit that obtains the identification parameter so as to discriminate between at least one feature quantity vector in the feature quantity space for the second image data set and at least one feature quantity vector in the feature quantity space for the third image data set.

The object recognition apparatus according to claim 1, wherein the first object is a human.

The object recognition apparatus according to claim 3, wherein the learning unit defines the feature quantity space by deleting at least one dimension from a provisional feature quantity space having a larger number of dimensions than the feature quantity space, based on the direction of the normal of a plane that discriminates, in the provisional feature quantity space, between the feature quantity vector for the second image data set and the feature quantity vector for the third image data set.

The object recognition apparatus according to claim 1, wherein the identification parameter represents an identification plane in the feature quantity space, and the determination unit determines whether the first object belongs to the specific category according to which side of the identification plane the first feature quantity vector is located on.

The object recognition apparatus according to claim 6, wherein the determination unit determines that the first object belongs to the specific category when the distance between the first feature quantity vector and the identification plane is equal to or greater than a predetermined threshold value.

The object recognition apparatus according to claim 1, wherein the input unit further inputs, to the feature quantity vector calculation unit, fifth image data, which is image data of a visible light image of the first object, and sixth image data, which is image data of a far-infrared light image of the first object, and the fifth image data and the sixth image data are captured a predetermined time after a first time at which the first image data and the second image data were captured.

The object recognition apparatus according to claim 1, wherein the first image data is captured from a first location, and the second image data is captured from a second location different from the first location.

The object recognition apparatus according to claim 1, wherein the input unit further inputs, to the feature quantity vector calculation unit, fifth image data, which is image data of a visible light image of the first object, and sixth image data, which is image data of a far-infrared light image of the first object, and the fifth image data and the sixth image data are captured from a second location different from a first location from which the first image data and the second image data were captured.

A method for recognizing an object, comprising:
(A) inputting a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object;
(B) obtaining a first feature quantity vector in a feature quantity space, the first feature quantity vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying, to at least one predetermined position in each of the input first image data and second image data, at least one image filter having at least one selectivity among orientation selectivity, position selectivity, and spatial frequency selectivity; and
(C) determining whether or not the first object belongs to a specific category based on a relationship between the first feature quantity vector and a predetermined identification parameter.

A program for causing a computer to execute object recognition processing,
the object recognition processing including:
(A) inputting a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object;
(B) obtaining a first feature quantity vector in a feature quantity space, the first feature quantity vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying, to at least one predetermined position in each of the input first image data and second image data, at least one image filter having at least one selectivity among orientation selectivity, position selectivity, and spatial frequency selectivity; and
(C) determining whether or not the first object belongs to a specific category based on a relationship between the first feature quantity vector and a predetermined identification parameter.

A computer-readable recording medium storing a program for causing a computer to execute object recognition processing,
the object recognition processing including:
(A) inputting a first image data set including first image data, which is image data of a visible light image of a first object, and second image data, which is image data of a far-infrared light image of the first object;
(B) obtaining a first feature quantity vector in a feature quantity space, the first feature quantity vector having as components at least one filter output value obtained from each of the first image data and the second image data by applying, to at least one predetermined position in each of the input first image data and second image data, at least one image filter having at least one selectivity among orientation selectivity, position selectivity, and spatial frequency selectivity; and
(C) determining whether or not the first object belongs to a specific category based on a relationship between the first feature quantity vector and a predetermined identification parameter.