JP3680658B2 - Image recognition method and image recognition apparatus

Info

Publication number: JP3680658B2
Application number: JP27870899A
Authority: JP
Inventors: めぐみ山岡; 健司長尾
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1999-09-30
Filing date: 1999-09-30
Publication date: 2005-08-10
Anticipated expiration: 2019-09-30
Also published as: JP2001101405A

Description

【０００１】
【発明の属する技術分野】
本発明は、入力画像が、予め作成済の学習画像データベース中のどの画像と近いかを判定することにより、入力画像上に表示されている物体が何であるかを認識する画像認識方法及び画像認識装置並びに画像認識プログラムを記録した記録媒体に関するものである。
【０００２】
【従来の技術】
従来の画像認識装置は、特開平９−２１６１０号公報に記載されたものが知られている。
【０００３】
図１６は、従来の画像認識装置のブロック構成図を示しており、画像を入力する画像入力手段１１と、抽出対象物の局所モデルを予め格納しているモデル記憶手段１２と、入力画像の各手段分画像について各局所モデルとのマッチングを行うマッチング処理手段１３と、入力画像の各手段分画像がどの程度局所モデルに一致しているかによって画像の位置情報も含めたパラメータ空間で抽出対象物の位置を確率的に表示し統合する局所情報統合手段１４と、パラメータ空間内で最も確立の高い手段分を抽出して入力画像内での抽出対象物の位置を判別して出力する物体位置決定手段１５から構成されている。
【０００４】
【発明が解決しようとする課題】
このような従来の画像認識装置は、異なるモデル間で類似した局所モデルが多くなればなるほど認識が困難になるという課題を有していた。
【０００５】
本発明は、上記従来の課題を解決するもので、異なるモデル間で類似した局所モデルが多数ある場合にも入力画像中の対象を検出し、その位置と対象物体の種類を高精度に推定することを目的とする。
【０００６】
【課題を解決するための手段】
この課題を解決するために本発明は、予め学習画像を登録した学習画像データベースから学習画像を局所領域に分割した各学習局所領域の中から類似する学習局所領域をグループ化し、その各グループを代表する学習局所領域である代表学習局所領域とそのグループに属する全ての学習局所領域の座標を登録した同種ウィンドウ情報データベースと、
入力した画像を局所領域に分割する画像分割手段と、前記各入力局所領域それぞれに対して、前記同種ウィンドウ情報データベースから類似する代表学習局所領域を抽出して、その代表学習局所領域が属するグループの各学習局所領域と入力局所領域とを対応づける類似ウィンドウ抽出手段と、前記各入力局所領域の座標と対応づけされたグループに含まれる学習局所領域の座標から入力画像中の対象物体の位置を推定する対象位置推定手段と、前記推定位置が一致する入力局所領域と学習局所領域の数を集計値として求め、前記集計値が一定値以上である場合に対象があると判断する対象判定手段とを備えたものである。
【０００７】
これにより、本発明は、類似した学習局所領域を１つのグループにまとめ各グループの代表の学習局所領域と各入力局所領域とを画素値に基づいて対応づけることにより、学習画像間で類似ウィンドウが多数ある場合にも、対応づけが早くでき、入力画像中の各局所領域がそれぞれ異なる学習画像の異なる物体の局所領域と一致するような場合にも入力画像中の物体とその位置を高精度に推定することができる。
【０００９】
【発明の実施の形態】
本発明の請求項１に記載の発明は、予め学習画像を登録した学習画像データベースから学習画像を局所領域に分割した各学習局所領域の中から類似する学習局所領域をグループ化し、その各グループを代表する学習局所領域である代表学習局所領域とそのグループに属する全ての学習局所領域の座標を登録した同種ウィンドウ情報データベースと、
入力した画像を局所領域に分割し、各入力局所領域それぞれに対して、前記同種ウィンドウ情報データベースから類似する代表学習局所領域を抽出して、その代表学習局所領域が属するグループの各学習局所領域と入力局所領域とを対応づけ、前記各入力局所領域の座標と対応づけされたグループに含まれる学習局所領域の座標から入力画像中の対象物体の位置を推定して、前記推定位置が一致する入力局所領域と学習局所領域の数を集計値として求め、前記集計値が一定値以上である場合に対象があると判断するもので、類似した学習局所領域を１つのグループにまとめ、各グループの代表の学習局所領域と各入力局所領域とを画素値に基づいて対応づけることにより、学習画像間で類似ウィンドウが多数ある場合にも、対応づけが早くでき、入力画像中の物体の各局所領域がそれぞれ異なる学習画像の異なる物体の局所領域と一致するような場合にも入力画像中の物体とその位置を高精度に推定するという作用を有する。
【００１２】
請求項２に記載の発明は、請求項１記載の画像認識方法において、入力局所領域と代表学習局所領域との対応づけは、各画素値の差の二乗の和または各画素値の差の絶対値の累積値を算出して、最も差の小さいものを抽出するもので、入力局所領域と学習局所領域との対応付けが高精度にできるという作用を有する。
【００１４】
請求項３に記載の発明は、予め学習画像を登録した学習画像データベースから学習画像を局所領域に分割した各学習局所領域の中から類似する学習局所領域をグループ化し、その各グループを代表する学習局所領域である代表学習局所領域とそのグループに属する全ての学習局所領域の座標を登録した同種ウィンドウ情報データベースと、
入力した画像を局所領域に分割する画像分割手段と、前記各入力局所領域それぞれに対して、前記同種ウィンドウ情報データベースから類似する代表学習局所領域を抽出して、その代表学習局所領域が属するグループの各学習局所領域と入力局所領域とを対応づける類似ウィンドウ抽出手段と、前記各入力局所領域の座標と対応づけされたグループに含まれる学習局所領域の座標から入力画像中の対象物体の位置を推定する対象位置推定手段と、前記推定位置が一致する入力局所領域と学習局所領域の数を集計値として求め、前記集計値が一定値以上である場合に対象があると判断する対象判定手段とを備えるもので、類似した各グループの代表の学習局所領域と各入力局所領域とを画素値で対応づけることにより、学習画像間で類似ウィンドウが多数ある場合にも、対応づけが早くでき、入力画像中の物体の各局所領域がそれぞれ異なる学習画像の異なる物体の局所領域と一致するような場合にも入力画像中の物体とその位置を高精度に推定するという作用を有する。
【００１８】
以下、本発明の実施の形態について、図１から図１６を用いて説明する。
【００１９】
（実施の形態１）
図１は、本発明の実施の形態１における画像認識装置のブロック構成図を示している。図１において、１は認識したい対象物の画像データを入力する画像入力手段、２は画像入力手段１で入力した画像を局所ウィンドウに分割して出力する画像分割手段、３は画像分割手段２で分割した各入力ウィンドウに対して類似する学習ウィンドウをデータベースから抽出して、対応する入力ウィンドウと共に出力する類似ウィンドウ抽出手段、４は認識したい物体のモデルを予め作成しておく学習手段、４１は認識したい種々の物体のモデル画像である学習画像を、画像分割手段２で作成する局所ウィンドウと同じサイズに分割して学習ウィンドウとして格納している学習画像データベース、５は類似ウィンドウ抽出手段３で抽出した学習ウィンドウの学習画像上での位置と、それに対応する入力ウィンドウの入力画像上での位置から、対象の入力画像中の位置を算出する対象位置推定手段、６は対象位置推定手段５から入力した各入力ウインドウと学習ウィンドウの推定位置のうち一致するものの数を集計する集計手段、７は集計手段６の集計結果を受けて入力画像中の対象物の有無と対象物の位置を決定する対象決定手段である。
【００２０】
また、図２はコンピュータにより画像認識装置を実現した場合のブロック構成図であり、２０１はコンピュータ、２０２はＣＰＵ、２０３はメモリ、２０４はキーボード及びディスプレイ、２０５は画像認識プログラムを読み込むためのＦＤ、ＰＤ、ＭＯなどの蓄積媒体ユニット、２０６〜２０８はＩ／Ｆユニット、２０９はＣＰＵバス、２１０は画像を取り込むためのカメラ、２１１は予め蓄積されている画像を取り込むための画像データベース、２１２は種々の物体のモデル画像である学習画像を局所ウィンドウに分割して学習ウィンドウとして格納している学習画像データベース、２１３は得られた物体の種類と位置をＩ／Ｆユニットを介して出力する出力端子で構成されている。
【００２１】
以上のように構成された画像認識装置について、以下その動作を図３のフローチャートを用いて説明する。図４は、入力画像の一例、図５は、学習画像の例、図６は、類似ウィンドウ抽出手段３が出力するデータの一例、図７は、集計手段６が出力する集計結果の一例である。
【００２２】
なお、学習画像データベース４１（学習画像データベース２１２）には、予め、認識したい対象の種々の画像が、図５に示すように、学習ウインドウ画像データとして入力ウィンドウと同じサイズのウィンドウに区切られ、学習画像とウィンドウの中心点の位置座標とともに格納されている。ここで、図５は、学習画像１、２で示した向き・大きさのセダンを認識するための学習ウインドウの例である。
【００２３】
認識対象となる画像データを画像入力手段１（カメラ２１０または画像データベース２１１）から入力する（ステップ３０１）。画像分割手段２は、図４に示すように、その画像から一定サイズの局所ウィンドウを任意画素移動させて順次抽出し、各入力ウィンドウをウィンドウの中心点の座標とともに出力する（ステップ３０２）。
【００２４】
類似ウィンドウ抽出手段３は、画像分割手段２から入力された入力ウィンドウと、学習画像データベース４１（学習画像データベース２１２）に蓄積されている全ての学習ウィンドウとの差（例えば、各画素値の差の二乗の和または各画素値の差の絶対値の累積値）を算出して、最も差の小さいものを抽出する。類似ウィンドウ抽出手段３は、全ての入力ウィンドウに対してそれぞれ最も類似した学習ウィンドウを学習画像データベース４１から抽出すると、図６に示すように、学習ウィンドウの中心座標と、対応する入力ウィンドウの中心座標の対で出力する（ステップ３０３）。
【００２５】
対象位置推定手段５は、一組の入力ウィンドウと学習ウィンドウの座標を入力すると（ステップ３０４）、入力画像中の物体の位置（例えば、物体に外接する矩形の左上隅座標すなわち、図５で示した学習画像の原点）を算出し出力する（ステップ３０５）。図６に示すような、任意の入力ウィンドウの座標（α,β）と学習ウィンドウの座標（γ,θ）を入力すると、対象位置推定手段５は物体の位置として（α-γ,β-θ）を出力する。
【００２６】
集計手段６は、ステップ３０５で算出された座標（α-γ,β-θ）を入力すると、その座標への得点として１点加算する（ステップ３０６）。全ての対応する入力ィンドウと学習ウィンドウの組について、ステップ３０４からステップ３０６までの処理が終了したら（ステップ３０７）、集計手段６は図７に示すような位置座標と得点からなる集計データを出力する。
【００２７】
対象画像判定手段７は、座標ごとの得点のうち一定値Ｔより大きいものがあるか否かを判定し（ステップ３０９）、ある場合は入力画像中に対象物体が存在すると判断し、Ｔ以上の得点を持つ物体の位置座標を出力する（ステップ３１０）。また、一定値Ｔ以上の得点のものが無ければ、入力画像中に対象物体は存在しないと判断する（ステップ３１１）。
【００２８】
なお、得られた物体の位置座標は、Ｉ／Ｆユニット２０８を介して出力端子２１３から出力される（ステップ３１２）。
【００２９】
（実施の形態２）
図８は、本発明の実施の形態２における画像認識装置のブロック構成図を示す。図８において、１は認識したい対象物の画像データを入力する画像入力手段、２は画像入力手段１で入力した画像を局所ウィンドウに分割して出力する画像分割手段、３は画像分割手段２で分割した各入力ウィンドウに対して類似する学習ウィンドウをデータベースから抽出して、対応する入力ウィンドウと共に出力する類似ウィンドウ抽出手段、４は認識したい物体のモデルを予め作成しておく学習手段、４１は種々の物体のモデル画像である学習画像を、画像分割手段２で作成する局所ウィンドウと同じサイズに分割して学習ウィンドウとして格納している学習画像データベース、４２は学習画像データベースに格納されている学習ウィンドウの中から相互に類似する学習ウィンドウをグループ化し、その各グループの代表学習ウィンドウの画像データとそのグループに登録されている他の全ての学習ウィンドウの座標を出力し、また類似するウィンドウが無い学習ウィンドウはその画像データと座標を出力する類似ウィンドウ統合部、４３は類似ウィンドウ統合部４２から入力した各グループの代表学習ウィンドウの画像データとその座標データを格納している同種ウィンドウ情報データベース、５は類似ウィンドウ抽出手段３で抽出した学習ウィンドウの学習画像上での位置と、それに対応する入力ウィンドウの入力画像上での位置から、対象の入力画像中の位置を算出する対象位置推定手段、６は対象位置推定手段５から入力した各入力ウインドウと学習ウィンドウの推定位置のうち一致するものの数を集計する集計手段、７は集計手段６の集計結果を受けて入力画像中の対象物の有無と対象物の位置を決定する対象決定手段である。
【００３０】
以上のように構成された画像認識装置について、以下その動作を図９に示すフローチャートを用いて説明する。
【００３１】
図４は入力画像の一例、図５は学習画像の一例、図１０は学習画像データベース４１に格納されている類似ウィンドウの一例、図１１は同種ウィンドウ情報データベース４３に格納されている同種ウィンドウ情報の一例、図１２は類似ウィンドウ抽出手段３が出力するデータの一例、図１３は集計手段６が出力する集計結果の一例である。
【００３２】
なお、学習画像データベース４１は、予め、種々の物体の画像が、図５に示すように、入力ウィンドウと同じサイズのウィンドウに区切られ、ウィンドウ番号とウィンドウの中心点の位置座標とともに格納されている。ここで、図５は、学習画像１、２で示した向き・大きさのセダンを認識するための学習ウインドウの例である。また、同種ウィンドウ情報データベース４３には、図１０に示すような類似ウィンドウの各グループを代表学習ウィンドウとしてその画像データと、そのグループに登録された全ての学習ウィンドウの座標が、類似ウィンドウ統合部４２で学習画像データベース４１から抽出され、図１１のように格納されている。
【００３３】
認識対象となる画像データが画像入力手段１から入力する（ステップ９０１）。画像分割手段２は、図４に示すように、その画像から一定サイズの局所ウィンドウを順次抽出して、各入力ウィンドウとその中心点の座標とともに出力する（ステップ９０２）。
【００３４】
類似ウィンドウ抽出手段３は、画像分割手段２から入力された各入力ウィンドウと、同種ウィンドウ情報データベース４３の全てグループの代表学習ウィンドウとの差（例えば、各画素値の差の二乗の和または各画素値の差の絶対値の累積値）を算出して、最も差の小さいグループを抽出する。類似ウィンドウ抽出手段３は、全ての入力ウィンドウに対してそれぞれ最も類似したグループの学習ウィンドウを抽出することにより、そのグループに登録されている学習ウィンドウも類似（対応）していると見なしその座標を同種ウィンドウ情報データベース４３から抽出し、図１２に示すように、入力ウィンドウの中心座標と、対応する学習ウィンドウの中心座標と、学習ウィンドウが属する車種の対で出力する（ステップ９０３）。
【００３５】
対象位置推定手段５は、一組の入力ウィンドウと学習ウィンドウの座標を入力すると（ステップ９０４）、入力画像中の物体の位置、例えば、物体に外接する矩形の左上隅座標、すなわち、図５で示した学習画像の原点、を算出し車種情報と共に出力する（ステップ９０５）。図１２に示すような、任意の入力ウィンドウ座標（α,β）と学習ウィンドウ座標（γ,θ）を入力すると、対象位置推定手段５は、入力画像中の物体の位置として座標（α-γ,β-θ）を出力する。
【００３６】
集計手段６は、ステップ９０５で算出された入力画像中の物体の座標（α-γ,β-θ）と車種情報を入力すると、その座標・車種への得点として１点加算する（ステップ９０６）。
【００３７】
全ての対応する入力ウインドウと学習ウィンドウについて、ステップ９０４からステップ９０６までの処理が終了したかを判断し（ステップ９０７）、終了した場合は集計手段６から対象画像決定手段７へ、図１２に示すような位置座標・得点・車種別得点の組を出力する。
【００３８】
対象判定手段７は、座標の得点のうち一定値Ｔより大きいものがあるかどうかを判断し（ステップ９０９）、入力画像中に対象物体が存在する場合はＴ以上の得点を持つ位置座標とその座標の得点の中で最も高得点の車種を出力する（ステップ９１０）。また、一定値Ｔ以上の得点のものが無ければ、入力画像中に対象物体は存在しないと判断する（ステップ９１１）。
【００３９】
なお、得られた物体の位置座標と車種は、Ｉ／Ｆユニット２０８を介して出力端子２１３から出力される（ステップ９１２）。
【００４０】
（実施の形態３）
図１４は本発明の実施の形態３における画像認識装置のブロック構成図を示す。図１４において、１は認識したい対象物の画像データを入力する画像入力手段、２は画像入力手段１で入力した画像を局所ウィンドウに分割して出力する画像分割手段、３は画像分割手段２で分割した各入力ウィンドウに対して類似する学習ウィンドウを各種類の学習データベースからそれぞれ一つ抽出して対応する入力ウィンドウと共に出力する類似ウィンドウ抽出手段、４は認識したい物体のモデルを予め認識したい種類ごとに分類して作成しておく学習手段、４１、４２…は認識したい種々の物体のモデル画像である学習画像を、画像分割手段２で作成する局所ウィンドウと同じサイズに分割して学習ウィンドウとして認識したい種類ごとに格納している種類別学習画像データベース、５は類似ウィンドウ抽出手段３で抽出した各種類の学習ウィンドウの学習画像上での位置と、それに対応する入力ウィンドウの入力画像上での位置から、対象の入力画像中の位置を算出する対象位置推定手段、６は対象位置推定手段５から入力した各種類の入力ウインドウと学習ウィンドウの推定位置のうち一致するものの数を集計する集計手段、７は集計手段６の各種類別の集計結果を受けて入力画像中の対象物の有無と対象物の位置を決定する対象決定手段である。
【００４１】
以上のように構成された画像認識装置について、以下その動作を図１５のフローチャートを用いて説明する。図４は入力画像の一例、図５は種類１学習画像の一例、図６は類似ウィンドウ抽出手段３が出力するデータの一例、図１６は種類２学習画像の一例である。
【００４２】
なお、学習手段４の各種類の学習画像データベースには、予め、認識したい種類の対象の画像が、図５に示すように、入力ウィンドウ画像と同じサイズのウィンドウに区切られ、ウィンドウ番号とウィンドウの中心点の位置座標とともに格納されている。ここで、図５は、種類１学習データベースに格納されている学習画像で、学習画像１，２で示した向き・大きさのセダンを認識するための学習画像の例である。また、図１６は、種類２学習データベースに格納されている、図５と同じ位置・同じ向きのバスを認識するための学習画像の例である。
【００４３】
認識対象となる画像データを画像入力手段１から入力する（ステップ１５０１）。画像分割手段２は、図４に示すように、その画像から一定サイズの局所ウィンドウを任意画素移動させて順次抽出し、各入力ウィンドウをウィンドウの中心点の座標とともに出力する（ステップ１５０２）。
【００４４】
類似ウィンドウ抽出手段３は、画像分割手段２から入力ウィンドウを入力すると、学習手段４の全ての学習データベースの学習ウィンドウとの差（例えば、各画素値の差の二乗の和または各画素値の差の絶対値の累積値）を算出して、各学習データベースごとに最も差の小さいものを抽出する。類似ウィンドウ抽出手段３は、全ての入力ウィンドウに対してそれぞれ最も類似した学習ウィンドウを学習手段４から抽出すると、各種類ごとに、図６に示すような学習ウィンドウの中心座標と、それに対応する入力ウィンドウの中心座標の対で出力する（ステップ１５０３）。
【００４５】
対象位置推定手段５は、種類ごとに、一組の入力ウィンドウと学習ウィンドウの座標を入力すると（ステップ１５０４）、入力画像中の物体の位置、例えば、物体に外接する矩形の左上隅座標、すなわち、図５で示した学習画像の原点、を算出し出力する（ステップ１５０５）。図６に示すような、任意の入力ウィンドウ座標（α,β）と学習ウィンドウ座標（γ,θ）を入力すると、対象位置推定手段５は、物体の位置として（α-γ,β-θ）を出力する。
【００４６】
集計手段６は、ステップ１５０５で算出された座標（α-γ,β-θ）を入力すると、種類別にその座標への得点として１点加算する（ステップ１５０６）。
【００４７】
ある種類の全ての対応する入力ウインドウと学習ウィンドウについてステップ１５０４からステップ１５０６までの処理が終了したかを判断し（ステップ１５０７）、次の種類についてステップ１５０４からステップ１５０６までの処理を行い、全ての種類の全ての入力ウインドウと学習ウィンドウについてステップ１５０４からステップ１５０６までの処理が終了したら、集計手段６は対象画像決定手段７へ、各種類ごとに図７に示すような位置座標と得点の組を出力する（ステップ１５０８）。
【００４８】
対象判定手段７は、座標ごとの得点のうち一定値Ｔより大きいものがあるかを判断し（ステップ１５０９）、入力画像中にその種類の物体が存在すると判断した場合は、さらに、同じ座標の得点で一定値Ｔ以上のものが複数あれば、そのうち最高得点をもつ種類の物体が入力画像中に存在すると判断し、その物体の種類と位置座標を出力する（ステップ１５１０）。また、一定値Ｔ以上の得点のものが無ければ、入力画像中に対象物体は存在しないと判断する（ステップ１５１１）。
【００４９】
なお、得られた物体の位置座標と車種は、Ｉ／Ｆユニット２０８を介して出力端子２１３から出力される（ステップ１５１２）。
【００５０】
【発明の効果】
以上のように本発明によれば、各学習画像間で類似した局所ウィンドウが多数ある場合にも、入力画像中の対象の有無や対象の種類を認識でき、かつ、対象の入力画像中の位置を高精度に推定することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態１における画像認識装置のブロック構成図
【図２】本発明の実施の形態１におけるコンピュータによる画像認識装置のブロック構成図
【図３】本発明の実施の形態１における処理の流れを示すフローチャート
【図４】本発明の実施の形態１における入力画像の一例を示す図
【図５】本発明の実施の形態１における学習画像データベースが保管している学習画像データの一例を示す図
【図６】本発明の実施の形態１における類似ウィンドウ抽出手段が出力する入力ウィンドウと学習ウィンドウの対応の一例を示す図
【図７】集計手段が出力する集計の一例を示す図
【図８】本発明の実施の形態２における画像認識装置のブロック構成図
【図９】本発明の実施の形態２における処理の流れを示すフローチャート
【図１０】本発明の実施の形態２における画像データベース中の同種画像の一例を示す図
【図１１】本発明の実施の形態２における同種ウィンドウ情報データベースが保管している同種ウィンドウ情報の一例を示す図
【図１２】本発明の実施の形態２における類似ウィンドウ抽出手段が出力する入力ウィンドウと学習ウィンドウの対応の一例を示す図
【図１３】本発明の実施の形態２における集計手段が出力する集計の一例を示す図
【図１４】本発明の実施の形態３における画像認識装置のブロック構成図
【図１５】本発明の実施の形態３における処理の流れを示すフローチャート
【図１６】本発明の実施の形態３における種類Ｘの学習画像データベースが保管している学習画像データの一例を示す図
【図１７】従来の画像認識装置の一例を示すブロック図
【符号の説明】
１画像入力手段
２画像分割手段
３類似ウインドウ抽出手段
４学習手段
５対象位置推定手段
６集計手段
７対象判定手段
４１学習画像データベース
４２類似ウインドウ統合部
４３同種ウインドウ情報データベース
２０１コンピュータ
２０２ＣＰＵ
２０３メモリ
２０４キーボード／ディスプレイ
２０５蓄積媒体ユニット
２０６〜２０８Ｉ／Ｆユニット
２０９ＣＰＵバス
２１０カメラ
２１１画像データベース
２１２学習画像データベース
２１３出力端子
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image recognition method, an image recognition apparatus, and a recording medium on which an image recognition program is recorded, each of which recognizes what object is displayed in an input image by determining which image in a previously created learning image database the input image is closest to.
[0002]
[Prior art]
As a conventional image recognition apparatus, one described in Japanese Patent Application Laid-Open No. 9-21610 is known.
[0003]
FIG. 16 is a block diagram of a conventional image recognition apparatus. It comprises image input means 11 for inputting an image; model storage means 12 in which local models of the extraction target are stored in advance; matching processing means 13 for matching each partial image of the input image against each local model; local information integration means 14 which, according to how well each partial image of the input image matches a local model, probabilistically represents and integrates the position of the extraction target in a parameter space that also includes the position information of the image; and object position determination means 15 which extracts the portion with the highest probability in the parameter space, determines the position of the extraction target in the input image, and outputs it.
[0004]
[Problems to be solved by the invention]
Such a conventional image recognition apparatus has a problem that recognition becomes more difficult as the number of similar local models between different models increases.
[0005]
The present invention solves the above conventional problem. Its object is to detect a target in an input image even when there are many similar local models between different models, and to estimate the position and the type of the target object with high accuracy.
[0006]
[Means for Solving the Problems]
In order to solve this problem, the present invention comprises a same-kind window information database in which similar learning local regions, among the learning local regions obtained by dividing learning images from a learning image database (in which learning images are registered in advance) into local regions, are grouped, and in which a representative learning local region representing each group and the coordinates of all learning local regions belonging to that group are registered;
image dividing means for dividing an input image into local regions; similar window extraction means for extracting, for each input local region, a similar representative learning local region from the same-kind window information database and associating each learning local region of the group to which that representative learning local region belongs with the input local region; target position estimating means for estimating the position of the target object in the input image from the coordinates of each input local region and the coordinates of the learning local regions included in the associated group; and target determination means for obtaining, as a total value, the number of input local region and learning local region pairs whose estimated positions coincide, and determining that the target is present when the total value is equal to or greater than a certain value.
[0007]
Thus, by collecting similar learning local regions into one group and associating the representative learning local region of each group with each input local region based on pixel values, the present invention can make the association quickly even when there are many similar windows among the learning images, and can estimate the object in the input image and its position with high accuracy even when the local regions in the input image each match local regions of different objects in different learning images.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
The invention according to claim 1 of the present invention uses a same-kind window information database in which similar learning local regions, among the learning local regions obtained by dividing learning images from a learning image database (in which learning images are registered in advance) into local regions, are grouped, and in which a representative learning local region representing each group and the coordinates of all learning local regions belonging to that group are registered.
An input image is divided into local regions; for each input local region, a similar representative learning local region is extracted from the same-kind window information database, and each learning local region of the group to which that representative learning local region belongs is associated with the input local region; the position of the target object in the input image is estimated from the coordinates of each input local region and the coordinates of the learning local regions included in the associated group; the number of input local region and learning local region pairs whose estimated positions coincide is obtained as a total value; and the target is determined to be present when the total value is equal to or greater than a certain value. Because similar learning local regions are collected into one group and the representative learning local region of each group is associated with each input local region based on pixel values, the association can be made quickly even when there are many similar windows among the learning images, and the object in the input image and its position can be estimated with high accuracy even when the local regions of the object in the input image each match local regions of different objects in different learning images.
[0012]
The invention according to claim 2 is the image recognition method according to claim 1, wherein the association between an input local region and a representative learning local region is made by calculating either the sum of the squared differences of the pixel values or the cumulative absolute difference of the pixel values, and extracting the candidate with the smallest difference. This has the effect that input local regions and learning local regions can be associated with high accuracy.
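For concreteness, the two difference measures named in claim 2 can be written as below. The symbols are notation introduced here for illustration and do not appear in the patent: I(x, y) and L(x, y) are assumed to be the pixel values of an input local region and a representative learning local region, and W is the common set of pixel positions in the window.

```latex
D_{\mathrm{SSD}} = \sum_{(x,y)\in W} \bigl(I(x,y) - L(x,y)\bigr)^{2},
\qquad
D_{\mathrm{SAD}} = \sum_{(x,y)\in W} \bigl|\,I(x,y) - L(x,y)\,\bigr|
```

Under either measure, the representative with the smallest value of D is the one selected.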
[0014]
The invention according to claim 3 comprises a same-kind window information database in which similar learning local regions, among the learning local regions obtained by dividing learning images from a learning image database (in which learning images are registered in advance) into local regions, are grouped, and in which a representative learning local region representing each group and the coordinates of all learning local regions belonging to that group are registered;
image dividing means for dividing an input image into local regions; similar window extraction means for extracting, for each input local region, a similar representative learning local region from the same-kind window information database and associating each learning local region of the group to which that representative learning local region belongs with the input local region; target position estimating means for estimating the position of the target object in the input image from the coordinates of each input local region and the coordinates of the learning local regions included in the associated group; and target determination means for obtaining, as a total value, the number of input local region and learning local region pairs whose estimated positions coincide, and determining that the target is present when the total value is equal to or greater than a certain value. Because the representative learning local region of each group of similar regions is associated with each input local region by pixel values, the association can be made quickly even when there are many similar windows among the learning images, and the object in the input image and its position can be estimated with high accuracy even when the local regions of the object in the input image each match local regions of different objects in different learning images.
[0018]
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 16.
[0019]
(Embodiment 1)
FIG. 1 shows a block diagram of an image recognition apparatus according to Embodiment 1 of the present invention. In FIG. 1, 1 is image input means for inputting image data of the object to be recognized; 2 is image dividing means for dividing the image input by the image input means 1 into local windows and outputting them; 3 is similar window extraction means for extracting from the database, for each input window produced by the image dividing means 2, a similar learning window and outputting it together with the corresponding input window; 4 is learning means in which models of the objects to be recognized are created in advance; 41 is a learning image database storing, as learning windows, learning images (model images of the various objects to be recognized) divided into the same size as the local windows created by the image dividing means 2; 5 is target position estimating means for calculating the position of the target in the input image from the position on the learning image of a learning window extracted by the similar window extraction means 3 and the position on the input image of the corresponding input window; 6 is counting means for counting how many of the estimated positions of the input window and learning window pairs received from the target position estimating means 5 coincide; and 7 is target determination means for determining, from the counting result of the counting means 6, whether the target is present in the input image and where it is.
[0020]
FIG. 2 is a block diagram of the image recognition apparatus implemented on a computer, in which 201 is a computer; 202 is a CPU; 203 is a memory; 204 is a keyboard and display; 205 is a storage medium unit such as an FD, PD, or MO for reading the image recognition program; 206 to 208 are I/F units; 209 is a CPU bus; 210 is a camera for capturing images; 211 is an image database for supplying previously stored images; 212 is a learning image database storing, as learning windows, learning images (model images of various objects) divided into local windows; and 213 is an output terminal that outputs the obtained object type and position via an I/F unit.
[0021]
The operation of the image recognition apparatus configured as described above will be described below with reference to the flowchart of FIG. 3. FIG. 4 is an example of an input image, FIG. 5 is an example of learning images, FIG. 6 is an example of data output by the similar window extraction means 3, and FIG. 7 is an example of a counting result output by the counting means 6.
[0022]
In the learning image database 41 (learning image database 212), various images of the target to be recognized are divided in advance, as shown in FIG. 5, into windows of the same size as the input windows and stored as learning window image data together with the learning image and the position coordinates of each window's center point. Here, FIG. 5 is an example of learning windows for recognizing a sedan of the orientation and size shown in learning images 1 and 2.
[0023]
Image data to be recognized is input from the image input means 1 (camera 210 or image database 211) (step 301). As shown in FIG. 4, the image dividing means 2 sequentially extracts local windows of a fixed size from the image, shifting the window by an arbitrary number of pixels each time, and outputs each input window together with the coordinates of its center point (step 302).
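The window extraction of step 302 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `extract_windows`, the square window shape, the single `stride` parameter, and the representation of an image as a list of rows are all assumptions made here.

```python
def extract_windows(image, size, stride):
    """Slide a size x size window over a 2-D list of pixel values.

    Returns a list of ((cx, cy), window) pairs, where (cx, cy) is the
    window center in image coordinates (x = column, y = row).
    `size` and `stride` are illustrative parameters, not patent terms.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    windows = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            window = [row[left:left + size] for row in image[top:top + size]]
            center = (left + size // 2, top + size // 2)
            windows.append((center, window))
    return windows


# A 4x4 image cut into 2x2 windows with a stride of 2 pixels.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = extract_windows(image, size=2, stride=2)
```

A stride smaller than the window size produces overlapping windows, which is one way to read "shifting the window by an arbitrary number of pixels".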
[0024]
The similar window extraction means 3 calculates the difference (for example, the sum of the squared differences of the pixel values, or the cumulative absolute difference of the pixel values) between an input window received from the image dividing means 2 and every learning window stored in the learning image database 41 (learning image database 212), and extracts the learning window with the smallest difference. Once the similar window extraction means 3 has extracted from the learning image database 41 the most similar learning window for every input window, it outputs pairs of the center coordinates of the learning window and the center coordinates of the corresponding input window, as shown in FIG. 6 (step 303).
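A minimal sketch of the matching in step 303, assuming windows are equal-sized lists of rows and that each learning entry is a hypothetical `(center, window)` tuple. The names `ssd`, `sad`, and `most_similar` are illustrative and not taken from the patent.

```python
def ssd(a, b):
    """Sum of squared pixel differences between two equal-sized windows."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def sad(a, b):
    """Sum of absolute pixel differences between two equal-sized windows."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def most_similar(input_window, learning_windows, distance=ssd):
    """Return the (center, window) learning entry with the smallest distance."""
    return min(learning_windows, key=lambda entry: distance(input_window, entry[1]))


# Two learning windows with their center coordinates, and one input window.
learning = [((5, 5), [[0, 0], [0, 0]]), ((9, 5), [[10, 10], [10, 10]])]
query = [[9, 10], [11, 10]]
best_center, _ = most_similar(query, learning)
```

This exhaustive search compares every input window with every learning window, so its cost grows with the size of the learning database.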
[0025]
When the target position estimating means 5 receives the coordinates of one input window and learning window pair (step 304), it calculates and outputs the position of the object in the input image (for example, the upper left corner coordinates of the rectangle circumscribing the object, that is, the origin of the learning image shown in FIG. 5) (step 305). Given the coordinates (α, β) of an arbitrary input window and the coordinates (γ, θ) of a learning window as shown in FIG. 6, the target position estimating means 5 outputs (α−γ, β−θ) as the position of the object.
[0026]
The counting means 6 receives the coordinates (α−γ, β−θ) calculated in step 305 and adds one point to the score for those coordinates (step 306). When the processing from step 304 to step 306 has been completed for all corresponding pairs of input windows and learning windows (step 307), the counting means 6 outputs counting data consisting of position coordinates and scores, as shown in FIG. 7.
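Steps 304 to 307 amount to a voting scheme over candidate object positions. The sketch below assumes each match is a pair of `(input_center, learning_center)` tuples. The function names are hypothetical.

```python
from collections import Counter


def estimate_position(input_center, learning_center):
    """Object origin implied by one pair: (alpha - gamma, beta - theta)."""
    (alpha, beta), (gamma, theta) = input_center, learning_center
    return (alpha - gamma, beta - theta)


def tally_votes(pairs):
    """Add one point to the estimated position of every matched pair."""
    votes = Counter()
    for input_center, learning_center in pairs:
        votes[estimate_position(input_center, learning_center)] += 1
    return votes


# Two pairs agree on the origin (20, 15); the third is an outlier.
pairs = [((30, 20), (10, 5)), ((40, 25), (20, 10)), ((50, 50), (5, 5))]
votes = tally_votes(pairs)
```

Pairs that come from the same object at the same scale imply the same origin, so their votes accumulate at one coordinate, while spurious matches scatter.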
[0027]
The target determination means 7 determines whether any per-coordinate score is greater than a certain value T (step 309). If so, it judges that the target object is present in the input image and outputs the position coordinates of the object having a score of T or more (step 310). If no score is T or more, it judges that the target object is not present in the input image (step 311).
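The decision of steps 309 to 311 can be sketched as a threshold test on the vote table. The text says both "greater than T" and "T or more"; this sketch assumes the inclusive reading (score >= T). The name `detect_objects` and the best-first ordering of the result are choices made here for illustration.

```python
def detect_objects(votes, threshold):
    """Return (position, score) pairs whose score reaches the threshold.

    Results are sorted best first. An empty list means that no target
    object was found in the input image. The inclusive comparison is an
    assumption; the source text is ambiguous between > T and >= T.
    """
    hits = [(pos, score) for pos, score in votes.items() if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


votes = {(20, 15): 7, (45, 45): 1, (3, 8): 4}
found = detect_objects(votes, threshold=4)
missing = detect_objects(votes, threshold=10)
```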
[0028]
The obtained position coordinates of the object are output from the output terminal 213 via the I / F unit 208 (step 312).
[0029]
(Embodiment 2)
FIG. 8 is a block diagram of an image recognition apparatus according to Embodiment 2 of the present invention. In FIG. 8, 1 is image input means for inputting image data of the object to be recognized; 2 is image dividing means for dividing the image input by the image input means 1 into local windows and outputting them; 3 is similar window extraction means for extracting from the database, for each input window produced by the image dividing means 2, a similar learning window and outputting it together with the corresponding input window; 4 is learning means in which models of the objects to be recognized are created in advance; 41 is a learning image database storing, as learning windows, learning images (model images of various objects) divided into the same size as the local windows created by the image dividing means 2; 42 is a similar window integration unit that groups mutually similar learning windows among those stored in the learning image database, outputs for each group the image data of the representative learning window and the coordinates of all other learning windows registered in that group, and, for a learning window that has no similar window, outputs its image data and coordinates; 43 is a same-kind window information database storing the image data of the representative learning window of each group and the coordinate data received from the similar window integration unit 42; 5 is target position estimating means for calculating the position of the target in the input image from the position on the learning image of a learning window extracted by the similar window extraction means 3 and the position on the input image of the corresponding input window; 6 is counting means for counting how many of the estimated positions of the input window and learning window pairs received from the target position estimating means 5 coincide; and 7 is target determination means for determining, from the counting result of the counting means 6, whether the target is present in the input image and where it is.
[0030]
The operation of the image recognition apparatus configured as described above will be described below with reference to the flowchart shown in FIG. 9.
[0031]
FIG. 4 is an example of an input image, FIG. 5 is an example of a learning image, FIG. 10 is an example of similar windows stored in the learning image database 41, FIG. 11 is an example of same-kind window information stored in the same-kind window information database 43, FIG. 12 is an example of data output by the similar window extraction means 3, and FIG. 13 is an example of a counting result output by the counting means 6.
[0032]
In the learning image database 41, images of various objects are divided in advance, as shown in FIG. 5, into windows of the same size as the input windows and stored together with the window number and the position coordinates of each window's center point. Here, FIG. 5 is an example of learning windows for recognizing a sedan of the orientation and size shown in learning images 1 and 2. In the same-kind window information database 43, for each group of similar windows such as those shown in FIG. 10, the image data of the representative learning window and the coordinates of all learning windows registered in that group are extracted from the learning image database 41 by the similar window integration unit 42 and stored as shown in FIG. 11.
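The text does not say how the similar window integration unit 42 decides which learning windows belong together. As one possible reading, the sketch below groups windows greedily: a window joins the first group whose representative lies within an assumed distance threshold, and otherwise starts a new group with itself as representative. The threshold `max_distance`, the dictionary layout, and the function names are all assumptions, not the patent's method.

```python
def ssd(a, b):
    """Sum of squared pixel differences between two equal-sized windows."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def group_similar_windows(learning_windows, max_distance):
    """Greedily group learning windows that are close to a representative.

    learning_windows: list of (label, center, window) tuples.
    Returns a list of groups, each a dict holding the representative's
    pixels and the (label, center) of every member, representative included.
    """
    groups = []
    for label, center, window in learning_windows:
        for group in groups:
            if ssd(window, group["representative"]) <= max_distance:
                group["members"].append((label, center))
                break
        else:
            # No existing group is close enough, so this window starts one.
            groups.append({"representative": window, "members": [(label, center)]})
    return groups


learning = [
    ("sedan", (5, 5), [[10, 10], [10, 10]]),
    ("bus", (7, 3), [[10, 11], [10, 10]]),
    ("sedan", (9, 9), [[90, 90], [90, 90]]),
]
groups = group_similar_windows(learning, max_distance=4)
```

Only the representative keeps its pixel data. The other members are reduced to their labels and coordinates, which matches the stored content described for the same-kind window information database 43.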
[0033]
Image data to be recognized is input from the image input means 1 (step 901). As shown in FIG. 4, the image dividing means 2 sequentially extracts local windows of a fixed size from the image and outputs each input window together with the coordinates of its center point (step 902).
[0034]
The similar window extraction means 3 calculates the difference (for example, the sum of the squared differences of the pixel values, or the cumulative absolute difference of the pixel values) between each input window received from the image dividing means 2 and the representative learning windows of all groups in the same-kind window information database 43, and extracts the group with the smallest difference. By extracting the most similar group for every input window, the similar window extraction means 3 regards the learning windows registered in that group as also similar (corresponding), extracts their coordinates from the same-kind window information database 43, and, as shown in FIG. 12, outputs sets consisting of the center coordinates of the input window, the center coordinates of the corresponding learning window, and the vehicle type to which the learning window belongs (step 903).
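Step 903 can be sketched as a search over representatives followed by an expansion to every member of the winning group. The group layout (a dict with `representative` and `members` keys) and the function name `match_via_groups` are assumptions made for this illustration.

```python
def ssd(a, b):
    """Sum of squared pixel differences between two equal-sized windows."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_via_groups(input_windows, groups):
    """Pair each input window with all members of its closest group.

    Only the representatives are compared pixel by pixel. Every member of
    the winning group is then treated as a match for the input window.
    Returns (input_center, learning_center, label) tuples.
    """
    matches = []
    for input_center, window in input_windows:
        best = min(groups, key=lambda g: ssd(window, g["representative"]))
        for label, learning_center in best["members"]:
            matches.append((input_center, learning_center, label))
    return matches


groups = [
    {"representative": [[10, 10], [10, 10]],
     "members": [("sedan", (5, 5)), ("bus", (7, 3))]},
    {"representative": [[90, 90], [90, 90]],
     "members": [("sedan", (9, 9))]},
]
inputs = [((25, 20), [[11, 10], [10, 9]]), ((29, 24), [[88, 90], [91, 90]])]
matches = match_via_groups(inputs, groups)
```

The number of pixel comparisons per input window equals the number of groups rather than the number of learning windows, which is where the speed-up claimed for this embodiment would come from.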
[0035]
When the target position estimating means 5 receives the coordinates of one input window and learning window pair (step 904), it calculates the position of the object in the input image (for example, the upper left corner coordinates of the rectangle circumscribing the object, that is, the origin of the learning image shown in FIG. 5) and outputs it together with the vehicle type information (step 905). Given arbitrary input window coordinates (α, β) and learning window coordinates (γ, θ) as shown in FIG. 12, the target position estimating means 5 outputs the coordinates (α−γ, β−θ) as the position of the object in the input image.
[0036]
When the counting means 6 receives the coordinates (α−γ, β−θ) of the object in the input image calculated in step 905 together with the vehicle type information, it adds one point to the score for those coordinates and that vehicle type (step 906).
[0037]
It is then determined whether the processing from step 904 to step 906 has been completed for all corresponding input windows and learning windows (step 907). If it has, the counting means 6 outputs to the target determination means 7 sets of position coordinates, score, and per-vehicle-type scores as shown in FIG. 12.
[0038]
The target determination means 7 determines whether any coordinate score is greater than a certain value T (step 909). If the target object is present in the input image, it outputs the position coordinates having a score of T or more and the vehicle type with the highest score among the scores at those coordinates (step 910). If no score is T or more, it judges that the target object is not present in the input image (step 911).
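Steps 904 to 911 can be sketched as voting per position with a per-type breakdown, followed by a threshold on the position's total. This sketch assumes the coordinate score is the sum of the per-type scores at that coordinate and uses the inclusive comparison (score >= T). The function names are hypothetical.

```python
from collections import Counter, defaultdict


def tally_by_type(matches):
    """Count votes per estimated position, split by vehicle type."""
    votes = defaultdict(Counter)
    for (alpha, beta), (gamma, theta), label in matches:
        votes[(alpha - gamma, beta - theta)][label] += 1
    return votes


def decide(votes, threshold):
    """Return (position, best_type) for positions whose total reaches threshold."""
    results = []
    for position, per_type in votes.items():
        if sum(per_type.values()) >= threshold:
            best_type, _ = per_type.most_common(1)[0]
            results.append((position, best_type))
    return results


# Three matches agree on the origin (20, 15); two of them vote for "sedan".
matches = [
    ((25, 20), (5, 5), "sedan"),
    ((29, 24), (9, 9), "sedan"),
    ((27, 18), (7, 3), "bus"),
    ((40, 40), (7, 3), "bus"),
]
votes = tally_by_type(matches)
result = decide(votes, threshold=3)
```

Because a window shared by several vehicle types votes for all of them at the same position, the position can still be found from the total while the type is settled by whichever label collects the most votes there.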
[0039]
The obtained object position coordinates and vehicle type are output from the output terminal 213 via the I / F unit 208 (step 912).
[0040]
(Embodiment 3)
FIG. 14 is a block diagram of an image recognition apparatus according to Embodiment 3 of the present invention. In FIG. 14, 1 is image input means for inputting image data of the object to be recognized; 2 is image dividing means for dividing the image input by the image input means 1 into local windows and outputting them; 3 is similar window extraction means for extracting, for each input window produced by the image dividing means 2, one similar learning window from the learning database of each type and outputting it together with the corresponding input window; 4 is learning means in which models of the objects to be recognized are created in advance, classified by the types to be recognized; 41, 42, ... are type-specific learning image databases storing, for each type to be recognized, learning images (model images of the various objects to be recognized) divided into the same size as the local windows created by the image dividing means 2 as learning windows; 5 is target position estimating means for calculating the position of the target in the input image from the position on the learning image of a learning window of each type extracted by the similar window extraction means 3 and the position on the input image of the corresponding input window; 6 is counting means for counting, for each type, how many of the estimated positions of the input window and learning window pairs received from the target position estimating means 5 coincide; and 7 is target determination means for determining, from the per-type counting results of the counting means 6, whether the target is present in the input image and where it is.
[0041]
The operation of the image recognition apparatus configured as described above will be described below with reference to the flowchart of FIG. 15. FIG. 4 is an example of an input image, FIG. 5 is an example of a type 1 learning image, FIG. 6 is an example of the data output by the similar window extraction means 3, and FIG. 16 is an example of a type 2 learning image.
[0042]
In the learning image database of each type in the learning means 4, images of objects of the type to be recognized are divided in advance into windows of the same size as the input windows, as shown in FIG. 5, and each window is stored together with the position coordinates of its center point. Here, FIG. 5 shows an example of learning images, stored in the type 1 learning database, for recognizing a sedan in the orientations and sizes shown by learning images 1 and 2. FIG. 16 shows an example of a learning image, stored in the type 2 learning database, for recognizing a bus in the same position and orientation as FIG.
[0043]
Image data to be recognized is input from the image input means 1 (step 1501). As shown in FIG. 4, the image dividing means 2 sequentially extracts local windows of a fixed size from the image, shifting the window by an arbitrary number of pixels each time, and outputs each input window together with the coordinates of its center point (step 1502).
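The window extraction of step 1502 can be sketched as follows. This is a minimal illustration only, assuming a grayscale image held as a list of pixel rows; the function name `extract_windows` and the parameters `size` and `stride` are hypothetical and do not appear in the specification.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values (assumed layout)


def extract_windows(
    image: Image, size: int, stride: int
) -> Iterator[Tuple[Tuple[int, int], Image]]:
    """Yield ((cx, cy), window) for each size x size local window.

    The window is shifted by `stride` pixels each time; (cx, cy) is the
    window's center point in image coordinates (x to the right, y downward).
    """
    height, width = len(image), len(image[0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            window = [row[left:left + size] for row in image[top:top + size]]
            center = (left + size // 2, top + size // 2)
            yield center, window
```

A stride smaller than the window size yields overlapping windows; the specification leaves the shift amount open, so the stride here is simply a tunable assumption.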
[0044]
When an input window is received from the image dividing means 2, the similar window extraction means 3 calculates its difference from every learning window in each learning database of the learning means 4 (for example, the sum of the squares of the differences between pixel values, or the cumulative sum of the absolute values of the differences between pixel values), and extracts the learning window with the smallest difference from each learning database. Once the similar window extraction means 3 has extracted the most similar learning window from the learning means 4 for every input window, it outputs, for each type, pairs consisting of the center coordinates of the learning window and the center coordinates of the corresponding input window, as shown in FIG. 6 (step 1503).
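The matching of step 1503 might look like the sketch below. It assumes each type-specific learning database is a list of (center coordinates, window) entries and uses a brute-force search; the names `ssd`, `sad`, `most_similar` and `match_all_types` are illustrative and are not taken from the specification.

```python
from typing import Callable, Dict, List, Tuple

Window = List[List[int]]
Coord = Tuple[int, int]
Database = List[Tuple[Coord, Window]]  # (center on learning image, pixels)


def ssd(a: Window, b: Window) -> int:
    """Sum of squared differences between corresponding pixel values."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def sad(a: Window, b: Window) -> int:
    """Cumulative sum of absolute differences between pixel values."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def most_similar(
    window: Window, db: Database, distance: Callable[[Window, Window], int] = ssd
) -> Coord:
    """Return the center of the learning window with the smallest difference."""
    center, _ = min(db, key=lambda entry: distance(window, entry[1]))
    return center


def match_all_types(
    input_windows: List[Tuple[Coord, Window]], databases: Dict[str, Database]
) -> Dict[str, List[Tuple[Coord, Coord]]]:
    """For each type, pair every input window center with its best learning window center."""
    return {
        type_name: [(c, most_similar(w, db)) for c, w in input_windows]
        for type_name, db in databases.items()
    }
```

Note that one learning window is extracted per type for every input window, so each type receives the same number of coordinate pairs regardless of how well it actually matches.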
[0045]
The target position estimation means 5 receives the coordinates of an input window and learning window pair for each type (step 1504), and calculates and outputs the position of the object in the input image, for example the upper-left corner of the rectangle circumscribing the object, that is, the origin of the learning image shown in FIG. 5 (step 1505). When input window coordinates (α, β) and learning window coordinates (γ, θ) as shown in FIG. 6 are received, the target position estimation means 5 outputs (α−γ, β−θ) as the object position.
[0046]
The aggregation means 6 receives the coordinates (α−γ, β−θ) calculated in step 1505 and adds one point to the score of those coordinates for the corresponding type (step 1506).
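Steps 1504 to 1506 amount to a voting scheme: each window pair casts one vote for the object origin (α−γ, β−θ), and votes are tallied separately for each type. The sketch below illustrates this under the assumption that pairs arrive as a dictionary keyed by type name; `estimate_origin` and `tally_votes` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def estimate_origin(input_center: Coord, learning_center: Coord) -> Coord:
    """Object position (alpha - gamma, beta - theta) implied by one window pair."""
    (alpha, beta), (gamma, theta) = input_center, learning_center
    return (alpha - gamma, beta - theta)


def tally_votes(
    pairs_by_type: Dict[str, List[Tuple[Coord, Coord]]]
) -> Dict[str, Counter]:
    """Add one point per pair to the estimated position, separately for each type."""
    scores: Dict[str, Counter] = {}
    for type_name, pairs in pairs_by_type.items():
        votes: Counter = Counter()
        for input_center, learning_center in pairs:
            votes[estimate_origin(input_center, learning_center)] += 1
        scores[type_name] = votes
    return scores
```

Pairs that belong to the same object agree on the origin and pile up at one coordinate, while accidental matches scatter their votes, which is why a peak in the tally indicates the object.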
[0047]
It is determined whether steps 1504 to 1506 have been completed for all corresponding input window and learning window pairs of a given type (step 1507); if so, steps 1504 to 1506 are performed for the next type. When steps 1504 to 1506 have been completed for the input windows and learning windows of all types, the aggregation means 6 outputs, for each type, the sets of position coordinates and scores as shown in FIG. 7 to the object determination means 7 (step 1508).
[0048]
The object determination means 7 determines, for each type, whether any of the per-coordinate scores is equal to or greater than a fixed value T (step 1509). If so, it judges that an object of that type is present in the input image. If, in addition, several types have scores of T or more at the same coordinates, it judges that the type with the highest score is present in the input image, and outputs the type and position coordinates of the object (step 1510). If no score is T or more, it judges that the target object is not present in the input image (step 1511).
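The decision of steps 1509 to 1511 can be illustrated as below. This is a sketch under one reading of the text: scores of T or more are kept, and where several types reach T at the same coordinates the type with the highest score wins. The function name `decide` and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def decide(
    scores_by_type: Dict[str, Dict[Coord, int]], threshold: int
) -> Dict[Coord, Tuple[str, int]]:
    """Map each detected position to (type, score).

    A position is detected when some type scores at least `threshold` there;
    if several types qualify at the same coordinates, the highest score wins.
    An empty result means no target object was found in the input image.
    """
    best_at: Dict[Coord, Tuple[str, int]] = {}
    for type_name, scores in scores_by_type.items():
        for coord, score in scores.items():
            if score < threshold:
                continue
            if coord not in best_at or score > best_at[coord][1]:
                best_at[coord] = (type_name, score)
    return best_at
```

For example, with T = 3 and tallies of 7 for a sedan and 4 for a bus at the same coordinates, the sketch reports a sedan at that position.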
[0049]
The obtained object position coordinates and vehicle type are output from the output terminal 213 via the I/F unit 208 (step 1512).
[0050]
[Effect of the invention]
As described above, according to the present invention, even when there are many similar local windows among the learning images, the presence or absence of the target in the input image and the type of the target can be recognized, and the position of the target in the input image can be estimated with high accuracy.
[Brief description of the drawings]
FIG. 1 is a block configuration diagram of an image recognition apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a block configuration diagram of an image recognition apparatus using a computer according to Embodiment 1 of the present invention.
FIG. 4 is a diagram showing an example of an input image according to Embodiment 1 of the present invention.
FIG. 5 is a diagram showing learning image data stored in a learning image database according to Embodiment 1 of the present invention.
FIG. 6 is a diagram showing an example of the correspondence between the input window output by the similar window extracting unit and the learning window according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing an example of the tabulation output by the tabulating unit.
FIG. 8 is a block configuration diagram of an image recognition apparatus according to Embodiment 2 of the present invention.
FIG. 9 is a flowchart showing a process flow according to Embodiment 2 of the present invention.
FIG. 11 is a view showing an example of the same kind of image in the image database according to Embodiment 2 of the present invention.
FIG. 11 is a view showing an example of the same kind of window information stored in the same kind of window information database according to Embodiment 2 of the present invention.
FIG. 12 is a diagram showing an example of the correspondence between an input window output by a similar window extraction unit and a learning window in Embodiment 2 of the present invention.
FIG. 13 is a diagram showing an example of the aggregation output by a tabulation unit in Embodiment 2 of the present invention.
FIG. 14 is a block diagram of an image recognition apparatus according to Embodiment 3 of the present invention.
FIG. 15 is a flowchart showing a process flow according to Embodiment 3 of the present invention.
FIG. 17 is a diagram showing an example of learning image data stored in a type X learning image database in FIG. 3
[Description of symbols]
1 Image input means
2 Image division means
3 Similar window extraction means
4 Learning means
5 Target position estimation means
6 Total means
7 Object determination means
41 Learning image database
42 Similar window integration part
43 Similar window information database
201 Computer
202 CPU
203 Memory
204 Keyboard / Display
205 Storage medium unit
206-208 I/F unit
209 CPU bus
210 Camera
211 Image database
212 Learning image database
213 Output terminal

Claims

An image recognition method using a homogeneous window information database in which similar learning local regions are grouped from among the learning local regions obtained by dividing learning images, taken from a learning image database in which learning images are registered in advance, into local regions, and in which a representative learning local region, which is the learning local region representing each group, and the coordinates of all learning local regions belonging to that group are registered,
the method being characterized in that an input image is divided into local regions; for each input local region, a similar representative learning local region is extracted from the homogeneous window information database, and each learning local region of the group to which that representative learning local region belongs is associated with the input local region; the position of the target object in the input image is estimated from the coordinates of each input local region and the coordinates of each learning local region included in the associated group; the number of input local region and learning local region pairs whose estimated positions coincide is obtained as a total value; and it is determined that the target is present when the total value is equal to or greater than a predetermined value.

The image recognition method according to claim 1, wherein the correspondence between the input local region and each learning local region of the group to which the representative learning local region belongs is established by calculating the sum of the squares of the differences between pixel values, or the cumulative value of the absolute values of the differences between pixel values, and extracting the one for which that value is smallest.

An image recognition apparatus comprising: a homogeneous window information database in which similar learning local regions are grouped from among the learning local regions obtained by dividing learning images, taken from a learning image database in which learning images are registered in advance, into local regions, and in which a representative learning local region, which is the learning local region representing each group, and the coordinates and types of all learning local regions belonging to that group are registered;
an image dividing means for dividing an input image into local regions; a similar window extraction means for extracting, for each input local region, a similar representative learning local region from the homogeneous window information database and associating each learning local region of the group to which that representative learning local region belongs with the input local region; a target position estimating means for estimating the position of the target object in the input image from the coordinates of each input local region and the coordinates of the learning local regions included in the associated group; and a target determination means for obtaining, as a total value, the number of input local region and learning local region pairs whose estimated positions coincide, and determining that the target is present when the total value is equal to or greater than a certain value.