JP4259365B2

JP4259365B2 - Image recognition apparatus and image recognition method

Info

Publication number: JP4259365B2
Application number: JP2004089649A
Authority: JP
Inventors: 雅道大杉
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2004-03-25
Filing date: 2004-03-25
Publication date: 2009-04-30
Anticipated expiration: 2024-03-25
Also published as: JP2005275915A

Description

The present invention relates to an image recognition apparatus and an image recognition method for recognizing a recognition target in a captured image.

Apparatuses that capture an image of a person's face and recognize the face in the captured image have been developed for applications such as personal authentication. One such face recognition apparatus prepares reference images of several facial features (eyes, nose, mouth, etc.) and uses them to search the captured image for regions resembling each feature. It takes the positions with high similarity as candidate positions for the features and recognizes the face by pattern-matching the relative positional relationships among these candidates (see Patent Document 1).

Patent Document 1: JP 2000-99722 A

In that face recognition apparatus, however, even a small shift in a high-similarity position increases the number of relative-position pattern combinations that must be checked, and when the positions of several feature portions shift, the number of combinations becomes enormous. The processing load therefore increases and processing takes longer.

An object of the present invention is therefore to provide an image recognition apparatus and an image recognition method that reduce the processing load.

An image recognition apparatus according to the present invention comprises: imaging means; reference image holding means that holds reference images of a plurality of characteristic portions of a recognition target; positional relationship holding means that holds, for each reference image, the positional relationship between the position of that reference image within the whole image and an arbitrary point in the image; comparison means that compares the image captured by the imaging means with each of the reference images held by the reference image holding means; and determination means that, based on the positional relationships held by the positional relationship holding means, moves each of the comparison results from the comparison means relative to the arbitrary point in the image, integrates the moved comparison results, and determines from the integration result whether the imaged object is the recognition target.

In this image recognition apparatus, the reference image holding means holds a plurality of reference images of portions characteristic of the recognition target (for example, the eyes, nose, and mouth when the target is a human face). For each reference image, the positional relationship holding means holds the positional relationship between that reference image's position within the whole image and an arbitrary point in the image (for example, the center point, the upper-left point, or the lower-right point).

The apparatus images an object with the imaging means to obtain a captured image. The comparison means then compares the captured image with each reference image and obtains a comparison result for each. In this comparison, a region the size of the reference image is cut out of the captured image repeatedly while being shifted vertically and horizontally across the whole image, and each cut-out region is compared with the reference image. The comparison result for each reference image therefore contains one value per cut-out position. Where the captured image contains a region similar to the reference image (feature portion), the similarity at that position is high; where it does not, the similarity is low.

Next, the determination means moves the comparison result for each reference image according to the positional relationship held for that reference image, and determines from the moved comparison results whether the object is the recognition target. If the imaged object has feature portions similar to each reference image, the comparison result for each reference image shows high similarity around the position at which that feature portion appears in the image. Moving each comparison result relative to the arbitrary point therefore gathers the high-similarity positions of all comparison results at that point. If the moved comparison results show high similarity at that point, the object has the feature portions of all the reference images and can be judged to be the recognition target.

Because the apparatus only compares the captured image with each feature's reference image and then moves and integrates the comparison results, the processing load (processing time) per captured image is constant. The processing load is therefore lower, and the processing time shorter, than in the prior art. In particular, unlike the prior art, the apparatus does not search for feature portions and then pattern-match their relative positions, so there is no growth in pattern combinations to drive up the load. Recognition accuracy is also high: comparing against each reference image detects whether portions resembling the feature portions exist in the captured image, and integrating the comparison results according to their geometric relationships detects whether those portions are in the correct relative positions. Noise in the individual comparison results (for example, other regions of the captured image that also resemble a reference image) is suppressed because the moved comparison results are integrated. Finally, recognition accuracy increases as the number of reference images (that is, the number of feature portions compared) increases.
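The move-and-integrate principle can be seen in one dimension. The toy sketch below uses made-up similarity profiles (not data from the patent): each profile has a true peak at its feature position and a spurious "noise" peak elsewhere. Shifting each profile by the hypothetical offset from its feature position to a common reference point and summing makes only the true peaks coincide.

```python
import numpy as np

def shift_1d(profile, offset):
    """Shift a 1-D similarity profile by `offset` samples; vacated samples become 0."""
    n = len(profile)
    out = np.zeros_like(profile)
    if abs(offset) >= n:
        return out
    if offset >= 0:
        out[offset:] = profile[: n - offset]
    else:
        out[:offset] = profile[-offset:]
    return out

# Toy data (hypothetical): three feature responses along one axis, each with a
# true peak at its feature position and a spurious peak elsewhere.
length, reference_point = 20, 10
feature_positions = [6, 10, 15]   # e.g. eyes, nose, mouth
noise_positions = [2, 17, 8]
profiles = []
for f, z in zip(feature_positions, noise_positions):
    p = np.zeros(length)
    p[f], p[z] = 1.0, 0.8
    profiles.append(p)

# Offset of each feature relative to the reference point, then shift and sum.
offsets = [reference_point - f for f in feature_positions]
total = sum(shift_1d(p, d) for p, d in zip(profiles, offsets))
print(int(np.argmax(total)), total[reference_point])  # → 10 3.0
```

The true peaks add up to 3.0 at the reference point, while each noise peak lands at a different position and stays at 0.8.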

The image recognition apparatus may further comprise image size conversion means that converts the size of the image captured by the imaging means. In that case, the comparison means compares the size-converted captured image with each of the reference images held by the reference image holding means.

In this configuration, the image size conversion means converts the size of the captured image, and the comparison means performs the comparison on the converted image. This accommodates changes in the size of the object in the captured image, so the captured image can still be compared with the reference images. When the captured image is reduced in size, the processing load is reduced further and noise can also be reduced.

An image recognition method according to the present invention recognizes a recognition target in a captured image. Reference images of a plurality of characteristic portions of the recognition target, together with the positional relationship, for each reference image, between its position within the whole image and an arbitrary point in the image, are held in advance. The method comprises a comparison step of comparing the captured image with each of the held reference images, and a determination step of moving each comparison result of the comparison step relative to the arbitrary point in the image based on the held positional relationships, integrating the moved comparison results, and determining from the integration result whether the object is the recognition target.

The image recognition method may further include an image size conversion step of converting the size of the captured image. In that case, the comparison step compares the size-converted captured image with each of the held reference images.

Each of these image recognition methods provides the same effects as the corresponding image recognition apparatus.

According to the present invention, the processing load for recognizing a recognition target can be reduced.

Embodiments of the image recognition apparatus and image recognition method according to the present invention are described below with reference to the drawings.

In this embodiment, the present invention is applied to an image recognition apparatus whose subject is either a person or an automobile. When the subject is a person, the recognition target is the face, and the apparatus determines whether the imaged object is a human face. When the subject is an automobile, the recognition target is its rear view, and the apparatus determines whether the vehicle is of a particular model. The apparatus has the same configuration and operation in both cases; only the stored data differ, namely the reference images of the feature portions and the distance (offset) between each reference image and the centroid of the image, which are held separately for each kind of subject. The configuration for human face recognition is described first, followed by the configuration for vehicle model recognition and, finally, the operation of the image recognition apparatus 1.

The configuration of the image recognition apparatus 1 when applied to human face recognition is described with reference to FIGS. 1 to 7. FIG. 1 is a block diagram of the image recognition apparatus according to this embodiment. FIG. 2 illustrates the inter-image similarity evaluation unit of FIG. 1. FIG. 3 shows an example of reference images when the recognition target is a human face: (a) is a face image used to create the reference images, and (b) shows reference images of the eyes, nose, and mouth taken from the face image in (a). FIG. 4 shows an example of an image of a person's face captured from the front: (a) is the captured image, and (b) shows the result of recognizing the face. FIG. 5 shows the similarity evaluation maps obtained when recognizing the face in the captured image of FIG. 4 using the reference images of FIG. 3: (a) for the eyes reference image, (b) for the nose reference image, and (c) for the mouth reference image. FIG. 6 shows the offset similarity evaluation maps obtained by offsetting the similarity evaluation maps of FIG. 5 toward the centroid: (a) for the eyes reference image, (b) for the nose reference image, and (c) for the mouth reference image. FIG. 7 shows the integrated map obtained by integrating the three offset similarity evaluation maps of FIG. 6.

The image recognition apparatus 1 is a face recognition apparatus that determines whether an imaged object is a human face. It holds reference images of the eyes region, nose region, and mouth region of a human face, together with the offset from the centroid of the whole-face image to the center of each reference image. It then determines whether eyes, a nose, and a mouth resembling the three reference images are present at the correct positions in a captured image of an object. To this end, the image recognition apparatus 1 comprises a camera 2 and an image ECU [Electronic Control Unit] 3, in which a reference image database 10, an offset database 11, an image resolution conversion unit 12, an inter-image similarity evaluation unit 13, and an integration processing unit 14 are implemented. In this embodiment, all image data are handled in pixel units, and the coordinate system is (x, y) in pixels.

In this embodiment, the camera 2 corresponds to the imaging means recited in the claims, the reference image database 10 to the reference image holding means, the offset database 11 to the positional relationship holding means, the image resolution conversion unit 12 to the image size conversion means, the inter-image similarity evaluation unit 13 to the comparison means, and the integration processing unit 14 to the determination means.

The camera 2 is, for example, a CCD [Charge Coupled Device] camera. It images an object and acquires a color image (for example, an RGB [Red Green Blue] image). When a human face is to be recognized, the camera 2 images the face from directly in front, producing a captured image such as that shown in FIG. 4(a). The camera 2 transmits the captured image data to the image ECU 3. Although the camera 2 is a color camera, only luminance information is required, so a monochrome camera may be used instead.

The image ECU 3 comprises a CPU [Central Processing Unit], ROM [Read Only Memory], RAM [Random Access Memory], and the like, and implements the processing units and databases of the image recognition apparatus 1. It holds the reference images of the feature portions of the recognition target and the offset for each reference image. The image ECU 3 receives captured image data from the camera 2 and, for each reference image, cuts out regions of the same size as that reference image across the entire captured image, generating a similarity evaluation map that indicates the similarity between each cut-out region and the reference image. It then generates, for each reference image, an offset similarity evaluation map by shifting the similarity evaluation map by that reference image's offset, integrates the offset similarity evaluation maps into an integrated map, and determines from the integrated map whether the object is the recognition target.

The reference image database 10 is built in the ROM and stores the data of the reference images. A reference image is an image of a portion characteristic of the recognition target; since the target here is a face, the reference images show the eyes, nose, and mouth. Reference images of other feature portions, such as the ears or eyebrows, may also be used. Because this embodiment determines whether an object is a human face rather than recognizing a specific person, the face image from which the reference images are created may be an image of an ordinary person's face, or an average face image synthesized from many collected face images, with eyes, nose, and mouth of average shape and size. If the camera 2 cannot always image the face from the front, an average of faces in various orientations may be created and the reference images taken from it. Likewise, because faces take on various expressions, an average of faces with various expressions may be created and the reference images taken from it.

To create the reference images, a face image such as that in FIG. 3(a) is first prepared. In this face image, a face region is set as the smallest rectangle enclosing the face, including both eyes, the nose, and the mouth, with its four sides parallel to the image's vertical and horizontal axes (the broken-line rectangle in FIG. 3(a)). The center of this face region is taken as the centroid Cen_B (= (0, 0)) (the circle in FIG. 3(a)). Next, three axis-aligned rectangles are cut out of the face image: one containing both eyes, one containing the nose, and one containing the mouth (the three broken-line rectangles in FIG. 3(b)). The center of each of these regions is taken as a feature point P_B_i (i = 0, 1, 2) (the three crosses in FIG. 3(b)). The image enclosed by each cut-out rectangle is a reference image, and its data, a luminance value for each pixel, are stored in the reference image database 10. The reference images should be reasonably large, because small reference images increase the noise in the similarity evaluation maps.
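A minimal sketch of this reference-image construction, assuming the face image is a 2-D NumPy luminance array and that rectangles are given as hypothetical (x0, y0, x1, y1) pixel tuples with exclusive right and bottom edges. The function names are illustrative and not taken from the patent; integer rectangle centers are used so they line up with whole-pixel offsets later.

```python
import numpy as np

def rect_center(rect):
    """Integer center (x, y) of a rectangle (x0, y0, x1, y1); x1 and y1 are exclusive."""
    x0, y0, x1, y1 = rect
    return (x0 + (x1 - x0) // 2, y0 + (y1 - y0) // 2)

def build_references(face_img, face_rect, feature_rects):
    """Crop one reference image per feature rectangle.

    Returns the face-region center (the centroid Cen_B, in image coordinates)
    and a list of (reference_patch, P_B_i) pairs, where P_B_i is the patch
    center expressed relative to Cen_B, so that Cen_B itself is (0, 0).
    """
    cx, cy = rect_center(face_rect)
    refs = []
    for rect in feature_rects:
        x0, y0, x1, y1 = rect
        patch = face_img[y0:y1, x0:x1].copy()  # rows are y, columns are x
        px, py = rect_center(rect)
        refs.append((patch, (px - cx, py - cy)))
    return (cx, cy), refs
```

For example, with a face region (10, 10, 70, 90) and an eyes rectangle (15, 25, 65, 40), the centroid is (40, 50) and the eyes feature point is (0, -18) relative to it.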

The offset database 11 is built in the ROM and stores an offset for each reference image. The offset is the displacement between the centroid Cen_B of the face region and each feature point P_B_i, and is denoted ΔP_B_i (x, y). It is set to an average value: the positions of the feature points are collected from many people's faces, and each offset is computed from the average of those positions. The offset database 11 stores the offsets ΔP_B_i (i = 0, 1, 2) in association with the corresponding reference images.
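The averaging can be sketched as follows, assuming each annotated face supplies its face-region centroid and its feature points in pixel coordinates. Taking ΔP_B_i as Cen_B − P_B_i (so that adding it to a feature point lands on the centroid) and rounding to whole pixels are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

def average_offsets(samples):
    """Average feature-to-centroid offsets over many annotated faces.

    samples: list of (cen, points) pairs, where cen is the face-region
    centroid (x, y) and points is the list of feature points P_B_i (x, y)
    for that face, all in pixel coordinates of the same image.
    Returns an int array of shape (num_features, 2) holding ΔP_B_i,
    here taken to be Cen_B - P_B_i and rounded to whole pixels.
    """
    deltas = np.array(
        [[np.subtract(cen, p) for p in points] for cen, points in samples],
        dtype=float,
    )  # shape: (num_faces, num_features, 2)
    return np.rint(deltas.mean(axis=0)).astype(int)
```

Averaging the per-face displacements is equivalent to averaging the feature positions after expressing each face's points relative to its own centroid.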

As shown in Expression (1), once the offset ΔP_B_i is known, the centroid Cen_B can be obtained from any feature point P_B_i.
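Expression (1) itself is not reproduced in this text. Assuming ΔP_B_i is the displacement from each feature point to the centroid (consistent with the later description of moving each feature point onto Cen_B, though the sign convention is an assumption), it would read:

```latex
\mathrm{Cen}_B = P_{B,i} + \Delta P_{B,i}, \qquad i = 0, 1, 2
```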

The image resolution conversion unit 12 converts the captured image to an arbitrary size (resolution) so that the resolution of the object in the captured image matches that of the reference images. Because this embodiment computes similarity from luminance values, the image resolution conversion unit 12 extracts the luminance value of each pixel from the captured image data and generates a resized captured image consisting of per-pixel luminance values. The size can be converted, for example, by determining each pixel's luminance in the converted image by linear interpolation or, when reducing the image, by thinning out pixels.
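The two conversion methods mentioned above can be sketched in NumPy. This is an illustrative sketch, not the unit's actual implementation: the BT.601 luminance weights and the corner-aligned sampling grid are assumptions, and the function names are hypothetical.

```python
import numpy as np

def to_luminance(rgb):
    """Per-pixel luminance of an H x W x 3 RGB array (ITU-R BT.601 weights, an assumption)."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D luminance image by bilinear (linear-in-x-and-y) interpolation.

    Output pixels are mapped onto the input grid so that the four corner
    pixels of the input and output coincide.
    """
    img = np.asarray(img, dtype=float)
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, one per output row
    wx = (xs - x0)[None, :]  # horizontal weights, one per output column
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def decimate(img, step):
    """Shrink an image by keeping every `step`-th pixel (thinning out pixels)."""
    return np.asarray(img)[::step, ::step]
```

Thinning is cheaper but discards information abruptly; interpolation gives smoother results at a slightly higher cost.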

The inter-image similarity evaluation unit 13 takes the reference images from the reference image database 10 one at a time. For each reference image, it cuts regions of the same size as the reference image out of the resized captured image while moving the center point (xc, yc) of the cut-out region one pixel at a time in the x or y direction (see FIG. 2). For each cut-out region, it then uses Expression (2) to compute the absolute difference in luminance between corresponding pixels of the region and the reference image, and sums these absolute differences over all pixels.
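Expression (2) is not reproduced in this text. From the description of its terms, it is the sum of absolute differences (SAD); in the form below, T(x, y) is assumed to be indexed within the m × n cut-out region centered at (xc, yc):

```latex
\mathrm{SAD}(x_c, y_c) = \sum_{y=0}^{n-1} \sum_{x=0}^{m-1} \left| T(x, y) - S(x, y) \right|
```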

In Expression (2), T(x, y) is the luminance of the pixel at (x, y) in the captured image, and S(x, y) is the luminance of the pixel at (x, y) in the reference image. If the reference image is m pixels in the x direction and n pixels in the y direction, m × n absolute differences are computed and summed. Consequently, the more similar (correlated) the cut-out region is to the reference image, the smaller the sum of absolute differences, and if the two are identical the sum is 0. Values other than luminance may also be used, such as saturation or hue, or the R, G, and B values of an RGB image. In that case, the reference images are created from the chosen values, and the image resolution conversion unit 12 likewise generates the resized captured image from those values.

The inter-image similarity evaluation unit 13 then subtracts this sum of absolute differences from the maximum luminance value to obtain a luminance value indicating the similarity between each cut-out region and the reference image. The more similar the cut-out region is to the reference image, the larger this value; when they are identical it equals the maximum luminance value. The unit assigns this value to the coordinates (xc, yc) of the center of the cut-out region, thereby generating the similarity evaluation map Map_B_i (see FIG. 2). One similarity evaluation map Map_B_i is generated per reference image; in it, the pixel at each cut-out center (xc, yc) holds the luminance value expressing similarity to the reference image. When the map is displayed, higher similarity appears as a larger luminance value, so similar locations look white. If the captured image contains a feature portion resembling the reference image, the similarity peaks where that feature portion is located, so luminance values around the feature point P_B_i are large.
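A minimal sketch of the map construction for one reference image. The patent subtracts the SAD from the maximum luminance value; dividing the SAD by the window's pixel count first, so the result stays within the luminance range, is an assumption of this sketch, as are the 0 values at pixels that cannot be a window center. It requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def similarity_map(captured, reference, max_lum=255.0):
    """Build a similarity evaluation map Map_B_i for one reference image.

    For every window of the reference's size, compute the sum of absolute
    luminance differences (Expression (2)), convert it to a similarity
    value, and store that value at the window's center pixel (xc, yc).
    Assumption: the SAD is divided by the window's pixel count before being
    subtracted from max_lum, so a perfect match scores max_lum. Pixels that
    cannot be a window center are left at 0.
    """
    captured = captured.astype(float)
    reference = reference.astype(float)
    n, m = reference.shape  # n rows (y), m columns (x)
    windows = sliding_window_view(captured, (n, m))  # (H-n+1, W-m+1, n, m)
    sad = np.abs(windows - reference).sum(axis=(2, 3))
    score = max_lum - sad / (m * n)
    out = np.zeros_like(captured)
    out[n // 2 : n // 2 + sad.shape[0], m // 2 : m // 2 + sad.shape[1]] = score
    return out
```

The sliding-window view avoids an explicit double loop over (xc, yc) but holds a large temporary array; for big images a loop over window offsets may use less memory.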

For example, the inter-image similarity evaluation unit 13 generates the similarity evaluation maps Map_B_i (i = 0, 1, 2) shown in FIG. 5. FIG. 5(a) displays the map Map_B_0 for the eyes reference image, with large luminance values around the feature point P_B_0. FIG. 5(b) displays the map Map_B_1 for the nose reference image, with large luminance values around P_B_1. FIG. 5(c) displays the map Map_B_2 for the mouth reference image, with large luminance values around P_B_2. In FIGS. 5(a) to 5(c), luminance values are also large (white) at places other than the feature points; these are noise. In a face image, regions such as the background and skin have nearly uniform luminance, but luminance changes at the facial features and at the boundary between face and background, so those places can also show some similarity to a reference image, which appears as noise.

The integration processing unit 14 retrieves from the offset database 11 the offset ΔP_B_i corresponding to each reference image and shifts every pixel of the similarity evaluation map Map_B_i by that offset, generating the offset similarity evaluation map ΔMap_B_i. One offset similarity evaluation map ΔMap_B_i is generated per reference image; it is the similarity evaluation map Map_B_i moved so that the feature point P_B_i of that reference image lands on the position of the centroid Cen_B. If the captured image contains a feature portion resembling the reference image, the similarity peak moves from the feature point P_B_i by the offset ΔP_B_i, so luminance values around the centroid Cen_B become large in ΔMap_B_i. The offset similarity evaluation map ΔMap_B_i has the same size as Map_B_i, and pixels that receive no value from Map_B_i are set to a luminance value of 0.
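The shift with zero fill can be sketched as below, assuming the offset is given as whole pixels (dx, dy) and that shifting by ΔP_B_i = Cen_B − P_B_i moves the feature point onto the centroid. The function name is hypothetical.

```python
import numpy as np

def offset_map(sim_map, dx, dy):
    """Return ΔMap_B_i: sim_map translated by (dx, dy) pixels.

    The value at (x, y) moves to (x + dx, y + dy); pixels that receive no
    value are set to 0, and values shifted past the border are dropped.
    """
    h, w = sim_map.shape
    out = np.zeros_like(sim_map)
    if abs(dx) >= w or abs(dy) >= h:
        return out
    src_y = slice(max(0, -dy), h - max(0, dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = sim_map[src_y, src_x]
    return out
```

Output keeps the input's size, matching the description that ΔMap_B_i and Map_B_i are the same size.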

The integration processing unit 14 generates offset similarity evaluation maps ΔMap_B_i (i = 0, 1, 2) such as those shown in FIG. 6. FIGS. 6(a) to 6(c) show the offset similarity evaluation maps ΔMap_B_i obtained by shifting each similarity evaluation map Map_B_i of FIG. 5 by the offset amount ΔP_B_i indicated by the arrows; in each of them, the luminance values around the center of gravity Cen_B are high. Because each ΔMap_B_i is merely a shifted copy of the corresponding Map_B_i, the noise remains.

Further, the integration processing unit 14 adds the luminance values of all offset similarity evaluation maps ΔMap_B_i pixel by pixel according to expression (3) to generate an integrated map T_Map, which combines all of the ΔMap_B_i. Where high-luminance locations coincide across all ΔMap_B_i, the luminance at that location in T_Map becomes high. Consequently, if the captured image contains feature portions similar to every one of the reference images, the similarity peaks around Cen_B add up, and the luminance around Cen_B becomes the highest in T_Map. Conversely, noise (high-luminance spots) that occurs at different locations in different ΔMap_B_i does not add up during integration, so it is suppressed relative to the peak. The integration processing unit 14 generates, for example, the integrated map T_Map shown in FIG. 7, which integrates the three offset similarity evaluation maps ΔMap_B_i of FIG. 6: only the luminance around Cen_B is high, and the noise is no longer visible.
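A minimal sketch of the integration, assuming expression (3) is the plain pixel-wise sum described in the text (the expression itself is not reproduced in this section). The toy maps are invented for illustration: each has a peak of 10 at the common center and a noise spot of 8 in a different corner. `integrate_maps` is a hypothetical name.

```python
def integrate_maps(offset_maps):
    """Sum equally sized offset similarity maps pixel by pixel to form T_Map."""
    h, w = len(offset_maps[0]), len(offset_maps[0][0])
    total = [[0] * w for _ in range(h)]
    for m in offset_maps:
        for y in range(h):
            for x in range(w):
                total[y][x] += m[y][x]
    return total


a = [[8, 0, 0], [0, 10, 0], [0, 0, 0]]  # noise at top-left
b = [[0, 0, 8], [0, 10, 0], [0, 0, 0]]  # noise at top-right
c = [[0, 0, 0], [0, 10, 0], [0, 0, 8]]  # noise at bottom-right
print(integrate_maps([a, b, c]))
```

After summation the center reaches 30 while each noise spot stays at 8, which is the relative suppression the text describes.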

The integration processing unit 14 then compares the luminance value of each pixel around the center of gravity Cen_B of the integrated map T_Map with a threshold. If the luminance value exceeds the threshold, it determines that a face is present in the captured image; if the luminance value is at or below the threshold, it determines that no face is present. FIG. 4(b) shows the result of determining that a face is present in the captured image of FIG. 4(a); the face region containing the feature portions is drawn as a solid-line rectangle.
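The decision step might look like the following sketch. The text does not define how large the neighbourhood "around Cen_B" is or how its pixels are combined, so the hypothetical parameter `radius` and the choice of testing the maximum value in that window are assumptions. As in the text, a value equal to the threshold counts as "not present".

```python
def contains_object(t_map, cen_x, cen_y, threshold, radius=1):
    """Return True if the peak within `radius` pixels of (cen_x, cen_y) exceeds threshold.

    The window is clipped to the map borders.
    """
    h, w = len(t_map), len(t_map[0])
    peak = max(
        t_map[y][x]
        for y in range(max(0, cen_y - radius), min(h, cen_y + radius + 1))
        for x in range(max(0, cen_x - radius), min(w, cen_x + radius + 1))
    )
    return peak > threshold  # equal to threshold -> not the recognition object


t_map = [[8, 0, 8], [0, 30, 0], [0, 0, 8]]
print(contains_object(t_map, 1, 1, threshold=20))
```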

Next, with reference to FIG. 1 and FIGS. 8 to 12, the configuration of the image recognition apparatus 1 applied to recognizing the model of an automobile is described. FIG. 8 is an example of a reference image when the recognition target is the rear view (back style) of an automobile. FIG. 9 is an example of a captured image of an automobile taken from behind. FIG. 10 shows the similarity evaluation maps used when recognizing the automobile in the captured image of FIG. 9 on the basis of the reference images of FIG. 8: (a) for the right tail lamp reference image, (b) for the right tire, (c) for the number plate, (d) for the left tail lamp, and (e) for the left tire. FIG. 11 shows the offset similarity evaluation maps obtained by shifting the similarity evaluation maps of FIG. 10 to the center of gravity, with panels (a) to (e) corresponding to the same five reference images. FIG. 12 is the integrated map obtained by integrating the five offset similarity evaluation maps of FIG. 11.

In this application, the image recognition apparatus 1 is a vehicle-model recognition apparatus that determines, from a captured image of an object, whether the object is the vehicle model to be recognized. The apparatus holds reference images of the left and right tail lamps, the left and right tires, and the number plate in the rear view of the automobile, together with the offset from the center of gravity of the rear-view image to the center of each reference image. It then determines whether left and right tail lamps, left and right tires, and a number plate similar to the five reference images are present at the correct positions in a captured image of an automobile taken from behind. Only the points in which the configuration differs from the face-recognition case are described below.

The camera 2 captures the automobile from directly behind and transmits the captured image data to the image ECU 3; FIG. 9 shows an example of such a captured image.

Because the purpose is to discriminate the vehicle model, the reference images stored in the reference image database 10 are images of the left and right tail lamps, the left and right tires, and the number plate. The positions of the tail lamps, tires, and number plate differ between vehicle models, and the tail lamps and tires also differ in size and shape, so these portions are effective features for distinguishing models. Reference images of other feature portions, such as the bumper or the rear wing, may also be used. Since this embodiment discriminates a specific vehicle model, the rear-view image used to create the reference images is an image of a vehicle of that model captured from directly behind. If the camera 2 cannot always capture the vehicle from directly behind, an average rear-view image may be created from rear views taken from various directions, and the reference images may be created from that average image.
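One way to form the average rear-view image mentioned above is a pixel-wise mean. The sketch assumes the input images have already been aligned and resized to a common size, which the patent does not describe; `average_images` is a hypothetical helper.

```python
def average_images(images):
    """Pixel-wise mean of equally sized, pre-aligned grayscale images (lists of rows)."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)]
            for y in range(h)]


# Two tiny 1x2 "images" average to [[2.0, 4.0]].
print(average_images([[[0, 2]], [[4, 6]]]))
```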

To create the reference images, a rear-view image of an automobile such as that shown in FIG. 8 is prepared. In this image, a back-style region is set as the smallest rectangle that contains the left and right tail lamps, the left and right tires, and the number plate, with its four sides parallel to the vertical and horizontal axes of the image (the rectangle drawn with a dash-dot line in FIG. 8). The center of this back-style region is defined as the center of gravity Cen_B (= (0, 0)) (the circle in FIG. 8). Next, five regions are cut out of the image, each a rectangle with its four sides parallel to the image axes: one containing the right tail lamp, one containing the right tire, one containing the number plate, one containing the left tail lamp, and one containing the left tire (the five broken-line rectangles in FIG. 8). The center of each region is defined as a feature point P_B_i (i = 0, 1, 2, 3, 4) (the five cross marks in FIG. 8). The image enclosed by each cut-out rectangle is a reference image, and the reference image data are stored in the reference image database 10.
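The construction of Cen_B and the feature points P_B_i can be sketched as follows. Rectangles are assumed to be `(left, top, right, bottom)` in image coordinates with y increasing downward, and the back-style region is approximated as the bounding box of the five part rectangles. All function names, part names, and coordinates are hypothetical.

```python
def rect_center(rect):
    """Center (x, y) of an axis-aligned rectangle (left, top, right, bottom)."""
    left, top, right, bottom = rect
    return ((left + right) / 2, (top + bottom) / 2)


def bounding_rect(rects):
    """Smallest axis-aligned rectangle containing every rectangle in `rects`."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def relative_feature_points(parts):
    """Map each part name to its feature point P_B_i, measured from Cen_B = (0, 0)."""
    cx, cy = rect_center(bounding_rect(list(parts.values())))
    result = {}
    for name, rect in parts.items():
        px, py = rect_center(rect)
        result[name] = (px - cx, py - cy)
    return result


parts = {  # hypothetical pixel rectangles cut from a rear-view image
    "right_tail_lamp": (10, 20, 30, 30),
    "right_tire": (12, 50, 28, 70),
    "number_plate": (45, 35, 65, 45),
    "left_tail_lamp": (80, 20, 100, 30),
    "left_tire": (82, 50, 98, 70),
}
print(relative_feature_points(parts))
```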

The offset amount ΔP_B_i (i = 0, 1, 2, 3, 4) stored in the offset database 11 is the displacement between the center of gravity Cen_B of the back-style region and each feature point P_B_i.
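Written out, one consistent reading of the offset relation is shown below. It assumes ΔP_B_i is measured from Cen_B to P_B_i (the direction given earlier for the offset database) and uses $\mathbf{p}$ for a pixel position; the notation is ours, not the patent's.

$$\Delta P_{B,i} = P_{B,i} - \mathrm{Cen}_B, \qquad i = 0, 1, 2, 3, 4$$

$$\Delta\mathrm{Map}_{B,i}(\mathbf{p}) =
\begin{cases}
\mathrm{Map}_{B,i}(\mathbf{p} + \Delta P_{B,i}) & \text{if } \mathbf{p} + \Delta P_{B,i} \text{ lies inside the map,}\\
0 & \text{otherwise.}
\end{cases}$$

Substituting $\mathbf{p} = \mathrm{Cen}_B$ gives $\Delta\mathrm{Map}_{B,i}(\mathrm{Cen}_B) = \mathrm{Map}_{B,i}(P_{B,i})$: the similarity found at the feature point is moved onto the center of gravity. With Cen_B = (0, 0) as in FIG. 8, ΔP_B_i is simply the coordinate of P_B_i.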

The inter-image similarity evaluation unit 13 generates similarity evaluation maps Map_B_i (i = 0, 1, 2, 3, 4) such as those shown in FIG. 10. FIG. 10(a) shows the map Map_B_0 for the right tail lamp reference image; the luminance values are high around the feature point P_B_0 and also around the feature point P_B_3 of the left tail lamp. FIG. 10(b) shows the map Map_B_1 for the right tire reference image; the luminance values are high around the feature point P_B_1 and around the feature point P_B_4 of the left tire. FIG. 10(c) shows the map Map_B_2 for the number plate reference image; the luminance values are high around the feature point P_B_2. FIG. 10(d) shows the map Map_B_3 for the left tail lamp reference image; the luminance values are high around the feature point P_B_3 and around the feature point P_B_0 of the right tail lamp. FIG. 10(e) shows the map Map_B_4 for the left tire reference image; the luminance values are high around the feature point P_B_4 and around the feature point P_B_1 of the right tire. These second peaks arise because the left and right lamps and tires resemble each other. In FIGS. 10(a) to 10(e), some locations other than the feature points also have high luminance values (appear white); these are noise.

The integration processing unit 14 generates offset similarity evaluation maps ΔMap_B_i (i = 0, 1, 2, 3, 4) such as those shown in FIG. 11. FIGS. 11(a) to 11(e) show the maps obtained by shifting each similarity evaluation map of FIG. 10 by the offset amount ΔP_B_i indicated by the arrows; in all of them the luminance around the center of gravity Cen_B is high, but noise remains. The integration processing unit 14 then generates an integrated map T_Map such as that shown in FIG. 12, which integrates the five offset similarity evaluation maps ΔMap_B_i of FIG. 11. Only the luminance around Cen_B is high and the noise is no longer visible; the second peak of each left/right pair is shifted away from Cen_B and therefore does not accumulate either.

The integration processing unit 14 compares the luminance value of each pixel around the center of gravity Cen_B of the integrated map T_Map with a threshold. If the luminance value exceeds the threshold, it determines that the automobile in the captured image is the vehicle model to be recognized; if the luminance value is at or below the threshold, it determines that it is not.

The operation of the image recognition apparatus 1 is described with reference to FIG. 1. Before recognition, multiple characteristic portions of the recognition target (any person's face, or a specific vehicle model) are extracted: both eyes, the nose, and the mouth, or the left and right tail lamps, the left and right tires, and the number plate. Reference images of these portions are created, and their data (the luminance value of each pixel) are stored in the reference image database 10. In addition, the offset amount ΔP_B_i between the feature point P_B_i at the center of each feature portion and the center of gravity Cen_B at the center of the image is computed and stored in the offset database 11 in association with the corresponding reference image.

The camera 2 captures the object (a person's face or the rear of an automobile) and transmits the captured image data to the image ECU 3. The image ECU 3 converts the size of the captured image so that its resolution matches that of the reference images; the size-converted image data consist of luminance values. The image ECU 3 then retrieves a reference image from the reference image database 10, successively cuts out regions of the same size as that reference image from the size-converted captured image, and computes for each region a luminance value indicating its similarity to the reference image. It assigns each computed value to the center coordinates (xc, yc) of the cut-out region, thereby generating the similarity evaluation map Map_B_i. A similarity evaluation map Map_B_i is generated for each of the reference images. If the captured image contains a feature portion similar to a reference image, the luminance values around the feature point P_B_i in Map_B_i are high.
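The sliding-window step can be sketched as follows, using the sum of absolute differences (SAD), which the text later names as an alternative similarity value. How the embodiment converts SAD into a luminance value is not given in this section; the linear mapping `255 * (1 - SAD / max_SAD)` for 8-bit images is an assumption, as is the name `similarity_map`. Map pixels that are never the center of a window stay 0.

```python
def similarity_map(image, ref):
    """Slide `ref` over `image` and write a brightness-like similarity at each window center.

    Assumes 8-bit luminance (0..255). An exact match gives 255; the
    largest possible SAD gives 0.
    """
    H, W = len(image), len(image[0])
    h, w = len(ref), len(ref[0])
    max_sad = 255 * h * w
    out = [[0] * W for _ in range(H)]
    for y0 in range(H - h + 1):
        for x0 in range(W - w + 1):
            sad = sum(abs(image[y0 + j][x0 + i] - ref[j][i])
                      for j in range(h) for i in range(w))
            # center coordinates (xc, yc) of the cut-out region
            out[y0 + h // 2][x0 + w // 2] = round(255 * (1 - sad / max_sad))
    return out


# A 4x4 dark image with a bright 2x2 patch; the reference is the bright patch.
img = [[0, 0, 0, 0], [0, 255, 255, 0], [0, 255, 255, 0], [0, 0, 0, 0]]
print(similarity_map(img, [[255, 255], [255, 255]]))
```

Because every window position is evaluated exactly once, the work depends only on the image and reference sizes, not on where the features are.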

Next, the image ECU 3 retrieves from the offset database 11 the offset amount ΔP_B_i corresponding to each reference image and shifts the similarity evaluation map Map_B_i by ΔP_B_i so that the feature point P_B_i moves to the center of gravity Cen_B. An offset similarity evaluation map ΔMap_B_i is generated for each of the reference images. If the captured image contains a feature portion similar to the reference image, the luminance values around Cen_B in ΔMap_B_i are high.

Further, the image ECU 3 sums the luminance values of all offset similarity evaluation maps ΔMap_B_i pixel by pixel to generate the integrated map T_Map. If the captured image contains feature portions similar to all of the reference images, the luminance around Cen_B is the highest in T_Map. The image ECU 3 therefore compares the luminance value of each pixel around Cen_B in T_Map with a threshold: if the value exceeds the threshold, the object is judged to be the recognition target (any person's face, or the specific vehicle model); if it is at or below the threshold, the object is judged not to be the recognition target.

With the image recognition apparatus 1, the processing load (and hence the processing time) is constant regardless of the positions, shapes, and sizes of the feature portions (both eyes, nose, and mouth, or tail lamps, tires, and number plate), so the load stays light and processing is fast. Because the apparatus does not perform pattern matching over combinations of candidate positions, the amount of processing does not change even if the positions, shapes, or sizes of the feature portions in the captured image differ somewhat from those in the reference images.
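The constant-load property can be made concrete with a rough operation count. This assumes the comparison step evaluates every window position of a W × H (size-converted) captured image against each of N reference images of size w_i × h_i, which is one reading of the procedure described above:

$$C = \sum_{i=0}^{N-1} (W - w_i + 1)(H - h_i + 1)\, w_i h_i$$

absolute differences, plus about N·W·H additions for shifting and summing the maps. No term depends on where the features lie in the image. For hypothetical sizes W = H = 64 and N = 3 reference images of 16 × 16 pixels, C = 3 · 49 · 49 · 256 = 1,843,968.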

In addition, by computing the similarity to each reference image, the image recognition apparatus 1 can detect whether portions similar to the feature portions exist in the captured image, and by integrating the resulting similarity evaluation maps Map_B_i with the center of gravity Cen_B as the common reference, it can detect whether those similar portions are in the correct relative positions; recognition accuracy is therefore high. Recognition accuracy can be improved further by increasing the number of feature portions used as reference images. Even if noise appears in the individual similarity evaluation maps Map_B_i, integrating them suppresses it.

Further, when the conversion reduces the size of the captured image, the processing load is reduced further and noise in the captured image is also suppressed. Because similarity is expressed as a luminance value, it can also be judged visually.
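The size conversion method is not specified in this section. Block averaging, sketched below, is one common choice that both shrinks the image and averages out pixel-level noise. `downscale` is a hypothetical helper; rows and columns that do not fill a whole block are dropped.

```python
def downscale(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    out_h, out_w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) / factor ** 2
             for x in range(out_w)]
            for y in range(out_h)]


print(downscale([[0, 2, 4, 6], [2, 4, 6, 8]], 2))
```

Halving each dimension roughly quarters the number of window positions to evaluate, which is where the additional reduction in load comes from.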

Although an embodiment of the present invention has been described above, the invention is not limited to this embodiment and can be carried out in various forms.

For example, although this embodiment applies the invention to recognizing persons and automobiles, it can be applied to recognizing various other objects. For persons, it can also be used for personal authentication of a specific individual, in which case the reference images of the feature portions are created from a face image of that individual. It can also be applied to other classifications, such as a particular race, a particular age group, or men and women, with reference images prepared for each; for example, to recognize a particular race, the reference images are created from an average face image of that race. For automobiles, it can also be applied to broader classes such as small, medium-sized, and large cars, in which case average values for vehicles of each size class are used for the reference images and offset amounts.

Also, although in this embodiment the reference point for the offset amounts is the center of gravity (center) of the image, it may be set at any position in the image, such as the upper-left or lower-right corner. In that case the offset amounts are computed relative to the chosen position. Because the similarity in the offset similarity evaluation maps and the integrated map then peaks around that position, the pixels compared with the threshold must also be those around it.

Also, although this embodiment converts the resolution (size) of the captured image, the conversion may be omitted if the resolution of the object in the captured image already matches that of the reference images.

Also, although this embodiment expresses the similarity to a reference image as a luminance value, the sum of absolute differences itself may be used as the similarity instead, or the similarity may be obtained by another method.
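As an example of "another method", the sketch below computes zero-mean normalized cross-correlation (ZNCC). This is a standard similarity measure swapped in for illustration, not one named in the patent. Unlike SAD, a larger value means more similar (the range is -1 to 1), and the score is unaffected by a uniform brightness gain or offset applied to the window.

```python
import math


def zncc(window, ref):
    """Zero-mean normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = [v for row in window for v in row]
    b = [v for row in ref for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch has no defined correlation; treat as dissimilar
    return sum(x * y for x, y in zip(da, db)) / denom


print(zncc([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # same pattern, 10x brighter
```

The cost is higher than SAD (means, products, and a square root per window), which is a trade-off against the robustness to lighting changes.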

FIG. 1 is a block diagram of the image recognition apparatus according to the embodiment.
FIG. 2 is an explanatory diagram of the inter-image similarity evaluation unit of FIG. 1.
FIG. 3 is an example of reference images when the recognition target is a person's face: (a) is a face image used to create the reference images, and (b) shows reference images whose feature portions are both eyes, the nose, and the mouth in the face image of (a).
FIG. 4 is an example of a captured image of a person's face taken from the front: (a) is the captured image, and (b) is the captured image showing the face recognition result.
FIG. 5 shows the similarity evaluation maps used when recognizing the face in the captured image of FIG. 4 on the basis of the reference images of FIG. 3: (a) for the both-eyes reference image, (b) for the nose reference image, and (c) for the mouth reference image.
FIG. 6 shows the offset similarity evaluation maps obtained by shifting the similarity evaluation maps of FIG. 5 to the center of gravity: (a) for the both-eyes reference image, (b) for the nose reference image, and (c) for the mouth reference image.
FIG. 7 is the integrated map obtained by integrating the three offset similarity evaluation maps of FIG. 6.
FIG. 8 is an example of a reference image when the recognition target is the rear view (back style) of an automobile.
FIG. 9 is an example of a captured image of an automobile taken from behind.
FIG. 10 shows the similarity evaluation maps used when recognizing the automobile in the captured image of FIG. 9 on the basis of the reference images of FIG. 8: (a) for the right tail lamp, (b) for the right tire, (c) for the number plate, (d) for the left tail lamp, and (e) for the left tire reference image.
FIG. 11 shows the offset similarity evaluation maps obtained by shifting the similarity evaluation maps of FIG. 10 to the center of gravity: (a) for the right tail lamp, (b) for the right tire, (c) for the number plate, (d) for the left tail lamp, and (e) for the left tire reference image.
FIG. 12 is the integrated map obtained by integrating the five offset similarity evaluation maps of FIG. 11.

Explanation of symbols

1: image recognition apparatus, 2: camera, 3: image ECU, 10: reference image database, 11: offset database, 12: image resolution conversion unit, 13: inter-image similarity evaluation unit, 14: integration processing unit

Claims

1. An image recognition apparatus comprising:
imaging means;
reference image holding means for holding reference images of a plurality of characteristic portions of a recognition object;
positional relationship holding means for holding, in association with each reference image, the positional relationship between the position of that reference image within the whole image and an arbitrary point in the image;
comparison means for comparing the captured image captured by the imaging means with each of the plurality of reference images held by the reference image holding means; and
determination means for moving each of the plurality of comparison results of the comparison means relative to the arbitrary point in the image on the basis of the positional relationships held by the positional relationship holding means, integrating the plurality of moved comparison results, and determining on the basis of the integration result whether the object is the recognition object.

2. The image recognition apparatus according to claim 1, further comprising image size conversion means for converting the size of the captured image captured by the imaging means,
wherein the comparison means compares the captured image whose size has been converted by the image size conversion means with the plurality of reference images held by the reference image holding means.

3. An image recognition method for recognizing a recognition object from a captured image, in which reference images of a plurality of characteristic portions of the recognition object and, in association with each reference image, the positional relationship between the position of that reference image within the whole image and an arbitrary point in the image are held in advance, the method comprising:
a comparison step of comparing the captured image with each of the plurality of held reference images; and
a determination step of moving each of the plurality of comparison results of the comparison step relative to the arbitrary point in the image on the basis of the held positional relationships, integrating the plurality of moved comparison results, and determining on the basis of the integration result whether the object is the recognition object.

4. The image recognition method according to claim 3, further comprising an image size conversion step of converting the size of the captured image,
wherein, in the comparison step, the captured image whose size has been converted in the image size conversion step is compared with the plurality of held reference images.