JP2009069996A

JP2009069996A - Image processing device and image processing method, recognition device and recognition method, and program

Info

Publication number: JP2009069996A
Application number: JP2007235778A
Authority: JP
Inventors: Jun Yokono
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-09-11
Filing date: 2007-09-11
Publication date: 2009-04-02

Abstract

PROBLEM TO BE SOLVED: To separate the background from image data and extract the area corresponding to an object to be recognized. SOLUTION: Image data that includes a background, captured with a focus camera, is passed through a 3×3 filter in which the coefficient of the target pixel is 8 and the coefficient of each of its eight neighboring pixels is -1. In the resulting image, black areas indicate out-of-focus pixels whose difference from their neighboring pixels is nearly zero, and bright (white) areas indicate in-focus pixels with a sharp pattern. A filtering process that averages the target pixel with its neighboring area is then applied to this image to join together the areas that appear to belong to the object to be recognized. After the filtering result is binarized, either morphological processing is performed or the parts that are not black (that is, where the filtering produced a value) are masked, so that the image is divided into areas and the background is separated from the image data. The present invention is applicable to an image processing system, a learning device, or a recognition device. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a processing device, an image processing method, a recognition device, a recognition method, and a program, and in particular to a processing device, an image processing method, a recognition device, a recognition method, and a program suitable for performing recognition processing using images.

When object recognition is performed in a real environment using an image (for example, an image captured with a camera), the image contains not only the object to be recognized but also the background. When determining where the object to be recognized is located in a cluttered scene that includes the background (localization), the background pattern (shape, color, and so on) can cause misrecognition, so object recognition in an environment with a cluttered background is very difficult.

For example, there has conventionally been a method of segmenting (cutting out) the object to be recognized based on the distance of each imaged object from the camera, obtained by stereo calculation on images captured with a compound-eye camera.

In addition, various methods for object recognition by image processing have been proposed in recent years, and performance has improved dramatically over the last 10 years.

Object recognition by image processing requires three elements: a feature quantity, a geometric connection, and a classifier. That is, a method of object recognition by image processing is determined by how the feature quantity, the geometric connection method, and the classifier are selected.

First, regarding what to use as the feature quantity, there are many options: whether to use a global feature quantity or a local feature quantity and, among local feature quantities, what kind of calculation to perform.

When local region features are used, for example, a method of connecting them geometrically must be selected. Broadly, there are two approaches: explicitly preparing a template (explicit template), and obtaining the connection implicitly by "voting" (implicit voting method).

Finally, it is necessary to consider what kind of classifier to use in order to represent the object (the target to be recognized) using those feature quantities.

As for feature quantities, in recent methods it is becoming mainstream to divide the entire image into several small regions called local regions and to perform object recognition based on local information, such as feature points and feature quantities, obtained from those local regions. The local region goes by various names, such as local descriptor, component, part, and fragment.

Examples of such local information include Gabor jets, Haar wavelets, Gaussian derivatives, and SIFT features. As the classifier, a statistical learning machine is often used; examples include support vector machines (SVM), boosting, and Bayesian estimation.

Conventionally, as described above, recognition processing has been performed using various feature quantities, geometric connection methods, and classifiers (see, for example, Non-Patent Documents 1 to 11).

T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio. Object Recognition with Cortex-like Mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 411-426, 2007.

R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pages II-264 to II-271, June 2003.

B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In European Conference on Computer Vision (ECCV'04) Workshop on Statistical Learning in Computer Vision, 2004.

Jamie Shotton, Andrew Blake, and Roberto Cipolla. Contour-Based Learning for Object Detection. In International Conference on Computer Vision (ICCV 2005).

M. Vidal-Naquet and S. Ullman. Object recognition with informative features and linear classification. In ICCV, pages 281-288, 2003.

P. Murphy, A. Torralba, and W. T. Freeman. Using the forest to see the trees: a graphical model relating features, objects and scenes. In Advances in Neural Information Processing Systems 16 (NIPS), Vancouver, BC, MIT Press, 2003.

K. Grauman and T. Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. In International Conference on Computer Vision (ICCV 2005).

Josef Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman. Discovering objects and their location in images. In International Conference on Computer Vision (ICCV 2005).

Hao Zhang, Alexander C. Berg, Michael Maire, and Jitendra Malik. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. In International Conference on Computer Vision and Pattern Recognition (CVPR 2006).

G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1-22, 2004.

Ian R. Fasel. Learning to Detect Objects in Real-Time: Probabilistic Generative Approaches. PhD thesis, UCSD, June 2006.

Non-Patent Documents 1 to 11 are described below with reference to FIG. 1.

The approach used for the recognition processing described in Non-Patent Document 1 is called HMAX. As the feature quantity it uses C2 (Gabor), a feature quantity made robust to positional shift by combining Gabor filters; it uses local MAX pooling, which takes the best-matching feature quantity within a neighborhood; and it performs classification with an RBF (Radial Basis Function) network.

The approach used for the recognition processing described in Non-Patent Document 2 is called Constellation. It uses gray patches, which are luminance image patches, as the feature quantity, uses Gaussian distributions, and performs classification by ML (Maximum Likelihood) (Bayes).

The approach used for the recognition processing described in Non-Patent Document 3 is called ISM (Implicit Shape Model). It uses gray patches, which are luminance image patches, as the feature quantity, obtains the template implicitly by voting (implicit voting method), and performs recognition with an SVM (Support Vector Machine).

The approach used for the recognition processing described in Non-Patent Document 4 is called Blake, after one of the authors of the document. It uses edge images (edgels) as the feature quantity, uses a star model, and performs recognition with a boosting algorithm.

The approach used for the recognition processing described in Non-Patent Document 5 is called Fragments. It uses gray patches, which are luminance image patches, as the feature quantity, uses a template-based method, and performs recognition based on mutual information.

The approach used for the recognition processing described in Non-Patent Document 6 is called Torralba, after one of the authors of the document. It uses GD (Gaussian Derivatives), Lap (Laplacians), and Haar (Haar features) as feature quantities, uses a template-based method, and performs recognition with a boosting algorithm and Bayesian estimation.

The approach used for the recognition processing described in Non-Patent Document 7 is called PMK (Pyramid Match Kernel). It proposes a kernel function that calculates similarity based on partial matching between two bags, and it performs image classification with an SVM using a pyramid histogram as the feature quantity.

The approach used for the recognition processing described in Non-Patent Document 8 is called pLSA (probabilistic Latent Semantic Analysis). It uses SIFT (Scale Invariant Feature Transform) as the feature quantity and performs recognition by pLSA.

The approach used for the recognition processing described in Non-Patent Document 9 is called SVM-kNN (kNN: k-nearest neighbor classification). It uses shape context as the feature quantity and uses templates. A kNN search is performed first; if the predetermined number of neighbors all carry the same label, the input is assigned to that class, and otherwise a multi-class SVM is run to perform recognition.

The approach used for the recognition processing described in Non-Patent Document 10 is called Bag-of-Features. It uses SIFT as the feature quantity and performs recognition with an SVM.

The approach used for the recognition processing described in Non-Patent Document 11 is called Ian, after the author of the document. It uses Haar-type feature quantities, uses templates, and performs recognition by Bayesian estimation.

However, it has not been possible to detect, by a simple method, where the object to be recognized is located in a cluttered scene that includes the background. For example, the conventional segmentation based on stereo calculation requires a compound-eye camera, which increases cost. Consequently, cumbersome operations have been required, such as having the user manually cut out the necessary region from image data obtained with a monocular camera.

There is therefore a need for a technique that can easily extract the object to be recognized from image data, that is, separate the background, while keeping cost down and without requiring cumbersome operations.

Furthermore, in the conventional techniques described in Non-Patent Documents 1 to 11, the content of the feature quantities obtained as local information differs by type, and compatibility between them is not guaranteed. For example, a feature quantity relating to color and a feature quantity relating to shape generally differ in vector dimension and scale, so they cannot be compared with each other. It has therefore been difficult to use different types of feature quantities together for object recognition.

The present invention has been made in view of such circumstances, and makes it possible to easily extract an object to be recognized from an image and perform object recognition.

An image processing apparatus according to a first aspect of the present invention generates in advance, by learning processing, a recognizer for recognizing a recognition target. The apparatus includes: learning image acquisition means for acquiring a learning image used for the learning processing; model image acquisition means for acquiring a model image corresponding to the recognition target; and recognizer generation means for executing the learning processing using the learning image acquired by the learning image acquisition means and the model image acquired by the model image acquisition means, and generating a recognizer for recognizing the recognition target. At least one of the learning image acquisition means and the model image acquisition means includes: image acquisition means for acquiring image data in which the image of a subject located at a predetermined focal distance is in focus and other objects are out of focus; and image extraction means for extracting, from the image data acquired by the image acquisition means, the portion corresponding to the in-focus subject. The portion corresponding to the subject extracted by the image extraction means is acquired as the learning image or the model image.

The image extraction means may include: first calculation means for executing, on each pixel of the image data acquired by the image acquisition means, calculation processing for extracting pixels that differ greatly from their neighboring pixels; second calculation means for taking each pixel extracted by the first calculation means as a target pixel and obtaining the average of the target pixel and its neighboring region; and dividing means for dividing the image data, based on the calculation result of the second calculation means, into a region corresponding to the object to be detected and a region considered to be background.

The dividing means may binarize the calculation result of the second calculation means with a predetermined threshold value, thereby dividing the image data into a region corresponding to the object to be detected and a region considered to be background.

The dividing means may recognize pixels for which the calculation result of the second calculation means is a positive value as the region corresponding to the object to be detected.
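The first calculation, second calculation, and dividing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 3×3 coefficients (8 at the center, -1 at the eight neighbors) come from the abstract, while taking the absolute value of the filter response, the averaging radius, the threshold value, the edge-replicated padding, and all function names are assumptions made for this example.

```python
import numpy as np

# Coefficients from the abstract: 8 for the target pixel, -1 for its 8 neighbours.
SHARPNESS_KERNEL = np.array([[-1, -1, -1],
                             [-1,  8, -1],
                             [-1, -1, -1]], dtype=np.float64)


def filter3x3(gray, kernel):
    """Apply a 3x3 kernel to a 2-D image (edge-replicated padding is an assumption)."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def box_mean(values, radius):
    """Average each pixel with its (2*radius+1)^2 neighbourhood."""
    size = 2 * radius + 1
    padded = np.pad(values, radius, mode="edge")
    h, w = values.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (size * size)


def focus_mask(gray, radius=2, threshold=10.0):
    """Return True where the image looks in focus (hypothetical parameter values)."""
    # First calculation: large response where a pixel differs from its neighbours.
    response = np.abs(filter3x3(gray, SHARPNESS_KERNEL))  # abs() is an assumption
    # Second calculation: average over the neighbourhood to join nearby responses.
    smoothed = box_mean(response, radius)
    # Dividing step: binarise with a predetermined threshold.
    return smoothed > threshold
```

With a threshold of zero the same function reduces to the variant in which every pixel with a positive averaged response is treated as part of the object. The morphological processing mentioned in the abstract is not covered by this sketch.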

The recognizer generation means may include: model feature point generation means for generating a plurality of feature points as model feature points from the model image acquired by the model image acquisition means; model feature quantity generation means for generating a feature quantity at each of the model feature points generated by the model feature point generation means as a model feature quantity; learning feature point generation means for generating a plurality of feature points as learning feature points from the learning image acquired by the learning image acquisition means; learning feature quantity generation means for generating a feature quantity at each of the learning feature points generated by the learning feature point generation means as a learning feature quantity; learning correlation feature quantity generation means for selecting, for each of the model feature quantities generated by the model feature quantity generation means, the most highly correlated of the learning feature quantities generated by the learning feature quantity generation means, and generating the degree of correlation with the selected learning feature quantity as a learning correlation feature quantity; correct/incorrect information acquisition means for acquiring correct/incorrect information indicating whether or not the learning image includes the recognition target; and recognizer generation means for generating a recognizer based on the learning correlation feature quantities generated by the learning correlation feature quantity generation means and the correct/incorrect information acquired by the correct/incorrect information acquisition means.
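The generation of learning correlation feature quantities can be illustrated with a short sketch. The text above only says that, for each model feature quantity, the most highly correlated learning feature quantity is selected and the degree of correlation is kept; the use of normalized (mean-centered) correlation as the measure, the array layout, and the function name `correlation_features` are assumptions made for this example.

```python
import numpy as np


def correlation_features(model_feats, image_feats):
    """For each model feature vector, return its highest correlation with any
    feature vector taken from one image.

    model_feats: array of shape (M, D), one row per model feature point.
    image_feats: array of shape (N, D), one row per learning feature point.
    Returns an array of shape (M,), one correlation feature per model feature.
    """
    def normalise(x):
        # Mean-centre each vector and scale it to unit length (assumed measure).
        x = np.asarray(x, dtype=np.float64)
        x = x - x.mean(axis=1, keepdims=True)
        norm = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.where(norm == 0, 1.0, norm)

    # (M, N) matrix of correlations between every model/image feature pair.
    corr = normalise(model_feats) @ normalise(image_feats).T
    # Keep only the best match for each model feature.
    return corr.max(axis=1)
```

Each learning image then yields one fixed-length vector of M correlation values, which, together with the correct/incorrect label of that image, is the kind of input a recognizer could be trained on.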

The model feature points generated by the model feature point generation means may be selected according to the type of the model feature quantity at the model feature points, and the learning feature points generated by the learning feature point generation means may be selected according to the type of the learning feature quantity at the learning feature points.

The model feature quantities generated by the model feature quantity generation means may be selected according to the type of the model feature quantity, and the learning feature quantities generated by the learning feature quantity generation means may be selected according to the type of the learning feature quantity.

The recognizer generation means may generate the recognizer by learning processing based on weighted voting.

The learning processing based on weighted voting may be a boosting algorithm.
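As an illustration of learning by weighted voting, the sketch below trains a small boosted classifier on correlation features. The text above does not specify which boosting variant is used; discrete AdaBoost with single-feature threshold stumps is assumed here purely for illustration, and the function names, the label convention (+1 for images containing the recognition target, -1 otherwise), and the number of rounds are hypothetical.

```python
import numpy as np


def stump_predict(X, feature, threshold, sign):
    """Weak learner: vote `sign` where the feature reaches the threshold, else -sign."""
    return np.where(X[:, feature] >= threshold, sign, -sign)


def train_adaboost(X, y, rounds=5):
    """X: (n_samples, n_features) correlation features; y: labels in {-1, +1}."""
    n, d = X.shape
    weights = np.full(n, 1.0 / n)          # per-sample weights
    learners = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    err = weights[stump_predict(X, j, thr, sign) != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)   # voting weight of this stump
        pred = stump_predict(X, j, thr, sign)
        # Increase the weight of misclassified samples, then renormalise.
        weights = weights * np.exp(-alpha * y * pred)
        weights = weights / weights.sum()
        learners.append((alpha, j, thr, sign))
    return learners


def predict(learners, X):
    """Weighted vote of all stumps; returns +1 (target present) or -1."""
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in learners)
    return np.where(score >= 0, 1, -1)
```

Because each stump looks at a single feature, the features chosen during training also indicate which model feature quantities were most useful, which is one way such a scheme can double as feature selection.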

The image extraction means may extract the portion corresponding to the in-focus subject by extracting the out-of-focus region of the image data acquired by the image acquisition means.

The image extraction means may use an FFT to analyze the frequency spectrum of each image region constituting the image data acquired by the image acquisition means, and determine that a region containing sufficient high-frequency components is in focus, thereby extracting the portion corresponding to the in-focus subject.
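The FFT-based variant can be sketched as below. The text only states that regions with sufficient high-frequency components are judged to be in focus; dividing the image into square blocks, the block size, the cutoff frequency, the energy-ratio criterion, and the function names are all assumptions made for this example.

```python
import numpy as np


def high_freq_ratio(block, cutoff=0.25):
    """Fraction of the (mean-removed) spectral energy above `cutoff` cycles/pixel."""
    block = np.asarray(block, dtype=np.float64)
    spectrum = np.abs(np.fft.fft2(block - block.mean())) ** 2
    h, w = block.shape
    fy = np.fft.fftfreq(h)[:, None]      # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]      # horizontal frequencies, cycles per pixel
    radius = np.sqrt(fy ** 2 + fx ** 2)  # radial frequency of each FFT bin
    total = spectrum.sum()
    if total == 0:
        return 0.0                       # perfectly flat block: no detail at all
    return float(spectrum[radius > cutoff].sum() / total)


def focus_blocks(gray, block=8, cutoff=0.25, min_ratio=0.5):
    """Return a boolean grid, one entry per block, True where the block is judged
    to be in focus (hypothetical block size and thresholds)."""
    h, w = gray.shape
    rows, cols = h // block, w // block
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            tile = gray[r * block:(r + 1) * block, c * block:(c + 1) * block]
            mask[r, c] = high_freq_ratio(tile, cutoff) >= min_ratio
    return mask
```

Working block by block trades spatial resolution for robustness: larger blocks give a more stable spectrum estimate but a coarser outline of the in-focus subject.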

The apparatus may further include: recognizer storage means for storing the recognizer generated by the recognizer generation means; selected feature quantity storage means for storing selected feature quantities corresponding to each recognizer stored in the recognizer storage means; recognition image acquisition means for acquiring a recognition image used for performing recognition processing; recognition feature point generation means for generating a plurality of feature points as recognition feature points from the recognition image acquired by the recognition image acquisition means; recognition feature quantity generation means for generating a feature quantity at each of the recognition feature points generated by the recognition feature point generation means as a recognition feature quantity; recognition correlation feature quantity generation means for selecting, for each of the selected feature quantities stored in the selected feature quantity storage means, the most highly correlated of the recognition feature quantities generated by the recognition feature quantity generation means, and generating the degree of correlation with the selected recognition feature quantity as a recognition correlation feature quantity; and recognition processing means for determining whether or not the recognition target is included in the recognition image acquired by the recognition image acquisition means, by substituting the recognition correlation feature quantities generated by the recognition correlation feature quantity generation means into the recognizer generated by the recognizer generation means.

The recognition image acquisition means may include the image acquisition means and the image extraction means, and may acquire the portion corresponding to the subject extracted by the image extraction means as the recognition image.

An image processing method according to the first aspect of the present invention is an image processing method for an image processing apparatus that generates in advance, by learning processing, a recognizer for recognizing a recognition target. The method includes the steps of acquiring a learning image used for the learning processing, acquiring a model image corresponding to the recognition target, executing the learning processing using the acquired learning image and the model image, and generating a recognizer for recognizing the recognition target. At least one of the step of acquiring the learning image and the step of acquiring the model image includes the steps of acquiring image data in which the image of a subject located at a predetermined focal distance is in focus and other objects are out of focus, and extracting, from the acquired image data, the portion corresponding to the in-focus subject. The extracted portion corresponding to the subject is acquired as the learning image or the model image.

A program according to the first aspect of the present invention causes a computer to execute processing for generating in advance, by learning processing, a recognizer for recognizing a recognition target. The processing includes the steps of controlling acquisition of a learning image used for the learning processing, controlling acquisition of a model image corresponding to the recognition target, executing the learning processing using the acquired learning image and the model image, and generating a recognizer for recognizing the recognition target. At least one of the step of acquiring the learning image and the step of acquiring the model image includes the steps of controlling acquisition of image data in which the image of a subject located at a predetermined focal distance is in focus and other objects are out of focus, and extracting, from the acquired image data, the portion corresponding to the in-focus subject. The program causes the computer to acquire the extracted portion corresponding to the subject as the learning image or the model image.

In the first aspect of the present invention, a learning image used for learning processing is acquired, a model image corresponding to a recognition target is acquired, the learning processing is executed using the acquired learning image and model image, and a recognizer for recognizing the recognition target is generated. In at least one of the acquisition of the learning image and the acquisition of the model image, image data is acquired in which the image of a subject located at a predetermined focal distance is in focus and other objects are out of focus, the portion corresponding to the in-focus subject is extracted from the acquired image data, and the extracted portion corresponding to the subject is acquired as the learning image or the model image.

A recognition apparatus according to a second aspect of the present invention performs recognition processing that uses a recognizer generated by learning processing to determine whether or not a recognition target is included in a recognition image. The apparatus includes: recognition image acquisition means for acquiring the recognition image used for performing the recognition processing; recognizer storage means for storing the recognizer; selected feature quantity storage means for storing selected feature quantities corresponding to each recognizer stored in the recognizer storage means; and recognition processing means for determining whether or not the recognition target is included in the recognition image acquired by the recognition image acquisition means, using the recognizer stored in the recognizer storage means and the selected feature quantities stored in the selected feature quantity storage means. The recognition image acquisition means includes: image acquisition means for acquiring image data in which the image of a subject located at a predetermined focal distance is in focus and other objects are out of focus; and image extraction means for extracting, from the image data acquired by the image acquisition means, the portion corresponding to the in-focus subject. The portion corresponding to the subject extracted by the image extraction means is acquired as the recognition image.

前記分割手段には、前記第２の演算手段の演算結果が正の値である画素を検出するべき物体に対応する領域と認識させるようにすることができる。 The dividing means may be made to recognize pixels for which the calculation result of the second calculation means is a positive value as the region corresponding to the object to be detected.

前記認識処理手段には、前記認識画像取得手段により取得された前記認識画像から複数の特徴点を認識特徴点として生成する認識特徴点生成手段と、前記認識特徴点生成手段により生成された前記認識特徴点のそれぞれにおける特徴量を認識特徴量として生成する認識特徴量生成手段と、前記選択特徴量記憶手段に記憶される前記選択特徴量のそれぞれについて前記認識特徴量生成手段により生成された前記認識特徴量のうち最も相関の高いものを選択して、選択された前記認識特徴量との間の相関の程度を認識相関特徴量として生成する認識相関特徴量生成手段と、前記認識相関特徴量生成手段により生成された前記認識相関特徴量を、前記認識器記憶手段により記憶された前記認識器に代入することによって、前記認識画像取得手段により取得された前記認識画像に前記認識対象が含まれているか否かを判断する判断手段とを備えさせるようにすることができる。 The recognition processing means may include: recognition feature point generation means for generating a plurality of feature points as recognition feature points from the recognition image acquired by the recognition image acquisition means; recognition feature amount generation means for generating the feature amount at each of the recognition feature points generated by the recognition feature point generation means as a recognition feature amount; recognition correlation feature amount generation means for selecting, for each of the selected feature amounts stored in the selected feature amount storage means, the recognition feature amount with the highest correlation among those generated by the recognition feature amount generation means, and generating the degree of correlation with the selected recognition feature amount as a recognition correlation feature amount; and determination means for determining whether or not the recognition target is included in the recognition image acquired by the recognition image acquisition means, by substituting the recognition correlation feature amounts generated by the recognition correlation feature amount generation means into the recognizer stored by the recognizer storage means.

前記認識器記憶手段により記憶されている前記認識器は、所定のモデル画像から複数の特徴点をモデル特徴点として生成し、前記モデル特徴点のそれぞれにおける特徴量をモデル特徴量として生成し、所定の学習画像から複数の特徴点を学習特徴点として生成し、前記学習特徴点のそれぞれにおける特徴量を学習特徴量として生成し、前記モデル特徴量の各々について、前記学習特徴量のうち最も相関の高いものを選択して、選択された前記学習特徴量との間の相関の程度を学習相関特徴量として生成し、前記学習画像が前記認識対象を含むか否かを示す正誤情報を取得し、前記学習相関特徴量、および、前記正誤情報に基づいて生成された認識器であるものとすることができる。 The recognizer stored by the recognizer storage unit generates a plurality of feature points as model feature points from a predetermined model image, generates a feature amount at each of the model feature points as a model feature amount, and A plurality of feature points are generated as learning feature points from the learning image, and a feature amount at each of the learning feature points is generated as a learning feature amount. For each of the model feature amounts, the most correlated among the learning feature amounts Select a high one, generate a degree of correlation with the selected learning feature amount as a learning correlation feature amount, obtain correct / incorrect information indicating whether the learning image includes the recognition target, It may be a recognizer generated based on the learning correlation feature quantity and the correct / incorrect information.
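The correlation step described above (for each model feature amount, pick the most highly correlated learning feature amount and keep the degree of correlation) can be sketched as follows. This is a minimal illustration only: this passage does not state how the correlation is measured, so Pearson correlation between equal-length feature vectors is assumed, and the names `correlation` and `correlation_features` are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length feature vectors.

    Assumption: the passage does not define the correlation measure.
    Returns 0.0 when either vector is constant (zero variance).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def correlation_features(model_features, image_features):
    """For each model feature, keep the highest correlation found
    among the features generated from one learning image."""
    return [
        max(correlation(m, f) for f in image_features)
        for m in model_features
    ]


if __name__ == "__main__":
    model = [[1, 2, 3], [3, 1, 2]]
    learning = [[2, 4, 6], [3, 2, 1]]
    print(correlation_features(model, learning))
```

Each learning image yields one such vector, with one entry per model feature amount. Paired with the correct/incorrect information for that image, these vectors form the input from which the recognizer is generated.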

本発明の第２の側面の認識方法は、学習処理により生成され、記憶部に記憶された認識器、および、前記記憶部に記憶されている前記認識器のそれぞれに対応する選択特徴量を用いて、認識対象が認識画像に含まれているか否かを判断する認識処理を行う認識装置の認識方法であって、認識処理を行うために用いられる前記認識画像を取得し、前記認識器および前記選択特徴量を用いて、取得された前記認識画像に前記認識対象が含まれているか否かを判断するステップを含み、前記認識画像を取得するステップの処理では、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データを取得し、取得された前記画像データから、焦点が合致した前記被写体に対応する部分を抽出するステップを含み、抽出された前記被写体に対応する部分を、前記認識画像として取得する。 The recognition method according to the second aspect of the present invention is a recognition method for a recognition device that performs recognition processing for determining whether or not a recognition target is included in a recognition image, using a recognizer generated by learning processing and stored in a storage unit and the selected feature amounts corresponding to each recognizer stored in the storage unit. The method includes the steps of acquiring the recognition image used for the recognition processing, and determining, using the recognizer and the selected feature amounts, whether or not the recognition target is included in the acquired recognition image. The step of acquiring the recognition image includes the steps of acquiring image data in which the image of a subject located at a predetermined focal length is in focus and other objects are out of focus, and extracting, from the acquired image data, a portion corresponding to the in-focus subject. The extracted portion corresponding to the subject is acquired as the recognition image.

本発明の第２の側面のプログラムは、学習処理により生成され、記憶部に記憶された認識器、および、前記記憶部に記憶されている前記認識器のそれぞれに対応する選択特徴量を用いて、認識対象が認識画像に含まれているか否かを判断する処理をコンピュータに実行させるプログラムであって、認識処理を行うために用いられる前記認識画像の取得を制御し、前記認識器および前記選択特徴量を用いて、取得された前記認識画像に前記認識対象が含まれているか否かを判断するステップを含み、前記認識画像を取得するステップの処理では、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データの取得を制御し、取得された前記画像データから、焦点が合致した前記被写体に対応する部分を抽出するステップを含み、抽出された前記被写体に対応する部分を、前記認識画像として取得する処理をコンピュータに実行させる。 The program according to the second aspect of the present invention causes a computer to execute processing for determining whether or not a recognition target is included in a recognition image, using a recognizer generated by learning processing and stored in a storage unit and the selected feature amounts corresponding to each recognizer stored in the storage unit. The processing includes the steps of controlling acquisition of the recognition image used for the recognition processing, and determining, using the recognizer and the selected feature amounts, whether or not the recognition target is included in the acquired recognition image. The step of acquiring the recognition image includes the steps of controlling acquisition of image data in which the image of a subject located at a predetermined focal length is in focus and other objects are out of focus, and extracting, from the acquired image data, a portion corresponding to the in-focus subject. The program causes the computer to acquire the extracted portion corresponding to the subject as the recognition image.

本発明の第２の側面においては、認識処理を行うために用いられる認識画像が取得され、認識器および選択特徴量が用いられて、取得された認識画像に認識対象が含まれているか否かが判断される。そして、認識画像が取得されるとき、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データが取得され、取得された画像データから、焦点が合致した被写体に対応する部分が抽出され、抽出された被写体に対応する部分が、認識画像として取得される。 In the second aspect of the present invention, a recognition image used for performing recognition processing is acquired, and whether or not a recognition target is included in the acquired recognition image using a recognizer and a selected feature amount. Is judged. Then, when the recognition image is acquired, image data in which the focus is matched with the image of the subject existing at a predetermined focal length and the focus is not matched with other objects is acquired, and the acquired image data is used. A portion corresponding to the subject in focus is extracted, and a portion corresponding to the extracted subject is acquired as a recognition image.

ネットワークとは、少なくとも２つの装置が接続され、ある装置から、他の装置に対して、情報の伝達をできるようにした仕組みをいう。ネットワークを介して通信する装置は、独立した装置どうしであっても良いし、１つの装置を構成している内部ブロックどうしであっても良い。 The network is a mechanism in which at least two devices are connected and information can be transmitted from one device to another device. The devices that communicate via the network may be independent devices, or may be internal blocks that constitute one device.

また、通信とは、無線通信および有線通信は勿論、無線通信と有線通信とが混在した通信、即ち、ある区間では無線通信が行われ、他の区間では有線通信が行われるようなものであっても良い。さらに、ある装置から他の装置への通信が有線通信で行われ、他の装置からある装置への通信が無線通信で行われるようなものであっても良い。 Communication may be wireless communication or wired communication, or a mixture of the two, that is, communication in which wireless communication is performed in one section and wired communication in another. Further, communication from one device to another may be performed by wire while communication in the reverse direction is performed wirelessly.

画像処理装置は、独立した装置であっても良いし、情報処理装置の記録処理を行うブロックであっても良い。また、学習装置や認識装置も、独立した装置であっても良いし、情報処理装置の記録処理を行うブロックであっても良い。 The image processing apparatus may be an independent apparatus or a block that performs recording processing of the information processing apparatus. In addition, the learning device and the recognition device may be independent devices or may be blocks that perform recording processing of the information processing device.

以上のように、本発明の第１の側面によれば、認識器を生成することができ、特に、認識器の学習に用いる画像のうちの少なくとも一部から、自動的に、認識対象の撮像領域を抽出することができる。 As described above, according to the first aspect of the present invention, a recognizer can be generated, and in particular, the imaged region of the recognition target can be automatically extracted from at least some of the images used for learning the recognizer.

また、本発明の第２の側面によれば、認識処理を行うことができ、特に、認識処理に用いる画像のうちの少なくとも一部から、自動的に、認識対象の撮像領域を抽出することができる。 According to the second aspect of the present invention, recognition processing can be performed, and in particular, the imaged region of the recognition target can be automatically extracted from at least some of the images used for the recognition processing.

以下に本発明の実施の形態を説明するが、本発明の構成要件と、明細書または図面に記載の実施の形態との対応関係を例示すると、次のようになる。この記載は、本発明をサポートする実施の形態が、明細書または図面に記載されていることを確認するためのものである。従って、明細書または図面中には記載されているが、本発明の構成要件に対応する実施の形態として、ここには記載されていない実施の形態があったとしても、そのことは、その実施の形態が、その構成要件に対応するものではないことを意味するものではない。逆に、実施の形態が構成要件に対応するものとしてここに記載されていたとしても、そのことは、その実施の形態が、その構成要件以外の構成要件には対応しないものであることを意味するものでもない。 Embodiments of the present invention are described below. The correspondence between the constituent elements of the present invention and the embodiments described in the specification or the drawings is exemplified as follows. This description is intended to confirm that embodiments supporting the present invention are described in the specification or the drawings. Therefore, even if an embodiment is described in the specification or the drawings but is not listed here as corresponding to a constituent element of the present invention, that does not mean that the embodiment does not correspond to that constituent element. Conversely, even if an embodiment is listed here as corresponding to a constituent element, that does not mean that the embodiment does not correspond to constituent elements other than that one.

本発明の第１の側面の画像処理装置は、認識対象を認識するための認識器を学習処理により予め生成する画像処理装置（例えば、図１９の学習装置７１、または、画像処理システム５１に対応する装置）であって、前記学習処理に用いる学習画像を取得する学習画像取得手段（例えば、図１９の学習画像取得部９５）と、前記認識対象に対応するモデル画像を取得するモデル画像取得手段（例えば、図１９のモデル画像取得部９１）と、前記学習画像取得手段により取得された前記学習画像と前記モデル画像取得手段により取得された前記モデル画像とを用いて前記学習処理を実行し、前記認識対象を認識するための認識器を生成する認識器生成手段（例えば、図１９のモデル特徴点生成部９２、モデル特徴量生成部９３、モデル特徴量記憶部９４、学習特徴点生成部９６、学習特徴量生成部９７、学習相関特徴量生成部９８、正誤情報取得部９９、および、認識器生成部１００）とを備え、前記学習画像取得手段または前記モデル画像取得手段のうちの少なくとも一方は、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データ（例えば、図３のＡに示されるようなフォーカスカメラにより撮像された画像データ）を取得する画像取得手段（例えば、図４の画像取得部２１）と、前記画像取得手段により取得された前記画像データから、焦点が合致した前記被写体に対応する部分を抽出する画像抽出手段（例えば、図４の背景分離処理部２２）とを備え、前記画像抽出手段により抽出された前記被写体に対応する部分を、前記学習画像、または、前記モデル画像として取得する。 The image processing apparatus according to the first aspect of the present invention is an image processing apparatus (for example, the learning device 71 in FIG. 19, or a device corresponding to the image processing system 51) that generates in advance, by learning processing, a recognizer for recognizing a recognition target. It includes: learning image acquisition means (for example, the learning image acquisition unit 95 in FIG. 19) for acquiring a learning image used in the learning processing; model image acquisition means (for example, the model image acquisition unit 91 in FIG. 19) for acquiring a model image corresponding to the recognition target; and recognizer generation means (for example, the model feature point generation unit 92, model feature amount generation unit 93, model feature amount storage unit 94, learning feature point generation unit 96, learning feature amount generation unit 97, learning correlation feature amount generation unit 98, correct/incorrect information acquisition unit 99, and recognizer generation unit 100 in FIG. 19) for executing the learning processing using the learning image acquired by the learning image acquisition means and the model image acquired by the model image acquisition means, and generating a recognizer for recognizing the recognition target. At least one of the learning image acquisition means and the model image acquisition means includes: image acquisition means (for example, the image acquisition unit 21 in FIG. 4) for acquiring image data in which the image of a subject located at a predetermined focal length is in focus and other objects are out of focus (for example, image data captured by a focus camera as shown in A of FIG. 3); and image extraction means (for example, the background separation processing unit 22 in FIG. 4) for extracting, from the image data acquired by the image acquisition means, a portion corresponding to the in-focus subject. The portion corresponding to the subject extracted by the image extraction means is acquired as the learning image or the model image.

前記画像抽出手段は、前記画像取得手段により取得された前記画像データの各画素において、近傍の画素との差分が大きい画素を抽出するための演算処理を実行する第１の演算手段（例えば、図５の近傍画素差分フィルタ計算処理部３１）と、前記第１の演算手段により抽出された近傍の画素との差分が大きい画素を注目画素として、前記注目画素とその近傍領域との平均を求める第２の演算手段（例えば、図５の近傍領域和フィルタ計算処理部３２）と、前記第２の演算手段の演算結果に基づいて、前記画像データを、検出するべき物体に対応する領域と、背景であると考えられる領域に分割する分割手段（例えば、図５の閾値処理部３３）とを含むことができる。 The image extraction means may include: first calculation means (for example, the neighboring pixel difference filter calculation processing unit 31 in FIG. 5) for executing, on each pixel of the image data acquired by the image acquisition means, calculation processing for extracting pixels that differ greatly from their neighboring pixels; second calculation means (for example, the neighboring region sum filter calculation processing unit 32 in FIG. 5) for taking each pixel extracted by the first calculation means as differing greatly from its neighboring pixels as a pixel of interest, and obtaining the average of the pixel of interest and its neighboring region; and dividing means (for example, the threshold processing unit 33 in FIG. 5) for dividing the image data, on the basis of the calculation result of the second calculation means, into a region corresponding to the object to be detected and a region considered to be background.

前記認識器生成手段は、前記モデル画像取得手段により取得された前記モデル画像から複数の特徴点をモデル特徴点として生成するモデル特徴点生成手段（例えば、図１９のモデル特徴点生成部９２）と、前記モデル特徴点生成手段により生成された前記モデル特徴点のそれぞれにおける特徴量をモデル特徴量として生成するモデル特徴量生成手段（例えば、図１９のモデル特徴量生成部９３）と、前記学習画像取得手段により取得された前記学習画像から複数の特徴点を学習特徴点として生成する学習特徴点生成手段（例えば、図１９の学習特徴点生成部９６）と、前記学習特徴点生成手段により生成された前記学習特徴点のそれぞれにおける特徴量を学習特徴量として生成する学習特徴量生成手段（例えば、図１９の学習特徴量生成部９７）と、前記モデル特徴量生成手段により生成された前記モデル特徴量の各々について、前記学習特徴量生成手段により生成された前記学習特徴量のうち最も相関の高いものを選択して、選択された前記学習特徴量との間の相関の程度を学習相関特徴量として生成する学習相関特徴量生成手段（例えば、図１９の学習相関特徴量生成部９８）と、前記学習画像が前記認識対象を含むか否かを示す正誤情報を取得する正誤情報取得手段（例えば、図１９の正誤情報取得部９９）と、前記学習相関特徴量生成手段により生成された前記学習相関特徴量、および、前記正誤情報取得手段により取得された前記正誤情報に基づいて認識器を生成する認識器生成手段とを備えることができる。 The recognizer generation means may include: model feature point generation means (for example, the model feature point generation unit 92 in FIG. 19) for generating a plurality of feature points as model feature points from the model image acquired by the model image acquisition means; model feature amount generation means (for example, the model feature amount generation unit 93 in FIG. 19) for generating the feature amount at each of the model feature points generated by the model feature point generation means as a model feature amount; learning feature point generation means (for example, the learning feature point generation unit 96 in FIG. 19) for generating a plurality of feature points as learning feature points from the learning image acquired by the learning image acquisition means; learning feature amount generation means (for example, the learning feature amount generation unit 97 in FIG. 19) for generating the feature amount at each of the learning feature points generated by the learning feature point generation means as a learning feature amount; learning correlation feature amount generation means (for example, the learning correlation feature amount generation unit 98 in FIG. 19) for selecting, for each of the model feature amounts generated by the model feature amount generation means, the learning feature amount with the highest correlation among those generated by the learning feature amount generation means, and generating the degree of correlation with the selected learning feature amount as a learning correlation feature amount; correct/incorrect information acquisition means (for example, the correct/incorrect information acquisition unit 99 in FIG. 19) for acquiring correct/incorrect information indicating whether or not the learning image includes the recognition target; and recognizer generation means for generating a recognizer on the basis of the learning correlation feature amounts generated by the learning correlation feature amount generation means and the correct/incorrect information acquired by the correct/incorrect information acquisition means.

前記モデル特徴点生成手段により生成される前記モデル特徴点は、前記モデル特徴点における前記モデル特徴量の種類（例えば、形、色、動き、テクスチャ、素材、歩行パターンなど）に応じて選択され、前記学習特徴点生成手段により生成される前記学習特徴点は、前記学習特徴点における前記学習特徴量の種類（例えば、形、色、動き、テクスチャ、素材、歩行パターンなど）に応じて選択されるものとすることができる。 The model feature point generated by the model feature point generation means is selected according to the type of model feature amount (for example, shape, color, movement, texture, material, walking pattern, etc.) at the model feature point, The learning feature point generated by the learning feature point generation means is selected according to the type of the learning feature amount (for example, shape, color, movement, texture, material, walking pattern, etc.) at the learning feature point. Can be.

前記モデル特徴量生成手段により生成される前記モデル特徴量は、前記モデル特徴量の種類（例えば、形、色、動き、テクスチャ、素材、歩行パターンなど）に応じて選択され、前記学習特徴量生成手段により生成される前記学習特徴量は、前記学習特徴量の種類（例えば、形、色、動き、テクスチャ、素材、歩行パターンなど）に応じて選択されるものとすることができる。 The model feature quantity generated by the model feature quantity generation unit is selected according to the type of model feature quantity (for example, shape, color, movement, texture, material, walking pattern, etc.), and the learning feature quantity generation The learning feature amount generated by the means may be selected according to the type of the learning feature amount (for example, shape, color, movement, texture, material, walking pattern, etc.).

前記画像抽出手段は、前記画像取得手段により取得された前記画像データのうちの焦点が合致していない領域を抽出（例えば、Blur Detection for Digital Images Using Wavelet Transform; Hanghang Tong: Mingjing Li, Hongjiang Zhang: Changshiui Zhangに記載されている技術を用いる）することにより、焦点が合致した前記被写体に対応する部分を抽出することができる。 The image extraction means extracts an out-of-focus area of the image data acquired by the image acquisition means (for example, Blur Detection for Digital Images Using Wavelet Transform; Hanghang Tong: Mingjing Li, Hongjiang Zhang: By using the technique described in Changshiui Zhang), a portion corresponding to the subject in focus can be extracted.

前記認識器生成手段により生成された前記認識器を記憶する認識器記憶手段（例えば、図１９の認識器記憶部１２２）と、前記認識器記憶手段により記憶されている前記認識器のそれぞれに対応する選択特徴量を記憶する選択特徴量記憶手段（例えば、図１９の選択特徴量記憶部１２１）と、認識処理を行うために用いられる認識画像を取得する認識画像取得手段（例えば、図１９の認識画像取得部１２３）と、前記認識画像取得手段により取得された前記認識画像から複数の特徴点を認識特徴点として生成する認識特徴点生成手段（例えば、図１９の認識特徴点生成部１２４）と、前記認識特徴点生成手段により生成された前記認識特徴点のそれぞれにおける特徴量を認識特徴量として生成する認識特徴量生成手段（例えば、図１９の認識特徴量生成部１２５）と、前記選択特徴量記憶手段により記憶される前記選択特徴量のそれぞれについて前記認識特徴量生成手段により生成された前記認識特徴量のうち最も相関の高いものを選択して、選択された前記認識特徴量との間の相関の程度を認識相関特徴量として生成する認識相関特徴量生成手段（例えば、図１９の認識相関特徴量生成部１２６）と、前記認識相関特徴量生成手段により生成された前記認識相関特徴量を、前記認識器生成手段により生成された前記認識器に代入することによって、前記認識画像取得手段により取得された前記認識画像に前記認識対象が含まれているか否かを判断する認識処理手段（例えば、図１９の認識処理部１２７）とを更に備えることができる。 The image processing apparatus may further include: recognizer storage means (for example, the recognizer storage unit 122 in FIG. 19) for storing the recognizer generated by the recognizer generation means; selected feature amount storage means (for example, the selected feature amount storage unit 121 in FIG. 19) for storing the selected feature amounts corresponding to each recognizer stored by the recognizer storage means; recognition image acquisition means (for example, the recognition image acquisition unit 123 in FIG. 19) for acquiring a recognition image used for recognition processing; recognition feature point generation means (for example, the recognition feature point generation unit 124 in FIG. 19) for generating a plurality of feature points as recognition feature points from the recognition image acquired by the recognition image acquisition means; recognition feature amount generation means (for example, the recognition feature amount generation unit 125 in FIG. 19) for generating the feature amount at each of the recognition feature points generated by the recognition feature point generation means as a recognition feature amount; recognition correlation feature amount generation means (for example, the recognition correlation feature amount generation unit 126 in FIG. 19) for selecting, for each of the selected feature amounts stored by the selected feature amount storage means, the recognition feature amount with the highest correlation among those generated by the recognition feature amount generation means, and generating the degree of correlation with the selected recognition feature amount as a recognition correlation feature amount; and recognition processing means (for example, the recognition processing unit 127 in FIG. 19) for determining whether or not the recognition target is included in the recognition image acquired by the recognition image acquisition means, by substituting the recognition correlation feature amounts generated by the recognition correlation feature amount generation means into the recognizer generated by the recognizer generation means.

本発明の第１の側面の画像処理方法は、認識対象を認識するための認識器を学習処理により予め生成する画像処理装置（例えば、図１９の学習装置７１、または、画像処理システム５１に対応する装置）の画像処理方法であって、前記学習処理に用いる学習画像を取得し（例えば、図２９のステップＳ１６の処理）、前記認識対象に対応するモデル画像を取得し（例えば、図２９のステップＳ１１の処理）、取得された学習画像と前記モデル画像とを用いて前記学習処理を実行し、前記認識対象を認識するための認識器を生成する（例えば、図２９のステップＳ２０の処理）ステップを含み、前記学習画像を取得するステップ、または、前記モデル画像を取得するステップのうちの少なくとも一方は、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データ（例えば、図３のＡに示されるようなフォーカスカメラにより撮像された画像データ）を取得し（例えば、図３０のステップＳ４１の処理）、取得された前記画像データから、焦点が合致した前記被写体に対応する部分を抽出する（例えば、図３０のステップＳ４２）ステップを含み、抽出された前記被写体に対応する部分を、前記学習画像、または、前記モデル画像として取得する。 The image processing method according to the first aspect of the present invention corresponds to an image processing device (for example, the learning device 71 in FIG. 19 or the image processing system 51) that generates a recognizer for recognizing a recognition target in advance by learning processing. Image processing method for acquiring the learning image used in the learning process (for example, the process of step S16 in FIG. 29), and acquiring the model image corresponding to the recognition target (for example, in FIG. 29). Step S11), the learning process is executed using the acquired learning image and the model image, and a recognizer for recognizing the recognition target is generated (for example, the process of Step S20 in FIG. 29). At least one of the step of acquiring the learning image and the step of acquiring the model image includes an image of a subject existing at a predetermined focal length. Image data (for example, image data captured by a focus camera as shown in FIG. 3A) is acquired (for example, step in FIG. 30). (Step S41), including a step of extracting the portion corresponding to the subject in focus from the acquired image data (for example, step S42 in FIG. 30), and including the portion corresponding to the extracted subject. Acquired as the learning image or the model image.

本発明の第１の側面のプログラムは、認識対象を認識するための認識器を学習処理により予め生成する処理をコンピュータに実行させるためのプログラムであって、前記学習処理に用いる学習画像の取得を制御し（例えば、図２９のステップＳ１６の処理）、前記認識対象に対応するモデル画像の取得を制御し（例えば、図２９のステップＳ１１の処理）、取得された学習画像と前記モデル画像とを用いて前記学習処理を実行し、前記認識対象を認識するための認識器を生成する（例えば、図２９のステップＳ２０の処理）ステップを含み、前記学習画像を取得するステップ、または、前記モデル画像を取得するステップのうちの少なくとも一方は、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データ（例えば、図３のＡに示されるようなフォーカスカメラにより撮像された画像データ）の取得を制御し（例えば、図３０のステップＳ４１の処理）、取得された前記画像データから、焦点が合致した前記被写体に対応する部分を抽出する（例えば、図３０のステップＳ４２）ステップを含み、抽出された前記被写体に対応する部分を、前記学習画像、または、前記モデル画像として取得する処理をコンピュータに実行させる。 A program according to a first aspect of the present invention is a program for causing a computer to execute processing for generating a recognizer for recognizing a recognition target in advance by learning processing, and acquiring a learning image used for the learning processing. Control (for example, the process of step S16 in FIG. 29), control the acquisition of the model image corresponding to the recognition target (for example, the process of step S11 in FIG. 29), and acquire the acquired learning image and the model image. Using the learning process to generate a recognizer for recognizing the recognition target (for example, the process of step S20 in FIG. 29), and acquiring the learning image, or the model image At least one of the steps of obtaining the image is in focus on the image of the subject existing at the predetermined focal length, and in focus on the other objects. Control of acquisition of non-image data (for example, image data captured by a focus camera as shown in FIG. 3A) (for example, processing in step S41 of FIG. 30), and from the acquired image data, a focus is controlled. Including a step of extracting a portion corresponding to the subject that matches (for example, step S42 in FIG. 30), and acquiring a portion corresponding to the extracted subject as the learning image or the model image. Let the computer run.

本発明の第２の側面の認識装置は、学習処理により生成された認識器を用いて、認識対象が認識画像に含まれているか否かを判断する認識処理を行う認識装置（例えば、図１９の認識装置７２）であって、認識処理を行うために用いられる前記認識画像を取得する認識画像取得手段（例えば、図１９の認識画像取得部１２３）と、前記認識器を記憶する認識器記憶手段（例えば、図１９の認識器記憶部１２２）と、前記認識器記憶手段により記憶されている前記認識器のそれぞれに対応する選択特徴量を記憶する選択特徴量記憶手段（例えば、図１９の選択特徴量記憶部１２１）と、前記認識器記憶手段により記憶されている前記認識器および前記選択特徴量記憶手段により記憶されている前記選択特徴量を用いて、前記認識画像取得手段により取得された前記認識画像に前記認識対象が含まれているか否かを判断する認識処理手段（例えば、図１９の認識特徴点生成部１２４、認識特徴量生成部１２５、認識相関特徴量生成部１２６、認識処理部１２７）とを備え、前記認識画像取得手段は、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データ（例えば、図３のＡに示されるようなフォーカスカメラにより撮像された画像データ）を取得する画像取得手段（例えば、図４の画像取得部２１）と、前記画像取得手段により取得された前記画像データから、焦点が合致した前記被写体に対応する部分を抽出する画像抽出手段（例えば、図４の背景分離処理部２２）とを備え、前記画像抽出手段により抽出された前記被写体に対応する部分を、前記認識画像として取得する。 The recognition device according to the second aspect of the present invention is a recognition device (for example, the recognition device 72 in FIG. 19) that performs recognition processing for determining whether or not a recognition target is included in a recognition image, using a recognizer generated by learning processing. It includes: recognition image acquisition means (for example, the recognition image acquisition unit 123 in FIG. 19) for acquiring the recognition image used for the recognition processing; recognizer storage means (for example, the recognizer storage unit 122 in FIG. 19) for storing the recognizer; selected feature amount storage means (for example, the selected feature amount storage unit 121 in FIG. 19) for storing the selected feature amounts corresponding to each recognizer stored by the recognizer storage means; and recognition processing means (for example, the recognition feature point generation unit 124, recognition feature amount generation unit 125, recognition correlation feature amount generation unit 126, and recognition processing unit 127 in FIG. 19) for determining, using the recognizer stored by the recognizer storage means and the selected feature amounts stored by the selected feature amount storage means, whether or not the recognition target is included in the recognition image acquired by the recognition image acquisition means. The recognition image acquisition means includes: image acquisition means (for example, the image acquisition unit 21 in FIG. 4) for acquiring image data in which the image of a subject located at a predetermined focal length is in focus and other objects are out of focus (for example, image data captured by a focus camera as shown in A of FIG. 3); and image extraction means (for example, the background separation processing unit 22 in FIG. 4) for extracting, from the image data acquired by the image acquisition means, a portion corresponding to the in-focus subject. The portion corresponding to the subject extracted by the image extraction means is acquired as the recognition image.

前記画像抽出手段は、前記画像取得手段により取得された前記画像データの各画素において、近傍の画素との差分が大きい画素を抽出するための演算処理を実行する第１の演算手段（例えば、図５の近傍画素差分フィルタ計算処理部３１）と、前記第１の演算手段により抽出された近傍の画素との差分が大きい画素を注目画素として、前記注目画素とその近傍領域との平均を求める第２の演算手段（例えば、図５の近傍領域和フィルタ計算処理部３２）と、前記第２の演算手段の演算結果に基づいて、前記画像データを、検出するべき物体に対応する領域と、背景であると考えられる領域に分割する分割手段（例えば、図５の閾値処理部３３）とを含むことができる。 The image extraction means may include: first calculation means (for example, the neighboring pixel difference filter calculation processing unit 31 in FIG. 5) for executing, on each pixel of the image data acquired by the image acquisition means, calculation processing for extracting pixels that differ greatly from their neighboring pixels; second calculation means (for example, the neighboring region sum filter calculation processing unit 32 in FIG. 5) for taking each pixel extracted by the first calculation means as differing greatly from its neighboring pixels as a pixel of interest, and obtaining the average of the pixel of interest and its neighboring region; and dividing means (for example, the threshold processing unit 33 in FIG. 5) for dividing the image data, on the basis of the calculation result of the second calculation means, into a region corresponding to the object to be detected and a region considered to be background.
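The three stages described above (neighboring pixel difference filter, neighborhood averaging, and threshold-based division) can be sketched roughly as follows. The 3×3 coefficients follow the abstract (8 for the pixel of interest, -1 for each of its eight neighbors). Taking the absolute value of the filter response, the 3×3 averaging window, the zero threshold, and all function names are assumptions made for illustration, not details stated in this passage.

```python
def neighbor_difference(img):
    """First stage: 3x3 filter with centre coefficient 8 and -1 for the
    eight neighbours. Border pixels are left at 0 (assumption), and the
    absolute response is kept (assumption) so flat, out-of-focus areas
    give values near zero and sharp areas give large values."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window_sum = sum(
                img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            # 8*centre - (sum of 8 neighbours) == 9*centre - window_sum
            out[y][x] = abs(9 * img[y][x] - window_sum)
    return out


def neighborhood_mean(resp, radius=1):
    """Second stage: average each pixel with its neighbouring region,
    which joins isolated sharp pixels into connected areas."""
    h, w = len(resp), len(resp[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                resp[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def split_object_background(img, threshold=0.0, radius=1):
    """Third stage: 1 marks the region taken as the object,
    0 marks the region considered to be background."""
    mean = neighborhood_mean(neighbor_difference(img), radius)
    return [[1 if v > threshold else 0 for v in row] for row in mean]


if __name__ == "__main__":
    image = [[10] * 9 for _ in range(9)]
    image[4][4] = 200  # one sharp detail on a flat background
    for row in split_object_background(image):
        print("".join(str(v) for v in row))
```

With `threshold=0.0` the division treats any positive averaged response as object. A real image would likely need a larger threshold to suppress sensor noise in the blurred background.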

前記認識処理手段は、前記認識画像取得手段により取得された前記認識画像から複数の特徴点を認識特徴点として生成する認識特徴点生成手段（例えば、図１９の認識特徴点生成部１２４）と、前記認識特徴点生成手段により生成された前記認識特徴点のそれぞれにおける特徴量を認識特徴量として生成する認識特徴量生成手段（例えば、図１９の認識特徴量生成部１２５）と、前記選択特徴量記憶手段に記憶される前記選択特徴量のそれぞれについて前記認識特徴量生成手段により生成された前記認識特徴量のうち最も相関の高いものを選択して、選択された前記認識特徴量との間の相関の程度を認識相関特徴量として生成する認識相関特徴量生成手段（例えば、図１９の認識相関特徴量生成部１２６）と、前記認識相関特徴量生成手段により生成された前記認識相関特徴量を、前記認識器記憶手段により記憶された前記認識器に代入することによって、前記認識画像取得手段により取得された前記認識画像に前記認識対象が含まれているか否かを判断する判断手段（例えば、図１９の認識処理部１２７）とを備えることができる。 The recognition processing means may include: recognition feature point generation means (for example, the recognition feature point generation unit 124 in FIG. 19) for generating a plurality of feature points as recognition feature points from the recognition image acquired by the recognition image acquisition means; recognition feature amount generation means (for example, the recognition feature amount generation unit 125 in FIG. 19) for generating the feature amount at each of the recognition feature points generated by the recognition feature point generation means as a recognition feature amount; recognition correlation feature amount generation means (for example, the recognition correlation feature amount generation unit 126 in FIG. 19) for selecting, for each of the selected feature amounts stored in the selected feature amount storage means, the recognition feature amount with the highest correlation among those generated by the recognition feature amount generation means, and generating the degree of correlation with the selected recognition feature amount as a recognition correlation feature amount; and determination means (for example, the recognition processing unit 127 in FIG. 19) for determining whether or not the recognition target is included in the recognition image acquired by the recognition image acquisition means, by substituting the recognition correlation feature amounts generated by the recognition correlation feature amount generation means into the recognizer stored by the recognizer storage means.
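The final step, substituting the recognition correlation feature amounts into the stored recognizer, might look like the sketch below. The form of the recognizer is not given in this passage, so a weighted vote of per-feature threshold tests is assumed purely for illustration. The `(weight, threshold)` pairs and the function name `recognize` are hypothetical.

```python
def recognize(correlation_features, recognizer):
    """Decide whether the recognition target is present.

    correlation_features: one correlation value per selected feature amount.
    recognizer: list of (weight, threshold) pairs, one per selected feature
        amount. This representation is an assumption, not the patent's.
    Each feature votes +1 if its correlation exceeds the threshold and -1
    otherwise. The weighted sum of votes gives the decision.
    """
    if len(correlation_features) != len(recognizer):
        raise ValueError("one correlation value is needed per selected feature")
    score = sum(
        weight * (1 if corr > threshold else -1)
        for corr, (weight, threshold) in zip(correlation_features, recognizer)
    )
    return score > 0


if __name__ == "__main__":
    stored = [(1.0, 0.5), (0.4, 0.5)]       # hypothetical learned parameters
    print(recognize([0.9, 0.2], stored))    # strong match on the heavier feature
    print(recognize([0.1, 0.2], stored))    # weak match on both features
```

Because each selected feature amount is paired with one entry of the recognizer, the stored selected feature amounts and the stored recognizer have to be kept in the same order.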

本発明の第２の側面の認識方法は、学習処理により生成され、記憶部に記憶された認識器、および、前記記憶部に記憶されている前記認識器のそれぞれに対応する選択特徴量を用いて、認識対象が認識画像に含まれているか否かを判断する認識処理を行う認識装置（例えば、図１９の認識装置７２）の認識方法であって、認識処理を行うために用いられる前記認識画像を取得し（例えば、図３４のステップＳ１８１、または、図３７のステップＳ２７１の処理）、前記認識器および前記選択特徴量を用いて、取得された前記認識画像に前記認識対象が含まれているか否かを判断する（例えば、図３４のステップＳ１８６、または、図３７のステップＳ２７８の処理）ステップを含み、前記認識画像を取得するステップの処理では、所定の焦点距離に存在する被写体の像に焦点が合致し、それ以外の物体には焦点が合致していない画像データ（例えば、図３のＡに示されるようなフォーカスカメラにより撮像された画像データ）を取得し（例えば、図３０のステップＳ４１の処理）、取得された前記画像データから、焦点が合致した前記被写体に対応する部分を抽出する（例えば、図３０のステップＳ４２）ステップを含み、抽出された前記被写体に対応する部分を、前記認識画像として取得する。 The recognition method according to the second aspect of the present invention uses a recognizer generated by learning processing and stored in a storage unit, and a selection feature amount corresponding to each of the recognizers stored in the storage unit. A recognition method of a recognition apparatus (for example, the recognition apparatus 72 in FIG. 19) that performs a recognition process for determining whether or not a recognition target is included in the recognition image, and is used for performing the recognition process. An image is acquired (for example, the process in step S181 in FIG. 34 or the process in step S271 in FIG. 37), and the recognition object is included in the acquired recognition image using the recognizer and the selected feature amount. Whether or not (for example, the process of step S186 of FIG. 34 or the process of step S278 of FIG. 37) includes a step of acquiring a recognized image. Acquire image data (for example, image data picked up by a focus camera as shown in FIG. 3A) that is focused on an image of a subject that is far away and not focused on other objects. (For example, the process of step S41 of FIG. 30), including the step of extracting the portion corresponding to the subject in focus (for example, step S42 of FIG. 30) from the acquired image data. A portion corresponding to the subject is acquired as the recognition image.

The program according to the second aspect of the present invention causes a computer to execute processing for determining whether a recognition target is included in a recognition image, using recognizers generated by learning processing and stored in a storage unit and the selected feature amounts corresponding to each of the recognizers stored in the storage unit. The processing includes the steps of controlling acquisition of the recognition image used for the recognition processing (for example, the processing of step S181 in FIG. 34 or step S271 in FIG. 37), and determining, using the recognizers and the selected feature amounts, whether the recognition target is included in the acquired recognition image (for example, the processing of step S186 in FIG. 34 or step S278 in FIG. 37). The step of acquiring the recognition image includes the steps of controlling acquisition of image data in which the image of a subject located at a predetermined focal distance is in focus and other objects are out of focus (for example, image data captured by a focus camera as shown in FIG. 3A) (for example, the processing of step S41 in FIG. 30), and extracting, from the acquired image data, the portion corresponding to the in-focus subject (for example, step S42 in FIG. 30). The extracted portion corresponding to the subject is acquired as the recognition image.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

In image recognition, both the learning processing and the recognition processing need to extract only the portion to be learned or recognized from the acquired image data. In other words, the background portion, which should not be used for learning or recognition, must be removed.

For example, as shown in FIG. 2, image recognition processing is broadly divided into learning processing, which learns an object model for recognition, and recognition processing, which uses the object model obtained as a result of learning. Both the learning processing and the recognition processing are performed on images from which the background has been removed.

That is, during learning processing, the object to be recognized is cut out of the input image data acquired for learning (in other words, the background is separated), and the learning processing is executed using the image data of the cut-out portion corresponding to the object to be recognized. The object model data for recognition obtained as a result of the learning processing is stored in an object model database (DB). During recognition processing, the object to be recognized is likewise cut out of the input image data acquired for recognition, and the image data of the cut-out portion is used, with reference to the object model data stored in the object model database, to perform the recognition processing, that is, to determine whether the input image data acquired for recognition includes the object to be recognized.

To make the background easy to remove from the input image in both the learning processing and the recognition processing, it is preferable to capture the input image data with a focus camera.

In image data captured by a focus camera, only objects near a predetermined focal distance are in focus; other objects, that is, the background portion, are out of focus and appear blurred.

With reference to FIG. 3, a case in which image data including an object to be recognized is captured by a focus camera is compared with a case in which it is captured so that the background is also in focus to some extent.

FIG. 3A shows image data including an object to be recognized, captured by a focus camera. FIG. 3B shows image data including the object to be recognized, captured with normal focus control.

In the image data shown in FIG. 3B, not only the object to be recognized but also the objects visible in the background are in focus, so the background objects are also imaged clearly. In the image data shown in FIG. 3A, by contrast, only the object to be recognized is in focus, and the objects visible in the background are blurred. That is, in image data captured by a focus camera, only the object to be recognized appears to stand out.

FIG. 4 is a block diagram showing the configuration of the image processing unit 11, which extracts the in-focus portion from image data captured by a focus camera.

The image processing unit 11 includes an image acquisition unit 21 and a background separation processing unit 22.

The image acquisition unit 21 either acquires image data captured by a focus camera from an external source, or has a built-in focus camera and performs the imaging itself, and supplies the image data to the background separation processing unit 22.

The background separation processing unit 22 extracts the in-focus portion from the image data captured by the focus camera and outputs the image data of the extracted portion.

To extract the in-focus portion from image data captured by a focus camera, a commonly used image blur detection technique can be applied (for example, the technique described in "Blur Detection for Digital Images Using Wavelet Transform" by Hanghang Tong, Mingjing Li, Hongjiang Zhang, and Changshiui Zhang). That is, the in-focus portion can be extracted by detecting the out-of-focus, blurred portion of the image data and deleting it.

Alternatively, an FFT (fast Fourier transform) can be applied to the image data captured by the focus camera to analyze the frequency spectrum of each image region. A region that contains sufficient high-frequency components is judged to be in focus, whereas a region dominated by low frequencies is judged to be out of focus. The portion corresponding to the object to be recognized can be extracted in this way as well.
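The frequency-based judgment described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: a naive 2-D DFT stands in for an FFT so that the sketch needs only the standard library, and the function names `dft2` and `high_freq_ratio`, the `cutoff` parameter, and the idea of scoring a block by its share of high-frequency energy are all assumptions made for the example.

```python
import cmath
import math


def dft2(block):
    """Naive 2-D DFT of a small block (list of rows). O(N^2 * M^2); illustration only."""
    n, m = len(block), len(block[0])
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            s = 0j
            for y in range(n):
                for x in range(m):
                    s += block[y][x] * cmath.exp(-2j * cmath.pi * (u * y / n + v * x / m))
            out[u][v] = s
    return out


def high_freq_ratio(block, cutoff=2):
    """Share of non-DC spectral energy at frequencies >= cutoff (hypothetical focus score)."""
    spec = dft2(block)
    n, m = len(block), len(block[0])
    high = total = 0.0
    for u in range(n):
        for v in range(m):
            if u == 0 and v == 0:
                continue  # ignore the DC term (mean brightness)
            energy = abs(spec[u][v]) ** 2
            total += energy
            # Fold the spectrum so that index k and N - k count as the same frequency.
            fu, fv = min(u, n - u), min(v, m - v)
            if max(fu, fv) >= cutoff:
                high += energy
    # Treat numerically flat blocks as containing no high-frequency content.
    return high / total if total > 1e-9 else 0.0


# A sharp checkerboard block versus a smooth, slowly varying block.
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
smooth = [[128 + 100 * math.cos(2 * math.pi * x / 8) for x in range(8)] for y in range(8)]
flat = [[5] * 8 for _ in range(8)]
```

A block would then be treated as in focus when its score exceeds some threshold chosen for the camera and scene; that threshold is not given in the text and would have to be tuned.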

Furthermore, the in-focus portion can also be extracted by applying, at each pixel of the image data captured by the focus camera, a luminance difference filter between the pixel of interest and its neighboring pixels.

FIG. 5 is a block diagram showing a more detailed configuration of the background separation processing unit 22 for the case in which the luminance difference filter described above is used to extract the portion corresponding to the object to be recognized from image data that includes a background and was captured by a focus camera.

In this case, the background separation processing unit 22 includes a neighboring pixel difference filter calculation processing unit 31, a neighboring region sum filter calculation processing unit 32, and a threshold processing unit 33.

To bring out the characteristics of the out-of-focus regions of the supplied image data, the neighboring pixel difference filter calculation processing unit 31 calculates the differences between adjacent pixel values. When a pixel is out of focus, its difference from adjacent pixels is small (close to 0). The neighboring pixel difference filter calculation processing unit 31 therefore considers the differences from the eight neighboring pixels. Taking the pixel of interest to be at coordinates (x, y) and the pixel value at that point to be I(x, y), it calculates the following equation (1).

$$
\begin{aligned}
&\bigl(I(x-1,y-1)-I(x,y)\bigr)+\bigl(I(x,y-1)-I(x,y)\bigr)\\
&\quad+\bigl(I(x+1,y-1)-I(x,y)\bigr)+\bigl(I(x-1,y)-I(x,y)\bigr)\\
&\quad+\bigl(I(x+1,y)-I(x,y)\bigr)+\bigl(I(x-1,y+1)-I(x,y)\bigr)\\
&\quad+\bigl(I(x,y+1)-I(x,y)\bigr)+\bigl(I(x+1,y+1)-I(x,y)\bigr)\\
&=\sum_{\Delta x,\,\Delta y}\bigl(I(x+\Delta x,\,y+\Delta y)-I(x,y)\bigr)
\end{aligned}
\tag{1}
$$

Here, in the sum on the right-hand side of equation (1), Δx = −1, 0, 1 and Δy = −1, 0, 1.

Viewed as a convolution filter, this is, as shown in FIG. 6, a 3×3 filter in which the coefficient at the center, that is, for the pixel of interest, is 8 and the coefficients for all eight neighboring pixels are −1.
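The 3×3 filter described above can be sketched in plain Python as follows. Note that equation (1) as written sums (neighbor − center), which is the negative of the kernel with center 8 and neighbors −1, so the two agree in magnitude. Taking the absolute value of the response and leaving border pixels at 0 are assumptions made for this sketch, as is the function name `neighbor_difference`.

```python
def neighbor_difference(img):
    """Apply the 3x3 kernel (center 8, eight neighbors -1) to a grayscale image.

    img is a list of rows of numbers. Returns the absolute response so that
    blurred (flat) pixels give values near 0 and sharp pixels give large values.
    Border pixels, which lack a full 3x3 neighborhood, are left at 0 (assumption).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum of the 3x3 window minus the center gives the sum of the 8 neighbors.
            window_sum = sum(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            neighbors = window_sum - img[y][x]
            out[y][x] = abs(8 * img[y][x] - neighbors)
    return out


# A flat (blurred-looking) image and an image with one sharp bright pixel.
flat = [[7] * 5 for _ in range(5)]
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 10
flat_out = neighbor_difference(flat)
spike_out = neighbor_difference(spike)
```

For the flat image every response is 0, while the isolated bright pixel produces a strong response at its own position and weaker responses at its neighbors.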

FIG. 7 shows an example of the output of the neighboring pixel difference filter calculation processing unit 31 when FIG. 3A is the input image. In the result shown in FIG. 7, black regions are out-of-focus pixels whose difference from neighboring pixels is close to 0, and bright (white) regions are pixels whose pattern is sharp and not blurred.

In FIG. 7, however, some parts of the object to be recognized are also black: in textureless parts, such as the cheeks and forehead of the doll that is the object to be recognized in the image data described with reference to FIG. 3, the difference between adjacent pixels is also close to 0. Consequently, applying threshold processing directly to the output of the neighboring pixel difference filter calculation processing unit 31 cannot correctly cut out the region corresponding to the object to be recognized.

The neighboring region sum filter calculation processing unit 32 therefore performs filtering that averages the pixel of interest with its neighboring region.

To extract the whole object to be recognized from the background, the regions that are likely to belong to the object must be joined into a single region. To that end, the sum of the pixel values of the pixel of interest and its neighboring region is calculated. For a given pixel, this calculation obtains the sum over the surrounding N×M pixel region, and it is referred to below as the sum filter calculation. Up to a constant scale factor, the sum filter calculation is equivalent to replacing each pixel with the average of its surrounding region. In practice, it corresponds to applying a convolution filter of window size N×M whose elements are all 1. FIG. 8 shows an example of the sum filter for a 3×3 window.

As shown in FIG. 9, the neighboring region sum filter calculation processing unit 32 applies the sum filter with a predetermined window size (corresponding to the square frame drawn on the image in the figure) to every pixel of the output of the neighboring pixel difference filter calculation processing unit 31, and supplies the result to the threshold processing unit 33.
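The sum filter calculation can be sketched as a convolution with an all-ones window. In this minimal example the function name `sum_filter`, the restriction to odd window sizes, and the choice to clip the window at the image border are assumptions; the text does not specify how borders are handled.

```python
def sum_filter(img, win_h=3, win_w=3):
    """Sum each pixel's win_h x win_w neighborhood (all-ones convolution kernel).

    img is a list of rows of numbers; win_h and win_w are assumed to be odd.
    Near the border the window is clipped to the image (assumption).
    """
    h, w = len(img), len(img[0])
    ry, rx = win_h // 2, win_w // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                    total += img[yy][xx]
            out[y][x] = total
    return out


# One strong response in the middle of an otherwise empty difference image:
# the sum filter spreads it over the surrounding 3x3 region.
diff = [[0] * 5 for _ in range(5)]
diff[2][2] = 5
spread = sum_filter(diff, 3, 3)
ones = sum_filter([[1] * 3 for _ in range(3)], 3, 3)
```

Spreading each response over its window is what joins nearby sharp pixels into one connected region; a larger window spreads further.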

The result of the sum filter calculation by the neighboring region sum filter calculation processing unit 32 depends on the window size applied. The relationship between window size and the result of the sum filter calculation is described with reference to FIGS. 10 to 12.

FIG. 10 shows an example of the result of the sum filter calculation with a small window. If the window is too small and the object has little texture, large black regions remain in the low-texture parts (here, parts such as the doll's nose and cheeks), so the region of the object to be recognized cannot be cut out cleanly.

FIG. 11 shows an example of the result of the sum filter calculation with a larger window than in FIG. 10. As the window grows, the black regions in the low-texture parts shrink.

FIG. 12 shows an example of the result of the sum filter calculation with a window larger still than in FIG. 11. If the window is too large, the area surrounding the object to be recognized also becomes white, so part of the background may be cut out as well.

As shown in FIGS. 10 to 12, the smaller the window, the more likely it is that low-texture parts of the object to be detected are misjudged as background, and the larger the window, the more likely it is that background is misjudged as part of the object. Window size and region detection accuracy are thus in a trade-off relationship. It is therefore preferable that the window size can be set or adjusted as appropriate, for example according to whether the object to be extracted has little texture, or according to the proportion of the whole image area that the object occupies.

Based on the sum filter calculation result from the neighboring region sum filter calculation processing unit 32, the threshold processing unit 33 divides the image data into a region corresponding to the object to be detected and a region considered to be background.

The region may be divided by separating the sum filter calculation result at a predetermined threshold, that is, by binarization, or by using the part of the sum filter calculation result in which values occur (the part that is not black) as a mask. When binarization by threshold processing is used, black regions may remain inside the white region, so morphological processing consisting of dilation and erosion is performed after binarization. Even if the white region contains black holes, this processing fills them and reduces the influence of noise.
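Binarization followed by dilation and erosion (a morphological closing) can be sketched as follows. The 3×3 structuring element, the clipping of the window at the image border, and the names `binarize`, `dilate`, `erode`, `close_holes`, and `apply_mask` are assumptions for illustration; the text does not fix these details.

```python
def binarize(img, threshold):
    """Return a 0/1 mask: 1 where the sum-filter value reaches the threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def _morph(mask, op):
    """Apply op (max for dilation, min for erosion) over each clipped 3x3 window."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(window)
    return out


def dilate(mask):
    return _morph(mask, max)


def erode(mask):
    return _morph(mask, min)


def close_holes(mask):
    """Dilation followed by erosion, which fills small black holes in white regions."""
    return erode(dilate(mask))


def apply_mask(img, mask):
    """Keep the pixels of img where mask is 1 and set the rest to 0."""
    return [[v if m else 0 for v, m in zip(row, mrow)] for row, mrow in zip(img, mask)]


# A white region with a one-pixel black hole in the middle.
holed = [[1] * 5 for _ in range(5)]
holed[2][2] = 0
closed = close_holes(holed)
```

The closed mask can then be passed to `apply_mask` together with the input image to keep only the region judged to be the object.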

FIG. 13 shows an example of the output of the threshold processing unit 33 for the input image of FIG. 3A when the region is divided by binarization followed by morphological processing. FIG. 14 shows an example of the output of the threshold processing unit 33 for the input image of FIG. 3A when the region is divided by masking with the part of the sum filter calculation result in which values occur (the part that is not black).

In this way, the image is divided into the background region and the region in which the object to be recognized is imaged.

FIGS. 15 to 18 show example outputs of the neighboring pixel difference filter calculation processing unit 31, the neighboring region sum filter calculation processing unit 32, and the threshold processing unit 33 of the background separation processing unit 22 when the luminance difference filter with neighboring pixels is used to extract the portion corresponding to the object to be recognized from image data that includes a background and was captured by a focus camera with a fixed focal distance.

Each of FIGS. 15 to 18 shows, from top to bottom, the input image to the background separation processing unit 22, the output of the neighboring pixel difference filter calculation processing unit 31, the output of the neighboring region sum filter calculation processing unit 32, the output of the threshold processing unit 33 when binarization by threshold processing is followed by morphological processing, and the output of the threshold processing unit 33 when analog mask processing is performed.

As shown in FIGS. 15 to 18, the background and the object to be recognized in the image data can be separated by performing the neighboring pixel difference filter calculation and the sum filter calculation on image data captured by a focus camera with the focal distance matched to the object to be recognized, and then applying either binarization by threshold processing or analog mask processing to the result.

By separating the background from the object to be recognized through the processing of the background separation processing unit 22 and extracting the portion corresponding to the object from the image data, improvements can be expected in both the efficiency of the learning processing and the performance of the recognition processing.

That is, in object recognition by image processing, whether in the recognition processing itself or in the learning processing performed to generate a recognizer, a person conventionally cut the object to be recognized out of the image, and that data was used. In contrast, if the background portion of the image data and the portion corresponding to the object to be recognized can be separated automatically as described with reference to FIGS. 2 to 18, a person no longer needs to cut the object out of the image, which is preferable.

Next, a case is described in which an object is actually recognized using the portion corresponding to the object to be recognized that was extracted from the image data by the processing described above.

Various methods of object recognition by image processing have been proposed in recent years, and the field has advanced dramatically over the last ten years. Because these methods use more flexible recognition techniques than conventional methods, many of them may also be applicable to recognizing, for example, "pets". Here, a "pet" refers to an animal kept in an ordinary household, and may be any of a variety of animals such as a dog, cat, bird, fish, chameleon, hamster, guinea pig, mouse, squirrel, rabbit, turtle, or snake.

When the types of pets to be recognized differ, recognition accuracy is likely to be higher if different feature amounts are used. For a dog, for example, "four legs", "joints", "fur", and "tail" can serve as feature amounts, whereas for a bird it is preferable to use the texture of its "feathers". Therefore, if the device can construct a recognizer by selecting the feature amounts specific to the pet, without being told explicitly which feature amounts to adopt, highly flexible image recognition becomes possible.

Next, with reference to FIGS. 19 to 37, specific examples are described of learning processing for generating a recognizer that actually recognizes an object using the portion corresponding to the object extracted from the image data by the processing described above, and of recognition processing using the recognizer obtained by learning. By changing parameters, the method described below can be adapted flexibly to cases such as the pet recognition described above.

FIG. 19 shows a configuration example of the image processing system 51 according to the embodiment of the present invention. The image processing system 51 includes a learning device 71 used in the learning phase and a recognition device 72 used in the recognition phase. Although the image processing system 51 is illustrated here as consisting of the learning device 71 and the recognition device 72, it may of course be configured as a single device having the same functions.

The learning device 71 includes a model image acquisition unit 91, a model feature point generation unit 92, a model feature amount generation unit 93, a model feature amount storage unit 94, a learning image acquisition unit 95, a learning feature point generation unit 96, a learning feature amount generation unit 97, a learning correlation feature amount generation unit 98, a correct/incorrect information acquisition unit 99, and a recognizer generation unit 100.

The model image acquisition unit 91 has the same configuration as the image processing unit 11 described with reference to FIG. 4. It either acquires from an external source image data captured by a focus camera to serve as a model image, or has a built-in focus camera and captures the model image itself. It then extracts the in-focus portion from the model image data captured by the focus camera and outputs the image data of the extracted portion to the model feature point generation unit 92 and the model feature amount generation unit 93.

Alternatively, instead of having the same configuration as the image processing unit 11 described with reference to FIG. 4, the model image acquisition unit 91 may acquire a model image showing only the portion to be recognized, or may extract only the portion to be recognized from given image data in response to a user operation or the like, and output it to the model feature point generation unit 92. When the number of model images is small, for example, the latter configuration may be used so that the portion used for recognition is reliably cut out as the model image by a user operation or the like.

The model feature point generation unit 92 generates model feature points from the model image supplied from the model image acquisition unit 91 and supplies them to the model feature amount generation unit 93. Any point in the image can be used as a feature point, and which points are used can also be defined according to the type of feature amount. Specifically, when color is used as the feature amount, for example, it is preferable to generate feature points within flat, textureless regions, and when shape, motion, texture, or the like is used, it is preferable to generate feature points at edge portions. In this way, feature points suited to the type of feature amount can be used as appropriate.

The model feature amount generation unit 93 generates model feature amounts at the model feature points generated by the model feature point generation unit 92 and supplies them to the model feature amount storage unit 94. These feature amounts may be either local features or global features, and various types can be defined, relating to shape, color, motion, texture, material, walking pattern, and so on. The model feature amounts can be selected according to their type; for example, first-, second-, and third-order Gaussian derivatives, a color distribution, or the like may be selected as local shape information. In this way, feature amounts suited to the type of feature amount are used as appropriate.

The model feature amount storage unit 94 stores the model feature amounts at the model feature points generated by the model feature amount generation unit 93.

The learning image acquisition unit 95 has the same configuration as the image processing unit 11 described with reference to FIG. 4. It either acquires from an external source learning image data captured by a focus camera for use in the learning processing, or has a built-in focus camera and captures the learning image itself. It then extracts the in-focus portion from the learning image data captured by the focus camera and outputs the image data of the extracted portion to the learning feature point generation unit 96 and the learning feature amount generation unit 97.

The learning feature point generation unit 96 generates learning feature points from the learning image and supplies them to the learning feature amount generation unit 97. Any point in the image can be used as a feature point, and which points are used can also be defined according to the type of feature amount. Specifically, when color is used as the feature amount, for example, it is preferable to generate feature points within flat, textureless regions, and when shape, motion, texture, or the like is used, it is preferable to generate feature points at edge portions. In this way, feature points suited to the type of feature amount can be used as appropriate.

The learning feature amount generation unit 97 generates learning feature amounts at the learning feature points generated by the learning feature point generation unit 96 and supplies them to the learning correlation feature amount generation unit 98. These feature amounts may likewise be either local features or global features, and various types can be defined, relating to shape, color, motion, texture, material, walking pattern, and so on. The learning feature amounts can be selected according to their type; for example, first-, second-, and third-order Gaussian derivatives, a color distribution, or the like may be selected as local shape information. In this way, feature amounts suited to the type of feature amount can be used as appropriate.

The learning correlation feature amount generation unit 98 obtains, for each model feature amount, its correlation with each learning feature amount and generates a learning correlation feature amount. The method of generating correlation feature amounts is described in detail later with reference to FIGS. 25 and 26.

The correct/incorrect information acquisition unit 99 acquires, for each learning image, information indicating whether the image contains the recognition target included in the model images. This correct/incorrect information is input, for example, by the user who supplies the model images and learning images for the learning process to the learning device, or who instructs the device to capture them.

The recognizer generation unit 100 performs statistical learning of a recognizer based on the correct/incorrect information and the learning correlation feature amounts generated by the learning correlation feature amount generation unit 98. It supplies the model feature amounts selected in the course of this learning to the recognition device 72 as selected feature amounts, together with the recognizer obtained as a result of the learning. A boosting algorithm, for example, can be used to generate the recognizer. Such an algorithm is based on weighted voting; for example, the Discrete AdaBoost algorithm or the Gentle AdaBoost algorithm can be used. The generation of the recognizer is described in detail later with reference to FIGS. 27 and 28.
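As a rough illustration of how boosting with weighted voting can both train a recognizer and select model feature amounts, the following is a minimal Discrete AdaBoost sketch with threshold stumps. It is not the procedure of FIGS. 27 and 28, which is described later. The function names, the stump form, and the exhaustive threshold search are assumptions made for this example only.

```python
import math


def stump_vote(row, index, threshold, polarity):
    """Weak hypothesis: vote +polarity if the chosen correlation value reaches the threshold."""
    return polarity if row[index] >= threshold else -polarity


def train_discrete_adaboost(features, labels, rounds):
    """features: M rows of N correlation values; labels: +1 or -1 per row (hypothetical layout)."""
    m, n = len(features), len(features[0])
    weights = [1.0 / m] * m
    ensemble = []  # entries: (feature index, threshold, polarity, vote weight)
    for _ in range(rounds):
        best = None
        for j in range(n):
            for thr in sorted({row[j] for row in features}):
                for pol in (1, -1):
                    # Weighted error of this stump under the current sample weights.
                    err = sum(w for w, row, y in zip(weights, features, labels)
                              if stump_vote(row, j, thr, pol) != y)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((j, thr, pol, alpha))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_vote(row, j, thr, pol))
                   for w, row, y in zip(weights, features, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def recognize(ensemble, row):
    """Weighted vote of all stumps; +1 means the target is judged to be present."""
    score = sum(alpha * stump_vote(row, j, thr, pol) for j, thr, pol, alpha in ensemble)
    return 1 if score >= 0 else -1
```

In this sketch, the feature indices stored in `ensemble` play the role of the selected feature amounts: only those columns need to be computed at recognition time.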

Next, the recognition device 72 includes a selected feature amount storage unit 121, a recognizer storage unit 122, a recognition image acquisition unit 123, a recognition feature point generation unit 124, a recognition feature amount generation unit 125, a recognition correlation feature amount generation unit 126, a recognition processing unit 127, and a recognition result output unit 128.

The selected feature amount storage unit 121 receives and stores the selected feature amounts, that is, the model feature amounts selected in the course of the learning process in the learning device 71, which correspond to the recognizer generated by the recognizer generation unit 100 and stored in the recognizer storage unit 122.

The recognizer storage unit 122 stores the recognizer generated by the recognizer generation unit 100 through the learning process in the learning device 71.

The recognition image acquisition unit 123 has the same configuration as the image processing unit 11 described with reference to FIG. 4. It either acquires from outside the image data, captured with a focus camera, that is to be used in the recognition process, or includes a focus camera and captures the recognition image itself. It then extracts the in-focus portion from the recognition image data obtained with the focus camera and outputs the image data of the extracted portion to the recognition feature point generation unit 124 and the recognition feature amount generation unit 125.

The recognition feature point generation unit 124 generates recognition feature points, which are feature points, from the recognition image. Any point in the image can be used as a feature point, and which points to use can be defined according to the type of feature amount. Specifically, for example, when color is used as the feature amount, feature points are preferably generated within flat regions without texture; when shape, motion, texture, or the like is used, feature points are preferably generated at edge portions. In this way, feature points suited to the type of feature amount can be used as appropriate.

The recognition feature amount generation unit 125 generates recognition feature amounts, which are the feature amounts at the recognition feature points generated by the recognition feature point generation unit 124. These feature amounts may be either local or global, and various types can be defined, relating to shape, color, motion, texture, material, walking pattern, and so on. The recognition feature amount can be selected according to its type; for example, first-, second-, and third-order Gaussian derivatives (as local shape information), a color distribution, or the like may be selected. In this way, a feature amount suited to its type can be used as appropriate.

The recognition correlation feature amount generation unit 126 obtains, for each selected feature amount stored in the selected feature amount storage unit 121, its correlation with each recognition feature amount and generates a recognition correlation feature amount. As in the learning process, the method of generating correlation feature amounts is described in detail later with reference to FIGS. 25 and 26.

The recognition processing unit 127 substitutes the recognition correlation feature amounts generated by the recognition correlation feature amount generation unit 126 into the recognizer stored in the recognizer storage unit 122, thereby determining whether each item of recognition image data contains the recognition target, and supplies the recognition result to the recognition result output unit 128.

The recognition result output unit 128 then presents the recognition result supplied from the recognition processing unit 127 to the user, for example by displaying it on a display unit, outputting it as audio data, or signaling it with an LED or the like, or outputs it to another device, for example via a predetermined transmission path or by recording it on a predetermined recording medium.

The following describes learning and recognition processes that become feasible when at least one of the model image acquisition unit 91, the learning image acquisition unit 95, and the recognition image acquisition unit 123 is given the same configuration as the image processing unit 11 described with reference to FIG. 4, so that it includes a focus camera, captures images with it, and can extract the in-focus portion from the image data obtained with the focus camera.

For example, to acquire image data used in the learning or recognition process, the user could place the object to be recognized at the focal distance set in the focus camera and then instruct the device to capture an image. With this approach, however, the operation becomes cumbersome when the user has to position each object at the focal distance set in the focus camera, for example when recognizing a moving object such as a pet or when there are many objects to be recognized.

Instead, the object to be recognized is moved around roughly so that it falls within the imaging range of the focus camera, and images are captured continuously in the meantime. When the recognition target is not in focus, either the extraction process is not performed correctly, or the image serves as an input that does not contain the object to be recognized in the learning process, or as an input judged not to contain the recognition target in the recognition process. In contrast, when background separation is performed on an image captured while the object is at the focal distance set in the focus camera, image data in which only the region showing the object to be recognized has been cut out can be obtained easily, without cumbersome operations.

For example, when the object to be recognized is far from the focal distance of the focus camera, the resulting image data is either out of focus as a whole, or something entirely unrelated is in focus. If the background of such image data is separated, either no region of the object to be recognized is found at all, so that the learning or recognition process cannot be executed, or the portion showing the unrelated object is extracted as the region of the object to be recognized. In the latter case, the image corresponds, in the learning process, to an input that does not contain the object to be recognized. In the recognition process, the feature amounts obtained from the portion showing the unrelated object do not match the stored model feature amounts, so the result is that the image does not contain the object to be recognized. Then, when the object to be recognized coincides with the focal distance of the focus camera, extraction is performed correctly, a useful model image, learning image, or recognition image is obtained, and the learning or recognition process can be carried out.

In this way, for example, even if a pet to be recognized is moving in front of the camera, the recognition target is extracted correctly only when it comes into focus, and the learning or recognition process is executed correctly using the image data obtained at that moment. Likewise, when the user wants the device to recognize something that can be moved, for example by holding it, the object need not be placed exactly at the focal position of the focus camera. If the user simply moves it around in roughly that area, extraction is performed correctly whenever its position coincides with the focal distance of the focus camera, a useful model image, learning image, or recognition image is obtained, and the learning or recognition process can be performed correctly.

In particular, there are cases in which it is advantageous, when obtaining recognition images in the recognition process, to have the object move through the imaging range of the focus camera while images are captured continuously. For example, to recognize people walking along a passage, correct extraction is performed automatically when a person passes the predetermined position corresponding to the focal position of the focus camera. Correct recognition is therefore possible without having the person stand at a predetermined position and commanding the start of imaging.

As for how recognition results are output, for example, all results of the recognition process executed on each of the continuously captured recognition images may be output. Alternatively, the results may be retained for a predetermined number of times or a predetermined time, their transition observed, and the recognition result at the peak value, where the target appears to have been recognized, output. Similarly, the transition of the recognition results may be observed and, if any value is equal to or greater than a threshold, the target may be regarded as recognized and that result output.
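The peak-based and threshold-based output policies above can be sketched as follows. This assumes that each captured frame yields a single numeric recognition score, which the text does not specify. The function names and the `window` parameter are hypothetical.

```python
def peak_result(scores, window):
    """Return (frame index, score) of the highest score among the last `window` frames."""
    recent = list(enumerate(scores))[-window:]
    return max(recent, key=lambda item: item[1])


def first_above_threshold(scores, threshold):
    """Return the index of the first frame whose score reaches the threshold, or None."""
    for index, score in enumerate(scores):
        if score >= threshold:
            return index
    return None
```

Retaining a short window and reporting only its peak suppresses the many low-scoring frames captured while the object is out of focus.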

An outline of the learning phase executed in the learning device 71 is described with reference to FIG. 20.

Here, it is assumed that feature amounts (model feature amounts) at N feature points (model feature points; N is an integer of 2 or more) generated from X model images PM_1 to PM_X (X is an integer of 2 or more) are accumulated in the model feature amount storage unit 94 (feature amount pool). All model images contain the recognition target. That is, the model feature amount storage unit 94 accumulates feature amounts at feature points across images containing the recognition target. In this example, a pet dog is included as the recognition target.

Meanwhile, the M learning images PI_1 to PI_M (M is an integer of 2 or more) are a mixture of images that contain the recognition target and images that do not. Whether an image contains the recognition target is indicated by the correct/incorrect information acquired by the correct/incorrect information acquisition unit 99. In the example of FIG. 20, "+1" is assigned when the recognition target is included and "−1" when it is not. That is, in the learning device 71, when the learning image acquisition unit 95 is supplied with a learning image containing the recognition target, the learning feature amount generation unit 97 obtains the feature amounts at feature points throughout that image, and the correct/incorrect information acquisition unit 99 is supplied with the correct/incorrect information "+1", indicating that the learning image contains the recognition target. Likewise, when the learning image acquisition unit 95 is supplied with a learning image that does not contain the recognition target, the learning feature amount generation unit 97 obtains the feature amounts at feature points throughout that image, and the correct/incorrect information acquisition unit 99 is supplied with the correct/incorrect information "−1", indicating that the learning image does not contain the recognition target.

Then, the learning correlation feature amount generation unit 98 computes correlation values between the feature amounts (learning feature amounts) at the plurality of feature points (learning feature points) generated for each of the M learning images and the N model feature amounts stored in the model feature amount storage unit 94. For each of the N model feature amounts, the learning feature amount with the highest correlation is selected, and the N correlation values obtained in this way form the correlation feature amount. A correlation feature amount is generated for each of the M learning images, giving M learning correlation feature amounts.
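The construction of one correlation feature amount per learning image can be sketched as below. The similarity measure is passed in as a callable because the correlation formula itself is given separately in the text. The function and parameter names are assumptions for illustration.

```python
def correlation_feature(model_features, image_features, similarity):
    """For each of the N model features, keep the best similarity over all features of one image."""
    return [max(similarity(model, feature) for feature in image_features)
            for model in model_features]


def correlation_feature_matrix(model_features, images_features, similarity):
    """Stack the N-dimensional vectors of all M learning images into an M x N table."""
    return [correlation_feature(model_features, image_features, similarity)
            for image_features in images_features]
```

Each row of the resulting M x N table, paired with its "+1" or "−1" label, is one training sample for the recognizer.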

The recognizer generation unit 100 trains the recognizer using the learning correlation feature amounts obtained in this way and the correct/incorrect information. In the recognition phase that follows the learning phase, this recognizer is used to determine whether an input recognition image contains the recognition target.

The feature amounts used in the image processing system 51 may be either local features or global features, and various types can be defined, relating to shape, color, motion, texture, material, walking pattern, and so on. For example, as a local feature amount related to shape, the luminance information of a partial region may be used as is, or a transform such as the Laplacian (second derivative), Gaussian derivatives, steerable filters, Gabor filters, or SIFT (Scale-Invariant Feature Transform) may be applied. As a local feature amount related to color, the color information (RGB, HSV, etc.) of a partial region may be used as is, or information summarized as a histogram may be used. Furthermore, motion vectors (so-called optical flow) can be used as a local feature amount related to motion.

Animals have characteristic movements that depend on their species, for example in their gait or the way they move about. Therefore, particularly when a pet is to be recognized, using the motion of the recognition target as a feature amount is also considered useful, provided that moving image data consisting of a plurality of frames can be acquired as the model image data, learning image data, and recognition image data.

Specifically, the motion of the recognition target can be used as a feature amount by employing a technique for describing motion, of which optical flow is a representative example. Optical flow represents the motion of objects in a visual representation (usually temporally consecutive image data) as vectors.

Although any point in the image can be used as a feature point, edges and corner points are commonly used. Feature points can be defined according to the type of feature amount. For example, for shape-related feature amounts, features tend to appear at edges and corner points, so it is desirable to use edges and corner points as feature points. For color-related feature amounts, on the other hand, features tend to appear within the region of the object, so it is desirable to use random points as feature points without restricting them to specific points, or to take feature points from portions without texture that are far from edges.

To obtain edges and corner points as shape-related feature points, the Harris corner detector can be used. The Harris corner detector first obtains the luminance gradient at each pixel I(x, y) of the image data and calculates the second moment matrix M over a local region as in the following Equation (2).

Let α and β be the two eigenvalues of this second moment matrix M. If both are larger than a predetermined threshold, the point is a corner point; if only one is larger than the threshold, it is an edge; and if both are smaller than the threshold, it is a featureless point. To make this determination, the determinant det(M) and the trace (sum of the diagonal components) trace(M) of the second moment matrix M are calculated, and the corner response function CR is obtained using the following Equation (3).

$$CR = \det(M) - k\,(\operatorname{trace}(M))^{2} \qquad (3)$$

Here, k in Equation (3) can be set to 0.04.

If the corner response function CR is positive, the point is a corner point; if it is negative, the point is an edge. However, if the corner response function CR is smaller than a certain value, the point is featureless. Corner points or edges can be extracted by this procedure.
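The sign behavior of the corner response function can be checked with a small pure-Python sketch. Because Equation (2) is not reproduced in this text, the sketch assumes the commonly used form of the second moment matrix, with sums of Ix², IxIy, and Iy² over a local window. The unweighted 3×3 window, the central-difference gradients, and the function name are choices made for this example, not details taken from the specification.

```python
def harris_response(image, k=0.04):
    """image: 2D list of luminance values. Returns a 2D list of CR = det(M) - k * trace(M)^2."""
    h, w = len(image), len(image[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference luminance gradients; border pixels are left at zero.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    cr = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = b = c = 0.0  # entries of M = [[a, b], [b, c]]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            det = a * c - b * b
            trace = a + c
            cr[y][x] = det - k * trace * trace
    return cr
```

On an image containing a bright square, the response in this sketch is positive near the square's corner, negative along its straight sides, and zero in flat regions.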

Although a corner response function CR based on subtraction is used here to distinguish corner points from edges, division may be used instead, as in the following Equation (4).

$$CR = \frac{\det(M)}{k\,(\operatorname{trace}(M))^{2}} \qquad (4)$$

When steerable filters (Gaussian derivatives) are used as shape-related feature amounts, basis kernels are computed from the Gaussian function and its derivatives, given by the following Equations (5) to (10), and the filter is expressed as a linear combination of them. G is the Gaussian function, G_1 its first derivative, G_2 its second derivative, and G_3 its third derivative. θ is the direction of the filter to be computed. For example, feature amounts can be obtained by dividing π equally into four directions or into eight directions.

FIG. 21 shows the shapes of the two-dimensional steerable filter kernels computed with the above equations.
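Equations (5) to (10) are not reproduced in this text, so the following sketch covers only the first-order case, using the well-known steering property of the first Gaussian derivative: the kernel at angle θ is cos θ times the x-derivative basis plus sin θ times the y-derivative basis. The unnormalized kernels, the function names, and the parameters are assumptions for illustration.

```python
import math


def gaussian_derivative_kernels(sigma, radius):
    """Return the x- and y-direction first-derivative-of-Gaussian basis kernels (unnormalized)."""
    gx, gy = [], []
    for y in range(-radius, radius + 1):
        row_x, row_y = [], []
        for x in range(-radius, radius + 1):
            g = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row_x.append(-x / sigma ** 2 * g)
            row_y.append(-y / sigma ** 2 * g)
        gx.append(row_x)
        gy.append(row_y)
    return gx, gy


def steer_first_order(gx, gy, theta):
    """Linear combination of the two basis kernels giving the kernel oriented at theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * a + s * b for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Four directions obtained by dividing pi equally, as mentioned in the text.
FOUR_DIRECTIONS = [i * math.pi / 4 for i in range(4)]
```

Because only the two basis responses need to be computed by convolution, the response for any other direction follows from the same linear combination applied to the filter outputs.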

To strengthen a local feature amount, the jets of neighboring points may be concatenated. In that case, as shown in FIG. 22, it is preferable to take the jets from locations about 5 pixels away from the pixel of interest. If the jets used are too far from the pixel of interest, the local information becomes sensitive to deformation of the object. Conversely, if they are too close to the pixel of interest, there is little point in combining many jets.

A local feature amount can also be made invariant to rotation. For example, as shown in FIG. 23, the main direction α at the center pixel is calculated and the feature amount is rotated relative to that direction, which makes the local feature amount rotation invariant. The main direction α is obtained by the following Equation (11) from the x-direction and y-direction outputs of the first derivative of a Gaussian of a certain width σ.

Using this α, outputs in four directions can be obtained, for example with the following Equation (12).
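Equations (11) and (12) are not reproduced in this text. One plausible reading, offered only as an assumption, is that α is the arctangent of the ratio of the y-direction to the x-direction first-derivative outputs, and that the four directions are offsets of α in steps of π/4. The sketch below follows that reading. Both function names are hypothetical.

```python
import math


def main_direction(response_x, response_y):
    """Assumed form of the main direction: angle of the first-derivative response vector."""
    return math.atan2(response_y, response_x)


def rotated_directions(alpha, count=4):
    """Assumed filter directions measured relative to the main direction alpha."""
    return [alpha + i * math.pi / count for i in range(count)]
```

Under this reading, rotating the image by some angle shifts α by the same angle, so filter outputs taken at directions measured from α stay the same.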

On the other hand, when a color histogram is used as a color-related feature amount, the color space is divided into predetermined color regions and the distribution over these regions is obtained. FIG. 24 shows an example of a histogram in HSV space. In the HSV representation, H (hue) represents hue, S (saturation) represents saturation, and V (value) represents lightness.

In A of FIG. 24, as a simple example, each HSV component is divided into two intervals, giving a total of eight (= 2^3) color regions. The histogram shown in B of FIG. 24 is obtained by computing, from the color distribution in an image region that includes the neighborhood (for example, about 10 pixels) of a feature point, the frequency of occurrence in each color region. Each of the eight frequency values of the histogram in B of FIG. 24 corresponds to one of the eight color regions in A of FIG. 24.
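The eight-region histogram can be sketched with the Python standard library as follows. Splitting each HSV component at its midpoint, the bit layout of the region index, and the normalization to relative frequencies are assumptions made for this example; FIG. 24 may divide the space differently.

```python
import colorsys


def hsv_histogram(pixels_rgb):
    """pixels_rgb: iterable of (r, g, b) with components in 0..255. Returns 8 relative frequencies."""
    counts = [0] * 8
    total = 0
    for r, g, b in pixels_rgb:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # One bit per component: upper or lower half of its range (2^3 = 8 regions).
        index = (int(h >= 0.5) << 2) | (int(s >= 0.5) << 1) | int(v >= 0.5)
        counts[index] += 1
        total += 1
    if total == 0:
        return [0.0] * 8
    return [count / total for count in counts]
```

The pixels passed in would be those of the image region around one feature point, for example a neighborhood of about 10 pixels.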

In this way, feature points and feature amounts suited to each type of feature amount can be defined for use in the learning and recognition processes. The feature amounts obtained in this way are converted into correlation feature amounts in the learning correlation feature amount generation unit 98 and the recognition correlation feature amount generation unit 126. The learning correlation feature amount generation unit 98 obtains the correlation of each learning feature amount with the model feature amounts, which allows different kinds of feature amounts to be compared in the same dimension, and supplies the result to the recognizer generation unit 100 for training the recognizer. Likewise, the recognition correlation feature amount generation unit 126 obtains the correlation of each recognition feature amount with the selected feature amount of the same dimension among the various feature amounts stored in the selected feature amount storage unit 121, which again allows different kinds of feature amounts to be compared in the same dimension, and supplies the result to the recognition processing unit 127 for use in the recognition process.

In general, the correlation value C between two vectors v_1 and v_2 representing feature amounts is calculated by the following Equation (13), where an overline on a vector denotes the mean of that vector.

The correlation value C given by Equation (13) takes a value in the range 0.0 to 1.0; the higher the correlation, the closer the value is to 1.0, and the lower the correlation, the closer it is to 0.0.
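Equation (13) is not reproduced in this text. Given the mention of vector means, a mean-centered normalized correlation is a natural candidate, and the sketch below assumes that form. Note that this form in general ranges from −1.0 to 1.0, whereas the text states a range of 0.0 to 1.0, so the actual equation may differ, for example by rescaling or clamping. The function name is hypothetical.

```python
import math


def correlation(v1, v2):
    """Mean-centered normalized correlation of two equal-length feature vectors (assumed form)."""
    mean1 = sum(v1) / len(v1)
    mean2 = sum(v2) / len(v2)
    d1 = [a - mean1 for a in v1]
    d2 = [b - mean2 for b in v2]
    denominator = math.sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    if denominator == 0.0:
        return 0.0  # a constant vector carries no pattern to correlate
    return sum(a * b for a, b in zip(d1, d2)) / denominator
```

Because the result does not depend on the length or units of the input vectors, correlation values computed for color, shape, and motion feature amounts can be placed side by side in one vector.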

また、相関値を求める際には、エラスティック・バンチ・グラフ・マッチング（ＥＢＧＭ）法を利用してもよい。学習相関特徴量生成部９８は、このＥＢＧＭ法を用いた場合、学習特徴量のうち、モデル特徴量記憶部９４に記憶されたモデル特徴量に対応する特徴点の近傍で最も相関の高い点（相関最大点）を求め、その相関最大点における相関値を学習相関特徴量として利用する。また、認識相関特徴量生成部１２６は、このＥＢＧＭ法を用いた場合、認識特徴量のうち、選択特徴量記憶部１２１に記憶された選択特徴量に対応する特徴点の近傍で最も相関の高い点（相関最大点）を求め、その相関最大点における相関値を認識相関特徴量として利用する。 Further, when obtaining the correlation value, an elastic bunch graph matching (EBGM) method may be used. When this EBGM method is used, the learning correlation feature quantity generation unit 98 has the highest correlation point in the vicinity of the feature point corresponding to the model feature quantity stored in the model feature quantity storage unit 94 among the learning feature quantities ( (Correlation maximum point) is obtained, and the correlation value at the maximum correlation point is used as a learned correlation feature quantity. Further, when this EBGM method is used, the recognition correlation feature quantity generation unit 126 has the highest correlation in the vicinity of the feature point corresponding to the selection feature quantity stored in the selection feature quantity storage unit 121 among the recognition feature quantities. A point (maximum correlation point) is obtained, and the correlation value at the maximum correlation point is used as a recognized correlation feature quantity.

An example of searching for the maximum correlation point by the EBGM method will be described with reference to FIG. 25. The learning process in the learning device 71 is used as the example here.

As shown in FIG. 25, when a feature point α is generated in the model image, the point α′ on the learning image corresponding to the feature point α is determined. The learning correlation feature quantity generation unit 98 calculates correlation values with the feature point α in the vicinity of the point α′ on the learning image and obtains the maximum correlation point β. The correlation value at this maximum correlation point β becomes the learning correlation feature quantity.
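The neighborhood search described above can be sketched as follows. This is only an illustration: the square search window, the `radius` parameter, the dictionary layout of `learning_features`, and the function name are assumptions, since the text does not specify the shape or size of the neighborhood around α′.

```python
def max_correlation_point(model_feature, learning_features, center, radius, corr):
    """Search the square neighborhood of `center` for the best-matching point.

    learning_features maps (x, y) -> feature vector (hypothetical layout);
    corr is any similarity function taking two feature vectors.
    Returns (best_point, best_value), or (None, None) if no point is available.
    """
    cx, cy = center
    best_point, best_value = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            p = (cx + dx, cy + dy)
            if p not in learning_features:
                continue  # no feature was computed at this position
            value = corr(model_feature, learning_features[p])
            if best_value is None or value > best_value:
                best_point, best_value = p, value
    return best_point, best_value
```

Here `center` plays the role of α′, the returned point plays the role of β, and the returned value is what the text calls the learning correlation feature quantity.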

Using the EBGM method to obtain the correlation feature quantity in this way makes the processing robust against object distortion and changes in viewpoint, so that such disturbances can be handled more flexibly.

Although the case where the learning correlation feature quantity generation unit 98 uses the EBGM method to obtain the learning correlation feature quantity has been described here, the EBGM method can be applied in the same way when the recognition correlation feature quantity generation unit 126 obtains the recognition correlation feature quantity.

Next, an example of calculating correlation feature quantities from a plurality of types of feature quantities will be described with reference to FIG. 26. The learning process in the learning device 71 is used as the example here.

As shown in FIG. 26, assume that the types of model feature quantities stored in the model feature quantity storage unit 94 include, for example, model feature quantities relating to color, model feature quantities relating to shape, and model feature quantities relating to motion.

The learning correlation feature quantity generation unit 98 calculates a correlation for each type of feature quantity. That is, in the case of FIG. 26, the learning correlation feature quantity generation unit 98 includes a correlation calculation unit 141 that calculates the correlation relating to color, a correlation calculation unit 142 that calculates the correlation relating to shape, and a correlation calculation unit 143 that calculates the correlation relating to motion. For a model feature quantity relating to color, the correlation calculation unit 141 extracts the corresponding learning feature quantity relating to color from the learning feature quantities generated by the learning feature quantity generation unit 97, calculates the correlation value between the two, and outputs it to the recognizer generation unit 100. Similarly, for a model feature quantity relating to shape, the correlation calculation unit 142 extracts the corresponding learning feature quantity relating to shape from the learning feature quantities generated by the learning feature quantity generation unit 97, calculates the correlation value between the two, and outputs it to the recognizer generation unit 100. For a model feature quantity relating to motion, the correlation calculation unit 143 extracts the corresponding learning feature quantity relating to motion from the learning feature quantities generated by the learning feature quantity generation unit 97, calculates the correlation value between the two, and outputs it to the recognizer generation unit 100.

In this way, the correlation calculation units 141 to 143 calculate correlation values for different types of feature quantities. Because the dimension of the original feature vector differs from one type of feature quantity to another, it is difficult to compare the original feature quantities with each other directly. The learning correlation feature quantity generation unit 98, however, normalizes them into correlation feature quantities that take a value in a fixed range (0.0 to 1.0) according to the degree of correlation, so that even different types of feature quantities become mutually compatible.

The recognizer generation unit 100 learns a recognizer using these correlation feature quantities, and recognition is then performed with it, so that object recognition by statistical learning using various types of feature quantities can be realized.

Although the processing by which the learning correlation feature quantity generation unit 98 converts learning feature quantities into learning correlation feature quantities has been described here, basically the same processing is performed when the recognition correlation feature quantity generation unit 126 converts recognition feature quantities into recognition correlation feature quantities. Also, although the case of using three different types of feature quantities (color, shape, and motion) has been described, it goes without saying that the types and the number of types of feature quantities are not limited to these.

Next, an example of the learning process executed in the recognizer generation unit 100 will be described with reference to FIGS. 27 and 28.

In FIG. 27, the correlation feature quantities of each of the M learning images (PI_1 to PI_M) are represented as an N-dimensional vector, where N is the number of feature points of the model feature quantities stored in the model feature quantity storage unit 94. That is, the correlation feature quantities of the first learning image PI_1 are (A_1, A_2, ..., A_N), those of the second learning image PI_2 are (B_1, B_2, ..., B_N), those of the third learning image PI_3 are (C_1, C_2, ..., C_N), and in the same manner those of the M-th learning image PI_M are (M_1, M_2, ..., M_N).

If a group G_rk is assumed for each feature point k of the model feature quantities, the correlation feature quantities for feature point k = 1 are (A_1, B_1, C_1, ..., M_1), indicated by group G_r1. Similarly, the correlation feature quantities for feature point k = 2 are (A_2, B_2, C_2, ..., M_2), indicated by group G_r2, and in the same manner those for feature point k = N are (A_N, B_N, C_N, ..., M_N), indicated by group G_rN. In other words, for each feature point k, a group G_rk consisting of a total of M correlation feature quantities is defined, one for each of the M learning images PI_1 to PI_M.
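The grouping above amounts to transposing an M×N table of correlation feature quantities, as the following minimal sketch illustrates. The function name and the list-of-lists layout are assumptions made for illustration.

```python
def group_by_feature_point(correlation_features):
    """Turn M rows (one per learning image, N values each) into N groups of M values.

    Row i holds the correlation feature quantities of learning image PI_i;
    the k-th returned list corresponds to the group for feature point k.
    """
    if not correlation_features:
        return []
    n = len(correlation_features[0])
    if any(len(row) != n for row in correlation_features):
        raise ValueError("every learning image needs N correlation features")
    return [list(column) for column in zip(*correlation_features)]
```

For example, the rows `[[A1, A2], [B1, B2], [C1, C2]]` become the groups `[[A1, B1, C1], [A2, B2, C2]]`.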

The value "+1" or "-1" at the left end of FIG. 27 is the correct/incorrect information for each learning image, supplied from the correct/incorrect information acquisition unit 99, which indicates whether or not the corresponding learning image contains the recognition target.

For each feature point k, M correlation feature quantities are drawn by lottery according to the weight w_i set for each learning image PI_i (i is any of 1 to M). In the first pass all weights w_i are equal, and drawing M items would select all correlation feature quantities in a probabilistic sense, so in the first pass all correlation feature quantities are assumed to be selected at each feature point k. In subsequent iterations, the same correlation feature quantity may be selected more than once.
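The weighted lottery can be sketched with the standard library as below. Drawing with replacement via `random.choices` is an assumption about how the lottery is implemented; the function name and the optional `rng` parameter are hypothetical.

```python
import random


def sample_by_weight(weights, rng=None):
    """Draw M indices (with replacement) from 0..M-1 in proportion to `weights`.

    Indices are returned instead of values so that the correct/incorrect
    label of each drawn learning image can be looked up afterwards.
    """
    rng = rng or random.Random()
    m = len(weights)
    return rng.choices(range(m), weights=weights, k=m)
```

A draw with replacement does not guarantee that every item appears even when all weights are equal, which is presumably why the text simply takes all correlation feature quantities as selected in the first pass.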

The M input feature quantities sampled for each of the N input feature quantities are then sorted in ascending or descending order. Next, a threshold th_jk that divides the feature quantities in group G_rk into two is set for each of the N sorted input feature quantities, based on the correct/incorrect information, that is, the (+1) or (-1) in FIG. 27, which indicates whether or not the learning image from which the input feature quantity was extracted contains the target object to be recognized. While the threshold is varied, the error rate e_jk of group G_rk for each feature point k is calculated by the following equation (14) to check whether correct and incorrect samples are properly separated above and below the threshold, and the threshold is set so that this error rate e_jk is minimized. Here, j is a counter over the L weak recognizers f_jk(x) (L is an integer of 1 or more) for the correlation feature quantity vector x at feature point k, and is an integer in the range from 1 to L.

Here, y ≠ f_jk denotes the condition under which feature point k is in error, and E_w indicates that the weights at the feature points k where an error has occurred are summed.

This threshold th_jk is then set as the weak recognizer.

The example in FIG. 28 shows how the threshold th_11 is set at the first feature point k = 1, with j = 1. Specifically, the M feature quantities corresponding to feature point k = 1 are arranged in ascending or descending order as L_1, A_1, C_1, B_1, ..., M_1, as shown in FIG. 28. In the range below the threshold, the target object to be recognized is judged to be absent, and in the range above the threshold it is judged to be present. The teacher label y (that is, the correct/incorrect information) and the weak recognizer f_jk(x) each take the value "+1" or "-1" according to the presence or absence of the recognition target, and a match between the two means that the prediction was correct. As shown in FIG. 28, when the threshold th_11 is set between the feature quantities A_1 and C_1, the feature quantity A_1 enclosed by the dotted line in the figure comes from a learning image that contains the target object, whereas the feature quantities C_1 and M_1 come from learning images that do not contain it, so these are regarded as errors. The value of E_w is set by accumulating an error each time the prediction misses.

In this way, based on the correct/incorrect information of the learning images (information on whether or not the target object to be recognized is contained), the weights w_i of the learning images from which the feature quantities regarded as errors were extracted are summed to calculate the error rate e_jk.
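The threshold search for one feature point can be sketched as follows. Equation (14) is not reproduced in this text, so the code assumes the error rate is the sum of the weights of misclassified learning images, as the paragraph above describes. The function name, the choice of midpoints as candidate thresholds, and the fixed polarity (values above the threshold predict +1) are assumptions made for illustration.

```python
def best_threshold(values, labels, weights):
    """Return (threshold, error) minimizing the weighted error of the rule
    'predict +1 if value > threshold, else -1' over one feature point's group."""
    sorted_vals = sorted(values)
    # Candidates: one below the minimum, midpoints between distinct
    # neighbors, and one above the maximum.
    candidates = [sorted_vals[0] - 1.0]
    for a, b in zip(sorted_vals, sorted_vals[1:]):
        if a != b:
            candidates.append((a + b) / 2.0)
    candidates.append(sorted_vals[-1] + 1.0)

    best = None
    for th in candidates:
        # Sum the weights of the learning images whose prediction misses.
        err = sum(w for v, y, w in zip(values, labels, weights)
                  if (1 if v > th else -1) != y)
        if best is None or err < best[1]:
            best = (th, err)
    return best
```

With uniform weights the error is simply the fraction of misclassified learning images; once the weights become unequal, a heavily weighted image pulls the threshold toward classifying that image correctly.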

Once the error rates e_jk have been calculated in this way, the weak recognizer f_jk(x) with the smallest error rate e_jk is selected from among the weak recognizers that have been set. The reliability c_j of that weak recognizer f_jk(x) is then calculated from the error rate e_jk by the following equation (15).

Using the reliability c_j obtained in this way, the weights w_i of the learning images (i is an integer in the range from 1 to M) are then computed by the following equation (16), normalized so that the w_i sum to 1, and updated.

As a result, the weights of learning images containing correlation feature quantities for which an error occurred are increased, so that the learning images that need to be learned again are clearly distinguished.

The weak recognizer f_jk(x) selected in this way is weighted by the reliability c_j given by equation (15), and the recognizer R(x) for the correlation feature quantity vector x is updated as in the following equation (17).

That is, the weighted weak recognizer f_jk is added to the recognizer R(x) already held, and the result becomes the new recognizer R(x). The generated recognizer R(x) is therefore composed of a plurality of weak recognizers f_jk with relatively low error rates.

By repeating this learning process, the recognizer generation unit 100 can generate a recognizer R(x) that indicates the recognition target is present when R(x) is positive and absent when R(x) is negative. In other words, the recognizer is a function that outputs the presence or absence of the target object by a weighted majority vote of the weak recognizers. The recognizer generation unit 100 supplies the generated recognizer to the recognizer storage unit 122 of the recognition device 72, where it is stored.
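The overall loop described above can be sketched as follows. Equations (15) to (17) are not reproduced in this text, so the sketch assumes the standard Discrete AdaBoost forms: reliability c = log((1 - e) / e), multiplication of the weights of misclassified images by exp(c) followed by normalization, and R(x) as the reliability-weighted sum of the weak recognizers. It applies the weights directly in the error sum rather than resampling by lottery, and all function names, the `rounds` parameter, and the clamping constant `eps` are hypothetical.

```python
import math


def train_stump(column, labels, weights):
    """Weighted-error-minimizing threshold for 'predict +1 if value > th'."""
    vals = sorted(set(column))
    cands = ([vals[0] - 1.0]
             + [(a + b) / 2.0 for a, b in zip(vals, vals[1:])]
             + [vals[-1] + 1.0])
    best_th, best_err = None, None
    for th in cands:
        err = sum(w for v, y, w in zip(column, labels, weights)
                  if (1 if v > th else -1) != y)
        if best_err is None or err < best_err:
            best_th, best_err = th, err
    return best_th, best_err


def discrete_adaboost(features, labels, rounds):
    """features: M rows of N correlation features; labels: +1 or -1 per row.
    Returns a list of (feature_index, threshold, reliability) tuples."""
    m, n = len(features), len(features[0])
    weights = [1.0 / m] * m          # equal weights in the first pass
    ensemble = []
    eps = 1e-10                      # keeps the logarithm finite when err is 0 or 1
    for _ in range(rounds):
        # Pick the feature point k whose stump has the smallest weighted error.
        best = None
        for k in range(n):
            column = [row[k] for row in features]
            th, err = train_stump(column, labels, weights)
            if best is None or err < best[2]:
                best = (k, th, err)
        k, th, err = best
        err = min(max(err, eps), 1.0 - eps)
        conf = math.log((1.0 - err) / err)   # assumed form of equation (15)
        ensemble.append((k, th, conf))
        # Raise the weights of misclassified images, then renormalize to sum 1.
        for i, row in enumerate(features):
            if (1 if row[k] > th else -1) != labels[i]:
                weights[i] *= math.exp(conf)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def recognize(ensemble, x):
    """Reliability-weighted vote; a positive score means the target is judged present."""
    return sum(conf * (1 if x[k] > th else -1) for k, th, conf in ensemble)
```

The feature indices stored in the returned tuples correspond to the feature points whose model feature quantities would be kept as selected feature quantities.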

The recognizer generation unit 100 also selects the model feature quantity of the feature point k to be used by each weak recognizer f_jk that minimizes the error rate e_jk, and outputs it as a selected feature quantity. The output selected feature quantity is stored in the selected feature quantity storage unit 121 of the recognition device 72.

A learning process that generates a recognizer by repeatedly adding weak recognizers weighted through learning in this way is a kind of boosting (weighted voting) algorithm and is called the "Discrete AdaBoost Algorithm". In this learning process, the calculation of a recognizer and an error rate for each model feature quantity is repeated so that the weights of learning feature quantities with high error rates are successively increased and the weights of those with low error rates are decreased. Consequently, as the iterations proceed, learning correlation feature quantities with high error rates become more likely to be selected when a recognizer is set, and learning is repeated on the learning correlation feature quantities that are hard to recognize. More correlation feature quantities from hard-to-recognize learning images are thus selected, and a high recognition rate can ultimately be achieved.

Furthermore, with this boosting algorithm, the model feature quantity of the feature point k with the smallest of the N error rates e_jk is selected and stored in the selected feature quantity storage unit 121. Learning of the recognizer and selection of feature quantities can therefore be performed at the same time, and in the recognition phase the feature quantities suited to recognition can be used efficiently without using all the feature quantities stored in the model feature quantity storage unit 94.

Next, the processing executed by the image processing system 51 of FIG. 19 will be described with reference to the flowcharts of FIGS. 29 to 37.

First, the learning process executed by the learning device 71 of the image processing system 51 will be described with reference to the flowchart of FIG. 29.

In step S11, the model image acquisition unit 91 acquires a model image to be used for extracting model feature quantities and supplies it to the model feature point generation unit 92.

Here, the model image acquisition unit 91 may acquire a model image that shows only the portion to be recognized, or may extract only the portion to be recognized from given image data in response to a user operation or the like, and output it to the model feature point generation unit 92. Alternatively, it may have the same configuration as the image processing unit 11 described with reference to FIG. 4.

In step S12, the model feature point generation unit 92 generates feature points of the model image supplied from the model image acquisition unit 91 and supplies them to the model feature quantity generation unit 93. For example, N model feature points are generated for one model image.

In step S13, the model feature quantity generation unit 93 generates the feature quantities at the model feature points of the model image and supplies them to the model feature quantity storage unit 94. For example, when N model feature points have been generated, the model feature quantity generation unit 93 generates N model feature quantities at those N model feature points.

In step S14, the model feature quantity storage unit 94 stores the model feature quantities at the model feature points generated by the model feature quantity generation unit 93.

In step S15, it is determined whether or not the feature quantities of all model images have been stored in the model feature quantity storage unit 94. If it is determined that they have not, the process returns to step S11 and the subsequent processing is repeated.

The description here assumes that steps S12 to S15 are executed for each model image acquired and that these steps are repeated when a plurality of model images are acquired. Alternatively, a plurality of model images may be acquired in step S11, and the subsequent processing may be executed for each model image either sequentially or in parallel.

If it is determined in step S15 that the feature quantities of all model images have been stored in the model feature quantity storage unit 94, the learning image acquisition process described later with reference to FIG. 30 is executed in step S16.

In step S17, the learning feature point generation unit 96 generates feature points of the learning image acquired in step S16 and supplies them to the learning feature quantity generation unit 97.

In step S18, the learning feature quantity generation unit 97 generates the feature quantities of the learning image at the learning feature points generated by the learning feature point generation unit 96 and supplies them to the learning correlation feature quantity generation unit 98.

In step S19, the learning correlation feature quantity generation process described later with reference to FIG. 32 is executed. In this process, for each of the (for example, N) model feature quantities stored in the model feature quantity storage unit 94, the learning correlation feature quantity generation unit 98 generates correlation values with the learning feature quantities at the learning feature points of each learning image, and the highest correlation value is taken as the learning correlation feature quantity.

In step S20, the recognizer generation process described later with reference to FIG. 33 is executed. In this process, the recognizer generation unit 100 performs statistical learning based on the learning correlation feature quantities generated in step S19.

In step S21, it is determined whether or not processing has been completed for all supplied learning images. If it is determined in step S21 that processing has not been completed for all supplied learning images, the process returns to step S16 and the subsequent processing is repeated. If it is determined in step S21 that processing has been completed for all supplied learning images, the recognizer generated in step S20 is supplied to and stored in the recognizer storage unit 122 of the recognition device 72, the model feature quantities selected in the course of that processing are supplied to and stored in the selected feature quantity storage unit 121 of the recognition device 72, and the process ends.

Through this processing, the learning process is executed and a recognizer capable of recognizing an object contained in an image is generated. Because the learning feature quantities are converted into learning correlation feature quantities before the recognizer is learned, different types of feature quantities can be handled on the same scale and used together for statistical learning.

Next, the learning image acquisition process executed in step S16 of FIG. 29 will be described with reference to the flowchart of FIG. 30. This process is executed by the learning image acquisition unit 95, which has the same configuration as the image processing unit 11 described with reference to FIG. 4, includes a focus camera for capturing learning images, and can extract the in-focus portion from the learning image data captured by the focus camera. The flowchart of FIG. 30 is therefore described on the assumption that the learning image acquisition unit 95 has the configuration of the image processing unit 11 described with reference to FIGS. 4 and 5.

In step S41, the image acquisition unit 21 of the learning image acquisition unit 95 captures an image at a predetermined focal length and supplies the obtained image to the background separation processing unit 22.

In step S42, the background separation processing unit 22 of the learning image acquisition unit 95 executes the background separation process described later with reference to the flowchart of FIG. 31.

In step S43, the background separation processing unit 22 of the learning image acquisition unit 95 determines whether or not an object to be recognized exists in the image from which the background has been separated. If it is determined in step S43 that no object to be recognized exists, for example because the entire image is out of focus, the process returns to step S41 and the subsequent processing is repeated.

If it is determined in step S43 that an object to be recognized exists, then in step S44 the background separation processing unit 22 of the learning image acquisition unit 95 applies image processing such as alignment, as necessary, to the image data from which the background has been separated.

In step S45, the background separation processing unit 22 of the learning image acquisition unit 95 outputs the image corresponding to the object to be recognized, with the background separated, to the learning feature point generation unit 96 and the learning feature quantity generation unit 97. The process then returns to step S16 of FIG. 29 and proceeds to step S17.

Through this processing, the learning image data is acquired. Because the image data is captured by a focus camera, an object to be recognized that lies at the in-focus position can be separated from the background with simple processing.

Next, the background separation process executed in step S42 of FIG. 30 will be described with reference to the flowchart of FIG. 31. This flowchart is also described using the configuration of the image processing unit 11 described with reference to FIGS. 4 and 5.

In step S81, the neighboring pixel difference filter calculation processing unit 31 of the background separation processing unit 22 evaluates equation (1), performing the calculation with a neighboring pixel difference filter such as that shown in FIG. 6, and supplies an output such as that described with reference to FIG. 7 to the neighborhood region sum filter calculation processing unit 32.

In step S82, the neighborhood region sum filter calculation processing unit 32 performs the calculation with a neighborhood region sum filter such as that described with reference to FIG. 8, thereby obtaining the average of the pixel of interest and its neighborhood region, and supplies the result to the threshold processing unit 33. Because the result of the sum filter calculation changes with the window size applied, it is preferable to be able to use the optimum window size, which depends on factors such as how coarse or fine the texture of the object to be recognized is.

In step S83, the threshold processing unit 33 separates the background portion from the portion corresponding to the object to be recognized. It does so, for example, by dividing the sum filter result at a predetermined threshold (that is, by binarization), or by threshold processing that masks the portions of the sum filter result where a value has been produced (the portions that are not black).

Through this processing, an object to be recognized that lies at the in-focus position can be separated from the background of an image captured by the focus camera with simple processing.
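Steps S81 to S83 can be sketched in plain Python as below. The 3×3 difference kernel (8 at the pixel of interest, -1 at its eight neighbors) follows the abstract. Taking the absolute value of the filter response, replicating edge pixels at the border, the default `window` and `threshold` values, and the function names are assumptions made for illustration, since equation (1) and FIGS. 6 to 8 are not reproduced here.

```python
def filter_same(image, kernel):
    """Apply an odd-sized kernel to a 2-D list; border pixels are replicated."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - oy, 0), h - 1)  # clamp to the image
                    xx = min(max(x + i - ox, 0), w - 1)
                    acc += kernel[j][i] * image[yy][xx]
            out[y][x] = acc
    return out


def separate_background(image, window=3, threshold=10.0):
    """Return a 0/1 mask: 1 where the image looks in focus (sharp texture)."""
    # S81: neighboring pixel difference filter (center 8, neighbors -1).
    diff_kernel = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
    diff = [[abs(v) for v in row] for row in filter_same(image, diff_kernel)]
    # S82: average of each pixel and its neighborhood region.
    mean_kernel = [[1.0 / (window * window)] * window for _ in range(window)]
    smoothed = filter_same(diff, mean_kernel)
    # S83: binarize with a fixed threshold.
    return [[1 if v > threshold else 0 for v in row] for row in smoothed]
```

A flat (blurred) region gives a difference response near zero and is masked out, while a sharply textured region gives large responses that survive the averaging and thresholding. A larger `window` bridges gaps inside coarse textures at the cost of a less precise object boundary.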

Next, the learning correlation feature quantity generation process executed in step S19 of FIG. 29 will be described with reference to the flowchart of FIG. 32.

In step S111, the learning correlation feature quantity generation unit 98 sets the variable k, which indicates the number of feature quantities processed, to k = 1.

In step S112, for the model feature quantity of feature quantity k, the k-th of the N feature quantities of the model image, the learning correlation feature quantity generation unit 98 generates correlation values with the learning feature quantities at the corresponding feature points of the learning image.

ステップＳ１１３において、学習相関特徴量生成部９８は、生成された相関値から、最も相関の高い学習特徴量を選択する。 In step S113, the learning correlation feature value generation unit 98 selects the learning feature value having the highest correlation from the generated correlation values.

ステップＳ１１４において、学習相関特徴量生成部９８は、ステップＳ１１３で選択された学習特徴量の相関値を、特徴量ｋの学習相関特徴量とする。 In step S114, the learning correlation feature value generation unit 98 sets the correlation value of the learning feature value selected in step S113 as the learning correlation feature value of the feature value k.

ステップＳ１１５において、学習相関特徴量生成部９８は、変数ｋは、１つの画像データに対する特徴量の総数Ｎであるか否かを判断する。 In step S115, the learning correlation feature value generation unit 98 determines whether or not the variable k is the total number N of feature values for one image data.

ステップＳ１１５において、変数ｋはＮではない、すなわち、Ｎに達していないと判断された場合、ステップＳ１１６において、学習相関特徴量生成部９８は、変数ｋを１インクリメントして、処理は、ステップＳ１１２に戻り、それ以降の処理が繰り返される。 If it is determined in step S115 that the variable k is not N, that is, it has not reached N, in step S116, the learning correlation feature value generation unit 98 increments the variable k by 1, and the process proceeds to step S112. Return to, and the subsequent processing is repeated.

ステップＳ１１５において、変数ｋはＮであると判断された場合、処理は、図２９のステップＳ１９に戻り、ステップＳ２０に進む。 If it is determined in step S115 that the variable k is N, the process returns to step S19 in FIG. 29 and proceeds to step S20.

Through this processing, the learning correlation feature quantities are generated, for example by using the method described with reference to FIG. 25. As described with reference to FIG. 26, learning correlation feature quantities can also be obtained for different types of feature quantities (for example, shape, color, and motion) by calculating the correlation for each type.
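As a non-limiting illustration, the loop of steps S111 to S116 could be sketched as follows. The correlation measure of FIG. 25 is not reproduced in this passage, so the Pearson correlation coefficient used here is an assumption, as are the function names and the representation of each feature quantity as a list of numbers.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (assumed measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def correlation_features(model_features, candidate_features):
    """Return one correlation feature quantity per model feature quantity.

    model_features[k]     : model feature quantity k (a vector)
    candidate_features[k] : feature quantities found at the corresponding
                            feature point of the learning image
    """
    result = []
    for k, model_vec in enumerate(model_features):          # S111, S115, S116
        correlations = [pearson(model_vec, candidate)        # S112
                        for candidate in candidate_features[k]]
        result.append(max(correlations))                     # S113, S114
    return result
```

Because every entry of the result is a correlation value, feature quantities of different types end up on a common scale.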

Next, the recognizer generation process executed in step S20 of FIG. 29 will be described with reference to the flowchart of FIG. 33.

In step S141, the recognizer generation unit 100, for example, initializes the weights W_i of all learning images to 1/M, initializes the counter Q to 1, and initializes the recognizer R(x) to 0. Here, i identifies each of the plurality of learning input images PI_i, with 1 ≤ i ≤ M. The processing of step S141 therefore sets every learning image PI_i to the same normalized weight (= 1/M).

In step S142, the recognizer generation unit 100 selects M feature quantities according to the weights W_i of the learning input images PI_i, for each combination of local feature quantities at each feature point k (k = 1, 2, 3, ..., N), that is, for each of the N × P feature quantities supplied for one learning image.

In this case, the feature quantities for the first combination of local feature quantities at feature point k = 1 are (A_1-1, B_1-1, C_1-1, ..., M_1-1), denoted by group G_r1-1. Likewise, the feature quantities for the second combination of local feature quantities at feature point k = 1 are (A_1-2, B_1-2, C_1-2, ..., M_1-2), denoted by group G_r1-2, and so on, up to the feature quantities for the P-th combination of local feature quantities at feature point k = N, which are (A_N-P, B_N-P, C_N-P, ..., M_N-P), denoted by group G_rN-P.

That is, for each of the P kinds of combinations of local feature quantities at each feature point k, a group of M feature quantities obtained from the learning images PI_i is set.

The recognizer generation unit 100 draws M feature quantities by lottery, according to the weight set for each learning image PI_i, for each of the P kinds of combinations of local feature quantities at each feature point k. In the first pass all weights W_i are equal, so drawing M items selects, in probabilistic terms, every feature quantity. It is therefore assumed here that in the first pass all feature quantities are selected for every combination of local feature quantities at every feature point k. In practice, of course, the same feature quantity may be drawn more than once.
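As a non-limiting illustration, the weighted lottery of step S142 for a single group of M feature quantities could be sketched as follows. The function name and the explicit random number generator argument are assumptions made for this sketch. The draw is made with replacement, which is why the same feature quantity may be selected more than once.

```python
import random


def draw_by_weight(group, weights, rng):
    """Draw len(group) feature quantities with replacement.

    group   : the M feature quantities of one group (one per learning image)
    weights : the weights W_i of the corresponding learning images
    rng     : a random.Random instance, passed in for reproducibility
    """
    return rng.choices(group, weights=weights, k=len(group))
```

With equal weights every feature quantity is equally likely on each draw. After the weights have been updated, feature quantities from heavily weighted learning images dominate the sample.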

In step S143, the recognizer generation unit 100 sorts the feature quantities in ascending or descending order within each group of M feature quantities sampled for each combination of local feature quantities at the N feature points.

In step S144, based on information indicating whether each learning image from which the input feature quantities were extracted contains the target object to be recognized, the recognizer generation unit 100 calculates, for each of the P kinds of combinations of local feature quantities at each feature point k, the error rate e_jk as shown in equation (14) above while varying the threshold over the M feature quantities, and sets the threshold so that the error rate e_jk is minimized. Here, the threshold th_jk for each combination of local feature quantities at feature point k constitutes one weak recognizer f_jk. That is, N × P weak recognizers f_jk are set, one for each of the P kinds of combinations of local feature quantities at each of the N feature points k (in other words, one for each of the N × P feature quantities), and an error rate e_jk is obtained for each of the N × P weak recognizers f_jk. The recognizer f_jk is a function that outputs "+1" when the target object to be recognized is included and "−1" when it is not.

That is, as in the case described with reference to FIG. 28, when the feature quantities (the obtained correlation coefficients) corresponding to a given combination of local feature quantities at a given feature point are arranged in ascending or descending order, whether each one is an error is determined from the position of the set threshold th_jk and from the side of that threshold on which the feature quantities of learning images containing the target object and those of learning images not containing it lie. As shown in equation (14) above, the recognizer generation unit 100 calculates the error rate e_jk by adding up, based on the correct/incorrect information of the learning input images (information on whether the target object to be recognized is included), the weights W_i of the learning input images from which the feature quantities regarded as errors were extracted.
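As a non-limiting illustration, the threshold search of steps S143 and S144 for one group of feature quantities could be sketched as follows. Equation (14) is not reproduced in this passage, so the error rate is computed here as the sum of the weights of the misclassified learning images, as the text describes. The function name, the `polarity` parameter, and the choice of candidate thresholds (midpoints between sorted values) are assumptions made for this sketch.

```python
def fit_threshold(values, labels, weights):
    """Return (error_rate, threshold, polarity) with the lowest weighted error.

    values  : correlation feature quantities of one group (one per image)
    labels  : +1 if the image contains the target object, otherwise -1
    weights : the weights W_i of the learning images
    The weak recognizer outputs +1 when polarity * (value - threshold) > 0.
    """
    ordered = sorted(set(values))                       # S143: sort
    candidates = [ordered[0] - 1.0]                     # below every value
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for threshold in candidates:                        # S144: vary threshold
        for polarity in (1, -1):
            error = sum(
                w for v, y, w in zip(values, labels, weights)
                if (1 if polarity * (v - threshold) > 0 else -1) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best
```

Trying both polarities corresponds to the text's remark that the decision depends on which side of the threshold the positive and negative examples lie.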

In step S145, the recognizer generation unit 100 selects, from among the N weak recognizers f_jk, the weak recognizer f_jk whose error rate e_jk is smallest.

In step S146, the recognizer generation unit 100 calculates the reliability c_jk from the minimum error rate e_jk of the selected weak recognizer, as shown in equation (15) above.

In step S147, the recognizer generation unit 100 recalculates the weight W_i of each learning input image from the supplied reliability c_jk, as shown in equation (16) above, and normalizes and updates all the weights W_i. The recognizer generation unit 100 then sets the weight of each learning input image based on the updated weights.

In step S148, the recognizer generation unit 100 temporarily stores the Q-th recognizer f_Q based on the selected recognizer f_jk. In other words, the recognizer generation unit 100 updates the (Q−1)-th recognizer f_Q-1 to the Q-th recognizer f_Q, to which the selected recognizer f_jk has been added.

That is, the recognizer generation unit 100 updates the recognizer R(x) as shown in equation (17) above. In this way, the weighted weak recognizer f_jk is added to the recognizer R(x).

In step S149, the recognizer generation unit 100 temporarily stores, as a selected feature quantity, the model feature quantity for the combination of local feature quantities that corresponds to feature point k of the weak recognizer f_jk.

In step S150, the recognizer generation unit 100 determines whether the value of the counter Q is greater than L, the number of iterations for generating the recognizer.

If it is determined in step S150 that the value of the counter Q is not greater than L, then in step S151 the recognizer generation unit 100 increments the counter Q by 1, after which the process returns to step S142 and the subsequent processing is repeated. If it is determined in step S150 that the counter Q is greater than L, the currently stored recognizer R(x) is supplied to the recognizer storage unit 122 of the recognition device 72, the currently stored selected feature quantities are supplied to the selected feature quantity storage unit 121 of the recognition device 72, and the process returns to step S20 of FIG. 29 and proceeds to step S21.

Through the above processing, a recognizer R(x) consisting of L weak recognizers f_Q (1 ≤ Q ≤ L) with relatively low error rates is generated and stored in the recognizer storage unit 122 of the recognition device 72, and the model feature quantities of the feature points k to be used by the respective weak recognizers f_Q are stored as selected feature quantities in the selected feature quantity storage unit 121 of the recognition device 72. Here, the number of iterations L satisfies L ≤ N × P.

The recognizer of equation (17) can be regarded as a function that outputs the presence or absence of the target object to be recognized by a majority vote of the L weak recognizers. A learning process that generates a recognizer in this way, by repeatedly adding weak recognizers while weighting them through learning, is called the Discrete AdaBoost algorithm.
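As a non-limiting illustration, the loop of steps S141 to S151 could be sketched as follows. Equations (14) to (17) are not reproduced in this passage, so the sketch uses the textbook forms of Discrete AdaBoost (reliability c = log((1 − e)/e), weights of misclassified images multiplied by exp(c) and then normalized), which are assumed, not confirmed, to correspond to those equations. The weighted lottery of step S142 is omitted and the weights are applied directly in the error rate. All names are hypothetical.

```python
import math


def weak_output(value, threshold, polarity):
    """Weak recognizer: +1 (object present) or -1 (object absent)."""
    return 1 if polarity * (value - threshold) > 0 else -1


def best_threshold(values, labels, weights):
    """Lowest weighted error over midpoint thresholds and both polarities."""
    ordered = sorted(set(values))
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for th in candidates:
        for pol in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if weak_output(v, th, pol) != y)
            if best is None or err < best[0]:
                best = (err, th, pol)
    return best


def train_discrete_adaboost(features, labels, rounds):
    """features[j][i] is correlation feature quantity j of learning image i.

    Returns a list of (reliability, feature index, threshold, polarity).
    """
    m = len(labels)
    weights = [1.0 / m] * m                                    # S141
    recognizer = []
    for _ in range(rounds):                                    # S150, S151
        stumps = []
        for j, column in enumerate(features):                  # S144
            err, th, pol = best_threshold(column, labels, weights)
            stumps.append((err, j, th, pol))
        err, j, th, pol = min(stumps)                          # S145
        err = min(max(err, 1e-10), 1 - 1e-10)                  # avoid log(0)
        reliability = math.log((1 - err) / err)                # S146
        weights = [                                            # S147
            w * math.exp(reliability)
            if weak_output(features[j][i], th, pol) != labels[i] else w
            for i, w in enumerate(weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
        recognizer.append((reliability, j, th, pol))           # S148, S149
    return recognizer
```

The feature index stored with each weak recognizer plays the role of the selected feature quantity of step S149: only those model feature quantities need to be evaluated at recognition time.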

That is, the recognizer generation process above repeats the calculation of a recognizer and an error rate for each model feature quantity in such a way that the weights of the learning input feature quantities of learning input images with high error rates progressively increase and the weights of learning input feature quantities with low error rates decrease. Consequently, during the iterations (steps S142 to S150), the learning input feature quantities selected when a recognizer is set (the learning feature quantities selected in step S142) increasingly favor those with high error rates. Learning input feature quantities that are hard to recognize are thus selected more often as the iterations proceed, so more feature quantities from hard-to-recognize learning input images are used in learning, and a recognizer with a high recognition rate can ultimately be generated.

Also, during the iterations (steps S142 to S150), the recognizer generation unit 100 always selects the weak recognizer corresponding to the model feature quantity with the lowest error rate. As learning is repeated, the weak recognizer for the model feature quantity with the highest reliability is therefore always selected and added to the recognizer, so that highly accurate weak recognizers are added one after another with each iteration.

That is, the learning process above generates a recognizer R(x) consisting of L weak recognizers f_jk with low error rates e_jk, using, for each feature point and each combination, feature quantities to which geometric constraints have been added. Because the resulting recognizer consists only of highly reliable weak recognizers, a highly reliable recognizer can be built from a limited number of weak recognizers, which makes it possible to improve recognition accuracy while reducing the number of arithmetic operations in the recognition process described later.

Increasing the number of weak recognizers (increasing L above) can improve the recognition accuracy of the recognizer. Conversely, even when the number of weak recognizers is small (even when L is small), the recognition process uses only weak recognizers that, although few, are highly reliable, so the number of arithmetic operations in the recognition process can be reduced while reliability is maintained. In other words, where needed, a more accurate recognizer can be generated by investing more effort in learning and generating more recognizers. Conversely, a relatively accurate recognizer can still be generated with little learning effort, by reducing the number of recognizers generated so that learning approaches one-shot learning.

In this way, the learning device 71 trains the recognizer using the learning correlation feature quantities generated by the learning correlation feature quantity generation unit 98, and the recognition process uses the recognition correlation feature quantities generated by the recognition correlation feature quantity generation unit 126, so the presence or absence of the recognition target can be judged by comparing different types of feature quantities with one another on the same scale. That is, the learning device 71 can use a variety of feature quantities as appropriate when performing object recognition. The learning device 71 can therefore automatically select and use, from among various types of feature quantities prepared in advance, the types of feature quantity suited to recognition, and can likewise automatically select and use, from among the various feature quantities prepared in advance, the feature quantities suited to recognition. Furthermore, the learning device 71 can automatically learn, by statistical learning, the feature points suited to recognition.

Although the Discrete AdaBoost algorithm has been described here as an example of a boosting algorithm, other boosting algorithms may be applied. For example, the Gentle AdaBoost algorithm may be used. With the Gentle AdaBoost algorithm, each weak recognizer outputs a continuous value that already incorporates the reliability, so the corresponding weighting is applied and the separate reliability calculation can be omitted.

Next, recognition process 1, which is one example of the recognition process executed by the recognition device 72, will be described with reference to the flowchart of FIG. 34.

In step S181, the recognition image acquisition process described later with reference to FIG. 35 is executed.

In step S182, the recognition feature point generation unit 124 generates feature points of the recognition image acquired in step S181 and supplies them to the recognition feature quantity generation unit 125.

In step S183, the recognition feature quantity generation unit 125 generates the feature quantities of the recognition image at the recognition feature points generated by the recognition feature point generation unit 124 and supplies them to the recognition correlation feature quantity generation unit 126.

In step S184, the recognition correlation feature quantity generation process described later with reference to FIG. 36 is executed. This process obtains, for each selected feature quantity stored in the selected feature quantity storage unit 121, its correlation with each recognition feature quantity.

In step S185, the recognition processing unit 127 performs a calculation that substitutes the recognition correlation feature quantities generated by the recognition correlation feature quantity generation unit 126 into the recognizer stored in the recognizer storage unit 122.

In step S186, the recognition processing unit 127 determines, based on the calculation result of step S185, whether the recognition image contains the object to be recognized, and supplies the result to the recognition result output unit 128.
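As a non-limiting illustration, steps S185 and S186 could be sketched as follows. The representation of the stored recognizer as a list of (reliability, feature index, threshold, polarity) tuples and the decision rule "score greater than zero" are assumptions made for this sketch. Equation (17) itself is not reproduced in this passage.

```python
def weak_output(value, threshold, polarity):
    """Weak recognizer: +1 (object present) or -1 (object absent)."""
    return 1 if polarity * (value - threshold) > 0 else -1


def recognizer_score(recognizer, correlation_features):
    """S185: substitute the recognition correlation feature quantities.

    recognizer           : list of (reliability, index, threshold, polarity)
    correlation_features : recognition correlation feature quantities,
                           indexed like the selected feature quantities
    """
    return sum(c * weak_output(correlation_features[j], th, pol)
               for c, j, th, pol in recognizer)


def contains_object(recognizer, correlation_features):
    """S186: weighted majority vote of the weak recognizers."""
    return recognizer_score(recognizer, correlation_features) > 0
```

The raw score, and not only its sign, can be kept when a sequence of recognition images is to be compared.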

In step S187, the recognition result output unit 128 outputs the recognition result supplied from the recognition processing unit 127, for example by displaying it on a display unit, outputting it as audio data, notifying the user with an LED or the like, or outputting it to another device via a predetermined transmission path or by recording it on a predetermined recording medium, and the process ends.

Through this processing, the recognition process is executed by converting recognition feature quantities into recognition correlation feature quantities and using a recognizer that was trained with learning feature quantities converted into learning correlation feature quantities. This makes it possible to judge the presence or absence of the recognition target by comparing different types of feature quantities with one another on the same scale.

Next, the recognition image acquisition process executed in step S181 of FIG. 34 will be described with reference to the flowchart of FIG. 35. This recognition image acquisition process is executed by the recognition image acquisition unit 123, which has the same configuration as the image processing unit 11 described with reference to FIG. 4, includes a focus camera for capturing recognition images, and can extract the in-focus portion from the recognition image data captured by the focus camera. The flowchart of FIG. 35 is therefore described on the assumption that the recognition image acquisition unit 123 has the configuration of the image processing unit 11 described with reference to FIGS. 4 and 5.

In step S211, the image acquisition unit 21 of the recognition image acquisition unit 123 executes imaging at a predetermined focal length and supplies the obtained image to the background separation processing unit 22.

In step S212, the background separation processing unit 22 of the recognition image acquisition unit 123 executes the background separation process described with reference to the flowchart of FIG. 31.

In step S213, the background separation processing unit 22 of the recognition image acquisition unit 123 determines whether an object to be recognized is present in the background-separated image. If it is determined in step S213 that no object to be recognized is present, for example because the entire image data is out of focus, the process returns to step S211 and the subsequent processing is repeated.

If it is determined in step S213 that an object to be recognized is present, then in step S214 the background separation processing unit 22 of the recognition image acquisition unit 123 applies image processing such as alignment, as necessary, to the background-separated image data.

In step S215, the background separation processing unit 22 of the recognition image acquisition unit 123 outputs the background-separated image corresponding to the object to be recognized to the recognition feature point generation unit 124 and the recognition feature quantity generation unit 125, and the process returns to step S181 of FIG. 34 and proceeds to step S182.

Through this processing, the image data for recognition is acquired. Because the image data is captured by the focus camera, an object to be recognized located at the in-focus position can be separated from the background by simple processing.
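As a non-limiting illustration, the capture-and-retry loop of steps S211 to S215 could be sketched as follows. The `capture` and `separate_background` callables, the `min_foreground` criterion for deciding that an object is present, and the `max_attempts` limit are all hypothetical. The flowchart as described simply repeats until an object is found, and the optional alignment of step S214 is omitted.

```python
def acquire_recognition_image(capture, separate_background,
                              min_foreground=1, max_attempts=100):
    """Return the background-separated image, or None if no object is found.

    capture()                : returns a grayscale image as a list of rows
    separate_background(img) : returns a 0/1 mask of the same shape
    """
    for _ in range(max_attempts):
        image = capture()                                   # S211
        mask = separate_background(image)                   # S212
        foreground = sum(sum(row) for row in mask)
        if foreground < min_foreground:                     # S213: no object,
            continue                                        # e.g. all blurred
        return [[pixel if keep else 0                       # S215
                 for pixel, keep in zip(image_row, mask_row)]
                for image_row, mask_row in zip(image, mask)]
    return None
```

A frame that is entirely out of focus produces an empty mask and is skipped, so the caller only ever receives frames in which some in-focus region was found.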

Next, the recognition correlation feature quantity generation process executed in step S184 of FIG. 34 will be described with reference to the flowchart of FIG. 36.

In step S241, the recognition correlation feature quantity generation unit 126 sets the variable k, which indicates the number of processed feature quantities, to k = 1.

In step S242, the recognition correlation feature quantity generation unit 126 generates, for the selected feature quantity of feature quantity k (the k-th of the N selected feature quantities), correlation values with the recognition feature quantities at the corresponding feature point of the recognition image.

In step S243, the recognition correlation feature quantity generation unit 126 selects, based on the generated correlation values, the recognition feature quantity with the highest correlation.

In step S244, the recognition correlation feature quantity generation unit 126 sets the correlation value of the recognition feature quantity selected in step S243 as the recognition correlation feature quantity of feature quantity k.

In step S245, the recognition correlation feature quantity generation unit 126 determines whether the variable k equals N, the total number of feature quantities for one piece of image data.

If it is determined in step S245 that the variable k is not N, that is, that it has not reached N, then in step S246 the recognition correlation feature quantity generation unit 126 increments the variable k by 1, the process returns to step S242, and the subsequent processing is repeated.

If it is determined in step S245 that the variable k is N, the process returns to step S184 of FIG. 34 and proceeds to step S185.

Through this processing, the recognition correlation feature quantities are generated, for example by using the method described with reference to FIG. 25. As described with reference to FIG. 26, recognition correlation feature quantities, too, can be obtained for different types of feature quantities (for example, shape, color, and motion) by calculating the correlation for each type.

Next, recognition process 2, which is a different example of the recognition process executed by the recognition device 72, will be described with reference to the flowchart of FIG. 37.

Recognition process 1, described with reference to the flowchart of FIG. 34, outputs every recognition result of the recognition process executed on each captured recognition image. That process can be executed whether the image data used in the learning process or the recognition process is acquired by having the user place the object to be recognized at the focal length set in the focus camera and then instruct the device to capture an image, or by moving the object to be recognized about so that it enters the imaging range of the focus camera while images are captured continuously.

In contrast, the process described with reference to the flowchart of FIG. 37 applies to the case where, to obtain recognition images in the recognition process, the object to be recognized is moved about so that it enters the imaging range of the focus camera while images are captured continuously. Instead of outputting every recognition result of the recognition process executed on each of the continuously captured recognition images, this process retains those recognition results for a predetermined number of iterations or for a predetermined time, observes how they change, and outputs the recognition result at the peak value, where the object appears most likely to have been recognized.

ステップＳ２７１乃至ステップＳ２７５において、図３４のステップＳ１８１乃至ステップＳ１８５と同様の処理が実行される。 In steps S271 to S275, processing similar to that in steps S181 to S185 in FIG. 34 is executed.

すなわち、認識画像が取得されて、認識画像の特徴点が生成されて、特徴点における特徴量が生成される。そして、図３６を用いて説明した認識相関特徴量生成処理が実行され、得られた認識相関特徴量を、認識器へ代入する計算処理が実行される。 That is, a recognition image is acquired, a feature point of the recognition image is generated, and a feature amount at the feature point is generated. And the recognition correlation feature-value production | generation process demonstrated using FIG. 36 is performed, and the calculation process which substitutes the acquired recognition correlation feature-value to a recognizer is performed.

そして、ステップＳ２７６において、認識処理部１２７は、得られた計算結果を一時保持する。保持された計算結果は、計算処理における所定回数、または、所定時間だけ保持される。 In step S276, the recognition processing unit 127 temporarily holds the obtained calculation result. The held calculation result is held for a predetermined number of times or a predetermined time in the calculation process.

ステップＳ２７７において、認識処理部１２７は、連続して撮像された認識画像のそれぞれに対して実行される認識処理の計算結果を所定回数、または、所定時間だけ保持しておき、その推移を観察して、ピーク値が得られたか否か、換言すれば、最も認識されているらしい結果が得られたか否かを判断する。 In step S277, the recognition processing unit 127 retains the calculation result of the recognition process executed for each of the continuously captured recognition images for a predetermined number of times or for a predetermined time, and observes the transition. Thus, it is determined whether or not a peak value has been obtained, in other words, whether or not a result most likely to be recognized has been obtained.

If it is determined in step S277 that no peak value has been obtained, the process returns to step S271, and the subsequent processing is repeated.

If it is determined in step S277 that a peak value has been obtained, then in step S278 the recognition processing unit 127 determines, based on the peak value, whether the recognition image contains the object to be recognized, and supplies the result to the recognition result output unit 128.

In step S278, the recognition result output unit 128 outputs the recognition result supplied from the recognition processing unit 127, for example by displaying it on a display unit, outputting it as audio data, notifying the user with an LED or the like, or sending it to another device over a predetermined transmission path or by recording it on a predetermined recording medium, and the process ends.
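The hold-and-observe loop of steps S276 to S278 can be sketched as follows. This is only an illustration under stated assumptions: the specification does not define how a peak is detected, so the rule used here (the largest held score is strictly greater than both the oldest and the newest held score) is hypothetical, as are the names `PeakDetector` and `recognize_stream`, the window length, and the decision threshold.

```python
from collections import deque


class PeakDetector:
    """Holds the most recent recognizer outputs and reports a peak (hypothetical rule)."""

    def __init__(self, window=5, threshold=0.0):
        self.scores = deque(maxlen=window)  # results held for a fixed number of iterations
        self.threshold = threshold          # assumed decision threshold on the peak value

    def push(self, score):
        """Store one calculation result (S276) and test for a peak (S277).

        Returns the peak value once the scores have risen and then fallen,
        otherwise None.
        """
        self.scores.append(score)
        if len(self.scores) < 3:
            return None
        peak = max(self.scores)
        # Assumed peak rule: the maximum is neither the oldest nor the newest sample.
        if self.scores[0] < peak and self.scores[-1] < peak:
            return peak
        return None

    def contains_target(self, peak):
        """Decide from the peak value whether the target is present (S278)."""
        return peak > self.threshold


def recognize_stream(scores, window=5, threshold=0.0):
    """Feed successive recognizer outputs; stop at the first detected peak."""
    detector = PeakDetector(window, threshold)
    for score in scores:
        peak = detector.push(score)
        if peak is not None:
            return peak, detector.contains_target(peak)
    return None, False  # no peak observed: keep capturing (return to S271)
```

In this sketch a steadily rising or steadily falling sequence never yields a peak, which corresponds to returning to step S271 and continuing to capture images.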

Through this processing, recognition is performed by converting recognition feature amounts into recognition correlation feature amounts and applying a recognizer that was trained on learning feature amounts converted into learning correlation feature amounts. As a result, feature amounts of different types can be compared with one another on the same scale to determine whether the recognition target is present.

Furthermore, this processing spares the user the tedious task of placing each object to be recognized exactly at the focal distance set in the focus camera. For example, when a pet is to be recognized, the recognition process can be executed correctly even if the pet is moving in front of the camera. Likewise, when the user wants a movable object to be recognized, for example one held in the hand, the object need not be placed exactly at the focal position of the focus camera: if it is simply moved around in roughly the right area, extraction is performed correctly at the moment its position matches the focal distance of the focus camera, a useful recognition image is obtained, and recognition can be performed. Similarly, when a person walking along a passage is to be recognized, extraction is performed correctly and automatically as the person passes the predetermined position corresponding to the focal position of the focus camera, so correct recognition is possible without making the person stand at a predetermined position or issuing a command to start imaging.

The recognition approach described above is called Collage-of-Feat, and any kind of feature amount can be used with it. Recognition is performed by a boosting algorithm using local MAX pooling.
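As a rough illustration of this approach, the sketch below computes, for each selected feature amount, the highest correlation found among the feature amounts of the recognition image, and then combines threshold tests on those values by a weighted vote. Several points are assumptions made for the example and are not taken from the specification: cosine similarity stands in for the correlation measure, the maximum is taken over all candidate feature amounts rather than over a local search window, the weak classifiers are simple threshold tests, and the function names are hypothetical.

```python
import math


def correlation(a, b):
    """Cosine similarity of two feature vectors (assumed correlation measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def correlation_features(selected, candidates):
    """For each selected feature amount, keep the highest correlation among candidates.

    `candidates` must be non-empty. A local search window around the
    corresponding feature point is omitted here for brevity.
    """
    return [max(correlation(s, c) for c in candidates) for s in selected]


def weighted_vote(features, weak_classifiers):
    """Weighted vote of threshold tests, as produced by a boosting-style learner.

    `weak_classifiers` is a list of (feature_index, threshold, weight) tuples.
    Returns True when the weighted sum of +1/-1 votes is positive.
    """
    total = sum(weight * (1 if features[index] > threshold else -1)
                for index, threshold, weight in weak_classifiers)
    return total > 0
```

Because every entry of the resulting vector is a degree of correlation, feature amounts of different types end up on a common scale before the vote is taken.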

With the processing described above, learning of the object model in online learning, that is, registration of the object model obtained as a result of learning, can be performed using image data from which the background has been separated, so learning performance can be expected to improve. In addition, when recognition is executed, the background can be removed during acquisition of the recognition image without cumbersome processing, so misrecognition can be expected to decrease and performance improves. Moreover, because both the learning process and the recognition process need to handle only the cut-out region, calculation time is also reduced.

Note that acquisition of image data and extraction of the region of the object to be recognized in these processes require only a monocular camera, not a compound-eye (multi-lens) arrangement, so cost increases can be suppressed.

Also, even when the recognition and learning processes use methods different from those described above, processing image data captured with the focus camera in the image processing unit 11 described with reference to FIG. 4 yields the benefit that the portion needed for recognition can be extracted without cumbersome operations. That is, the same benefit is obtained when the image processing unit 11 described with reference to FIG. 4 is used in, for example, the learning or recognition processes cited as conventional techniques.
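The extraction performed by the image processing unit 11 can be pictured with the minimal sketch below, which follows the three stages summarized in the abstract: a 3×3 filter whose center coefficient is 8 and whose eight neighbor coefficients are -1, an average over each pixel and its neighborhood, and binarization. The following choices are assumptions made for the example and are not taken from the specification: taking the absolute value of the filter response, leaving border pixels at zero, the neighborhood radius, the threshold value, and all function names. The morphological processing and masking mentioned in the abstract are omitted.

```python
def neighbor_difference(img):
    """3x3 filter: 8 * center minus the 8 neighbors; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window_sum = sum(img[y + dy][x + dx]
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            # 8*c - (window_sum - c) == 9*c - window_sum; absolute value is an assumption
            out[y][x] = abs(9 * img[y][x] - window_sum)
    return out


def neighborhood_average(resp, radius=1):
    """Average of each pixel and its neighborhood, clipped at the image border."""
    h, w = len(resp), len(resp[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = [resp[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(values) / len(values)
    return out


def separate_background(img, radius=1, threshold=1.0):
    """Return a mask with 1 for pixels judged in focus and 0 for background."""
    averaged = neighborhood_average(neighbor_difference(img), radius)
    return [[1 if value > threshold else 0 for value in row] for row in averaged]
```

A uniformly flat (defocused) image gives a filter response of zero everywhere and therefore an all-zero mask, while a sharp detail produces a large response that the averaging stage spreads over its surroundings before binarization.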

Although the image processing system 51 has been described here as comprising the learning device 71 and the recognition device 72, it goes without saying that both the learning process and the recognition process may be performed by a single device having the functions of both the learning device 71 and the recognition device 72. In other words, the image processing system 51 may be configured as a single device. Furthermore, the image processing system 51 may be configured from two or more devices each having the functions of both the learning device 71 and the recognition device 72.

It also goes without saying that the learning process and the recognition process need not be performed consecutively, and that the learning device 71 and the recognition device 72 may be installed apart from each other. In other words, once the selected feature amounts and the recognizer generated by the learning device 71 are stored in the selected feature amount storage unit 121 and the recognizer storage unit 122, respectively, the recognition device 72 can perform recognition on its own even when installed at a location remote from the learning device 71.

In acquiring model images, learning images, and recognition images, the region corresponding to the object to be recognized may be extracted automatically from the image data in every case, or at least one of these kinds of images may be acquired by another method. For example, when the number of model images is small, the model image acquisition unit 91 need not have the same configuration as the image processing unit 11 described with reference to FIG. 4; instead, it may acquire model images showing only the portion to be recognized, or extract only the portion to be recognized from given image data in response to a user operation or the like, and output it to the model feature point generation unit 92.

The series of processes described above can be executed by hardware or by software. When software is used, the program constituting the software is installed from a recording medium onto a computer built into dedicated hardware or onto, for example, a general-purpose personal computer capable of executing various functions when various programs are installed. In this case, the processing described above is executed by a personal computer 500 such as that shown in FIG. 38.

In FIG. 38, a CPU (Central Processing Unit) 501 executes various processes according to a program stored in a ROM (Read Only Memory) 502 or a program loaded from a storage unit 508 into a RAM (Random Access Memory) 503. The RAM 503 also stores, as appropriate, data that the CPU 501 needs in order to execute the various processes.

The CPU 501, ROM 502, and RAM 503 are connected to one another via an internal bus 504. An input/output interface 505 is also connected to the internal bus 504.

Connected to the input/output interface 505 are an input unit 506 comprising a keyboard, a mouse, and the like; an output unit 507 comprising a display such as a CRT or LCD, a speaker, and the like; a storage unit 508 comprising a hard disk or the like; and a communication unit 509 comprising a modem, a terminal adapter, or the like. The communication unit 509 performs communication over various networks, including telephone lines and CATV.

A drive 510 is also connected to the input/output interface 505 as necessary. A removable medium 521, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on it as appropriate, and a computer program read from the medium is installed in the storage unit 508 as necessary.

When the series of processes is executed by software, the program constituting the software is installed from a network or a recording medium.

As shown in FIG. 38, this recording medium is not limited to package media consisting of the removable medium 521 on which the program is recorded and which is distributed separately from the computer in order to provide the program to users. It also includes the ROM 502 on which the program is recorded and the hard disk of the storage unit 508, which are provided to users already built into the apparatus.

In this specification, the steps describing the program recorded on the recording medium include not only processes performed in time series in the order described, but also processes executed in parallel or individually without necessarily being processed in time series.

In this specification, a system refers to an entire apparatus made up of a plurality of devices.

Embodiments of the present invention are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present invention.

FIG. 1 is a diagram for explaining a conventional recognition technique.
FIG. 2 is a diagram for explaining the relationship between the learning process and the recognition process.
FIG. 3 is a diagram for explaining the characteristics of an image captured with a focus camera.
FIG. 4 is a diagram showing the configuration of an image processing unit that can separate the background from an image captured with a focus camera.
FIG. 5 is a diagram showing a more detailed configuration of the background separation processing unit of FIG. 4.
FIG. 6 is a diagram for explaining the neighboring pixel difference filter.
FIG. 7 is a diagram showing an output example of the neighboring pixel difference filter calculation processing unit.
FIG. 8 is a diagram for explaining the neighboring region sum filter.
FIG. 9 is a diagram for explaining the processing of the neighboring region sum filter calculation processing unit.
FIG. 10 is a diagram showing an output example of the neighboring region sum filter calculation processing unit.
FIG. 11 is a diagram showing an output example of the neighboring region sum filter calculation processing unit.
FIG. 12 is a diagram showing an output example of the neighboring region sum filter calculation processing unit.
FIG. 13 is a diagram showing an output example of the threshold processing unit.
FIG. 14 is a diagram showing an output example of the threshold processing unit.
FIG. 15 is a diagram for explaining the flow of processing by the background separation processing unit.
FIG. 16 is a diagram for explaining the flow of processing by the background separation processing unit.
FIG. 17 is a diagram for explaining the flow of processing by the background separation processing unit.
FIG. 18 is a diagram for explaining the flow of processing by the background separation processing unit.
FIG. 19 is a block diagram for explaining the configuration of the image processing system.
FIG. 20 is a diagram for explaining an outline of the learning phase executed in the learning device.
FIG. 21 is a diagram showing the shape of the Steerable Filter kernels in two dimensions.
FIG. 22 is a diagram for explaining the process of combining neighboring jets.
FIG. 23 is a diagram for explaining rotation of local feature amounts.
FIG. 24 is a diagram showing an example of a histogram in HSV space.
FIG. 25 is a diagram for explaining an example of searching for the maximum correlation point by the EBGM method.
FIG. 26 is a diagram for explaining an example of correlation feature amount calculation using multiple types of feature amounts.
FIG. 27 is a diagram for explaining an example of the learning process executed in the recognizer generation unit.
FIG. 28 is a diagram for explaining an example of the learning process executed in the recognizer generation unit.
FIG. 29 is a flowchart for explaining the learning process.
FIG. 30 is a flowchart for explaining the learning image acquisition process.
FIG. 31 is a flowchart for explaining the background separation process.
FIG. 32 is a flowchart for explaining the learning correlation feature amount generation process.
FIG. 33 is a flowchart for explaining the recognizer generation process.
FIG. 34 is a flowchart for explaining recognition process 1.
FIG. 35 is a flowchart for explaining the recognition image acquisition process.
FIG. 36 is a flowchart for explaining the recognition correlation feature amount generation process.
FIG. 37 is a flowchart for explaining recognition process 2.
FIG. 38 is a block diagram showing the configuration of a personal computer.

Explanation of symbols

11 image processing unit, 21 image acquisition unit, 22 background separation processing unit, 31 neighboring pixel difference filter calculation processing unit, 32 neighboring region sum filter calculation processing unit, 33 threshold processing unit, 51 image processing system, 71 learning device, 72 recognition device, 91 model image acquisition unit, 92 model feature point generation unit, 93 model feature amount generation unit, 94 model feature amount storage unit, 95 learning image acquisition unit, 96 learning feature point generation unit, 97 learning feature amount generation unit, 98 learning correlation feature amount generation unit, 99 correct/incorrect information acquisition unit, 100 recognizer generation unit, 121 selected feature amount storage unit, 122 recognizer storage unit, 123 recognition image acquisition unit, 124 recognition feature point generation unit, 125 recognition feature amount generation unit, 126 recognition correlation feature amount generation unit, 127 recognition processing unit, 128 recognition result output unit, 141 to 143 correlation calculation units

Claims

An image processing apparatus that generates, in advance by a learning process, a recognizer for recognizing a recognition target, comprising:
learning image acquisition means for acquiring a learning image used in the learning process;
model image acquisition means for acquiring a model image corresponding to the recognition target; and
recognizer generation means for executing the learning process using the learning image acquired by the learning image acquisition means and the model image acquired by the model image acquisition means, and generating a recognizer for recognizing the recognition target,
wherein at least one of the learning image acquisition means and the model image acquisition means includes:
image acquisition means for acquiring image data in which a subject located at a predetermined focal distance is in focus and other objects are out of focus; and
image extraction means for extracting, from the image data acquired by the image acquisition means, a portion corresponding to the subject that is in focus,
and acquires the portion corresponding to the subject extracted by the image extraction means as the learning image or the model image.

The image processing apparatus according to claim 1, wherein the image extraction means includes:
first calculation means for executing a calculation process that extracts, from the pixels of the image data acquired by the image acquisition means, pixels having a large difference from their neighboring pixels;
second calculation means for obtaining, with each pixel extracted by the first calculation means as having a large difference from its neighboring pixels taken as a target pixel, an average over the target pixel and its neighboring region; and
dividing means for dividing the image data, based on the calculation result of the second calculation means, into a region corresponding to the object to be detected and a region regarded as background.

The image processing apparatus according to claim 2, wherein the dividing means divides the image data into the region corresponding to the object to be detected and the region regarded as background by binarizing the calculation result of the second calculation means with a predetermined threshold value.

The image processing apparatus according to claim 2, wherein the dividing means recognizes pixels for which the calculation result of the second calculation means is a positive value as the region corresponding to the object to be detected.

The image processing apparatus as described, wherein the recognizer generation means includes:
model feature point generation means for generating a plurality of feature points as model feature points from the model image acquired by the model image acquisition means;
model feature amount generation means for generating a feature amount at each of the model feature points generated by the model feature point generation means as a model feature amount;
learning feature point generation means for generating a plurality of feature points as learning feature points from the learning image acquired by the learning image acquisition means;
learning feature amount generation means for generating a feature amount at each of the learning feature points generated by the learning feature point generation means as a learning feature amount;
learning correlation feature amount generation means for selecting, for each of the model feature amounts generated by the model feature amount generation means, the learning feature amount having the highest correlation among the learning feature amounts generated by the learning feature amount generation means, and generating the degree of correlation with the selected learning feature amount as a learning correlation feature amount;
correct/incorrect information acquisition means for acquiring correct/incorrect information indicating whether the learning image contains the recognition target; and
recognizer generation means for generating the recognizer based on the learning correlation feature amounts generated by the learning correlation feature amount generation means and the correct/incorrect information acquired by the correct/incorrect information acquisition means.

The model feature point generated by the model feature point generation means is selected according to the type of the model feature amount in the model feature point,
The image processing apparatus according to claim 5, wherein the learning feature point generated by the learning feature point generation unit is selected according to a type of the learning feature amount at the learning feature point.

The model feature value generated by the model feature value generation means is selected according to the type of the model feature value,
The image processing apparatus according to claim 5, wherein the learning feature amount generated by the learning feature amount generation unit is selected according to a type of the learning feature amount.

The image processing apparatus according to claim 5, wherein the recognizer generation unit generates the recognizer through a learning process based on a weighted vote.

The image processing apparatus according to claim 8, wherein the learning process based on the weighted voting is a boosting algorithm.

The image processing apparatus as described, wherein the image extraction means extracts the portion corresponding to the subject that is in focus by extracting regions that are out of focus from the image data acquired by the image acquisition means.

The image processing apparatus according to claim 1, wherein the image extraction means analyzes, using an FFT, the frequency spectrum of each image region making up the image data acquired by the image acquisition means, and extracts the portion corresponding to the subject that is in focus by determining that a region containing sufficient high-frequency components is in focus.

The image processing apparatus according to claim 1, further comprising:
recognizer storage means for storing the recognizer generated by the recognizer generation means;
selected feature amount storage means for storing a selected feature amount corresponding to each recognizer stored by the recognizer storage means;
recognition image acquisition means for acquiring a recognition image used in performing a recognition process;
recognition feature point generation means for generating a plurality of feature points as recognition feature points from the recognition image acquired by the recognition image acquisition means;
recognition feature amount generation means for generating a feature amount at each of the recognition feature points generated by the recognition feature point generation means as a recognition feature amount;
recognition correlation feature amount generation means for selecting, for each of the selected feature amounts stored by the selected feature amount storage means, the recognition feature amount having the highest correlation among the recognition feature amounts generated by the recognition feature amount generation means, and generating the degree of correlation with the selected recognition feature amount as a recognition correlation feature amount; and
recognition processing means for determining whether the recognition image acquired by the recognition image acquisition means contains the recognition target by substituting the recognition correlation feature amounts generated by the recognition correlation feature amount generation means into the recognizer generated by the recognizer generation means.

The image processing apparatus according to claim 12, wherein the recognition image acquisition means includes the image acquisition means and the image extraction means, and acquires the portion corresponding to the subject extracted by the image extraction means as the recognition image.

An image processing method for an image processing apparatus that generates, in advance by a learning process, a recognizer for recognizing a recognition target, the method comprising the steps of:
acquiring a learning image used in the learning process;
acquiring a model image corresponding to the recognition target; and
executing the learning process using the acquired learning image and model image, and generating a recognizer for recognizing the recognition target,
wherein at least one of the step of acquiring the learning image and the step of acquiring the model image includes:
acquiring image data in which a subject located at a predetermined focal distance is in focus and other objects are out of focus;
extracting, from the acquired image data, a portion corresponding to the subject that is in focus; and
acquiring the portion corresponding to the extracted subject as the learning image or the model image.

A program for causing a computer to execute a process of generating, in advance by a learning process, a recognizer for recognizing a recognition target, the process comprising the steps of:
controlling acquisition of a learning image used in the learning process;
controlling acquisition of a model image corresponding to the recognition target; and
executing the learning process using the acquired learning image and model image, and generating a recognizer for recognizing the recognition target,
wherein at least one of the step of acquiring the learning image and the step of acquiring the model image includes:
controlling acquisition of image data in which a subject located at a predetermined focal distance is in focus and other objects are out of focus;
extracting, from the acquired image data, a portion corresponding to the subject that is in focus; and
acquiring the portion corresponding to the extracted subject as the learning image or the model image.

A recognition apparatus that performs a recognition process of determining, using a recognizer generated by a learning process, whether a recognition image contains a recognition target, comprising:
recognition image acquisition means for acquiring the recognition image used in performing the recognition process;
recognizer storage means for storing the recognizer;
selected feature amount storage means for storing a selected feature amount corresponding to each recognizer stored by the recognizer storage means; and
recognition processing means for determining, using the recognizer stored by the recognizer storage means and the selected feature amounts stored by the selected feature amount storage means, whether the recognition image acquired by the recognition image acquisition means contains the recognition target,
wherein the recognition image acquisition means includes:
image acquisition means for acquiring image data in which a subject located at a predetermined focal distance is in focus and other objects are out of focus; and
image extraction means for extracting, from the image data acquired by the image acquisition means, a portion corresponding to the subject that is in focus,
and acquires the portion corresponding to the subject extracted by the image extraction means as the recognition image.

The image processing apparatus according to claim 16, wherein the image extraction means includes:
first calculation means for executing a calculation process that extracts, from the pixels of the image data acquired by the image acquisition means, pixels having a large difference from their neighboring pixels;
second calculation means for obtaining, with each pixel extracted by the first calculation means as having a large difference from its neighboring pixels taken as a target pixel, an average over the target pixel and its neighboring region; and
dividing means for dividing the image data, based on the calculation result of the second calculation means, into a region corresponding to the object to be detected and a region regarded as background.

The image processing apparatus according to claim 17, wherein the dividing means recognizes pixels for which the calculation result of the second calculation means is a positive value as the region corresponding to the object to be detected.

The recognition apparatus according to claim 16, wherein the recognition processing means includes:
recognition feature point generation means for generating a plurality of feature points as recognition feature points from the recognition image acquired by the recognition image acquisition means;
recognition feature amount generation means for generating a feature amount at each of the recognition feature points generated by the recognition feature point generation means as a recognition feature amount;
recognition correlation feature amount generation means for selecting, for each of the selected feature amounts stored by the selected feature amount storage means, the recognition feature amount having the highest correlation among the recognition feature amounts generated by the recognition feature amount generation means, and generating the degree of correlation with the selected recognition feature amount as a recognition correlation feature amount; and
determination means for determining whether the recognition image acquired by the recognition image acquisition means contains the recognition target by substituting the recognition correlation feature amounts generated by the recognition correlation feature amount generation means into the recognizer stored by the recognizer storage means.

The recognition device according to claim 16, wherein the recognizer stored by the recognizer storage means is a recognizer generated by:
Generating a plurality of feature points as model feature points from a predetermined model image;
Generating a feature quantity at each of the model feature points as a model feature quantity;
Generating a plurality of feature points as learning feature points from a predetermined learning image;
Generating a feature quantity at each of the learning feature points as a learning feature quantity;
Selecting, for each of the model feature quantities, the learning feature quantity having the highest correlation, and generating a degree of correlation with the selected learning feature quantity as a learning correlation feature quantity;
Acquiring correct/incorrect information indicating whether or not the learning image includes the recognition target; and
Generating the recognizer based on the learning correlation feature quantity and the correct/incorrect information.
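
The final step, generating a recognizer from the learning correlation feature quantities and the correct/incorrect information, might look like the sketch below. The claim does not name a learning algorithm in this passage, so discrete AdaBoost over single-feature threshold tests is used here purely as an assumed stand-in. Each row of `X` is assumed to hold the learning correlation feature quantities of one learning image, and `y` holds +1 or -1 as the correct/incorrect information. All names are hypothetical.

```python
import math


def stump_predict(row, j, thr):
    """Threshold test on the j-th correlation feature."""
    return 1 if row[j] > thr else -1


def train_stump(X, y, w):
    """Pick the (feature index, threshold) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if stump_predict(row, j, thr) != yi)
            if best is None or err < best[0]:
                best = (err, j, thr)
    return best


def adaboost(X, y, rounds=3):
    """Return a list of (weight, feature index, threshold) weak recognizers."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr))
        # Raise the weight of misclassified learning images, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """True if the weighted vote says the recognition target is present."""
    return sum(a * stump_predict(row, j, thr) for a, j, thr in model) > 0
```

Under this assumption, the feature indices chosen by the weak recognizers would identify which model feature quantities are worth keeping as selected feature quantities for the recognition stage.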

A recognition method for a recognition device that performs recognition processing to determine whether or not a recognition target is included in a recognition image, using a recognizer generated by learning processing and stored in a storage unit and a selected feature quantity corresponding to each such recognizer and stored in the storage unit, the recognition method comprising:
Acquiring the recognition image used for performing the recognition processing; and
Determining whether or not the recognition target is included in the acquired recognition image using the recognizer and the selected feature quantity,
Wherein the process of acquiring the recognition image includes:
Acquiring image data in which a subject present at a predetermined focal length is in focus and other objects are out of focus;
Extracting a portion corresponding to the in-focus subject from the acquired image data; and
Acquiring the portion corresponding to the extracted subject as the recognition image.
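
The last two acquisition steps, extracting the in-focus portion and using it as the recognition image, can be sketched as a masked crop. The sketch assumes that a boolean mask marking in-focus pixels is already available. Cropping to the bounding box and filling masked-out pixels with a `background` value are illustrative choices rather than requirements of the claim, and the function names are hypothetical.

```python
def bounding_box(mask):
    """Smallest rectangle (top, bottom, left, right) containing all True pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def extract_subject(img, mask, background=0):
    """Crop the image to the in-focus area and blank out background pixels."""
    box = bounding_box(mask)
    if box is None:
        return []  # nothing in focus, so there is no recognition image
    top, bottom, left, right = box
    return [[img[y][x] if mask[y][x] else background
             for x in range(left, right)]
            for y in range(top, bottom)]
```

For example, with `img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and a mask that is true only at positions (1, 1), (1, 2), and (2, 2), `extract_subject(img, mask)` returns `[[5, 6], [0, 9]]`.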

A program for causing a computer to execute processing that determines whether or not a recognition target is included in a recognition image, using a recognizer generated by learning processing and stored in a storage unit and a selected feature quantity corresponding to each such recognizer and stored in the storage unit, the processing comprising:
Controlling acquisition of the recognition image used for performing the recognition processing; and
Determining whether or not the recognition target is included in the acquired recognition image using the recognizer and the selected feature quantity,
Wherein the process of acquiring the recognition image includes:
Controlling acquisition of image data in which a subject present at a predetermined focal length is in focus and other objects are out of focus;
Extracting a portion corresponding to the in-focus subject from the acquired image data; and
Acquiring the portion corresponding to the extracted subject as the recognition image.