JP2008171147A - Object recognition device and method

Info

Publication number: JP2008171147A
Application number: JP2007002832A
Authority: JP
Inventors: Yusuke Nakano; 雄介中野
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2007-01-10
Filing date: 2007-01-10
Publication date: 2008-07-24

Abstract

PROBLEM TO BE SOLVED: To suppress misrecognition of the orientation of a recognition target caused by a discrepancy between the center of the recognition target and the center of a scanning window.

SOLUTION: An object recognition device 1 has: a template storage unit 16 that stores a template dictionary describing the feature vectors of a plurality of rotation template images and the feature vectors of a plurality of translation template images for matching against an input image; a feature vector calculation unit 13 that calculates the feature vector of a partial image selected from the input image by the scanning window as a processing target region; and an estimation unit (template matching unit 15 and result integration unit 17) that estimates the position and orientation of the recognition target contained in the input image based on the template dictionary and the feature vector of the partial image. The plurality of rotation template images are a set of images of the recognition target viewed from a plurality of directions, and the plurality of translation template images are a set of images that partially capture the recognition target from a plurality of viewpoints along a linear trajectory.

COPYRIGHT: (C) 2008, JPO & INPIT

Description

The present invention relates to an object recognition device and an object recognition method that estimate the position and orientation of a recognition target contained in an input image by matching the input image against a plurality of template images.

Computer vision, which provides the sense of sight for robots that move autonomously on their own judgment, must be able to recognize the position and orientation of an object from images such as those captured by a camera. One known technique for this kind of object recognition stores in advance a plurality of template images corresponding to the appearance of the recognition target from multiple directions, matches an input image obtained by photographing against these template images, and estimates the orientation of the recognition target according to which template image the input image resembles. Hereinafter, data storing the plurality of template images used for object recognition is referred to as a template dictionary.

The input image is matched against the template dictionary by comparing feature quantities extracted from the input image with the feature quantities of the template images in the dictionary. Here, a feature quantity is an essential component, useful for identifying an image, extracted from the information contained in the input image or a template image. Feature quantities are usually described as a feature vector whose elements are the values of a plurality of feature quantities. The number of dimensions of the feature vector corresponds to the number of feature quantities. The input image and each template image are each represented as a point in the multidimensional feature space spanned by the feature vectors.

The simplest feature vector takes the value of every pixel of the image data or template image as its elements. In that case, however, the dimensionality of the feature vector equals the number of pixels in the image data and becomes enormous, so the storage capacity needed for the template images and the time needed for image matching also become enormous, which is impractical.

The eigenspace method is known as a way to compress the dimensionality of the feature vector so that the template dictionary can be stored in a small amount of memory and image matching can be performed at high speed. The eigenspace method generates an eigenspace by principal component analysis (PCA) of the template images, and generates the feature vector of each template image using, as feature quantities, the principal components obtained by projecting the template image onto the eigenspace.

Furthermore, the parametric eigenspace method, which applies the eigenspace method described above to estimating the position and orientation of an object, has been proposed (see Non-Patent Document 1 and Patent Document 1). In the parametric eigenspace method, template images captured continuously while rotating the recognition target are learned in advance. Hereinafter, template images obtained by photographing an object from various directions are referred to as rotation template images. The set of rotation template images is represented as a manifold in the eigenspace (hereinafter, a rotation manifold). The orientation of the recognition target is estimated by finding the projection point of the template image that is closest to the projection point obtained by projecting the input image onto the eigenspace.

Further, the position of the recognition target within the input image is estimated as follows. A scanning window for selecting part of the input image is set, and, while the input image is scanned sequentially with the scanning window, the orientation of the object is estimated by the parametric eigenspace method described above for each partial image selected by the scanning window. The position of the recognition target within the input image is then estimated from the position of the scanning window at which the orientation of the recognition target was estimated.
Patent Document 1: JP 2005-149167 A
Non-Patent Document 1: Hiroshi Murase and S. K. Nayar, "3D object recognition by 2D matching: the parametric eigenspace method", IEICE Transactions (Japanese edition), J77-D-II, No. 11, pp. 2179-2187, November 1994

As described above, in the conventional object recognition method based on matching against template images, a template dictionary composed of rotation template images, obtained by photographing the recognition target while rotating it, is generated in the learning stage. In the object recognition stage, the position and orientation of the object are then estimated by matching the partial image selected from the input image by the scanning window against the rotation template images in the template dictionary.

However, such a conventional object recognition method has the problem that the accuracy of orientation estimation falls when the partial image selected by the scanning window contains only part of the recognition target, or, more precisely, when the center of the photographed recognition target and the center of the partial image selected by the scanning window are misaligned, because the orientation of the recognition target is then easily misrecognized. Hereinafter, the misalignment between the center of the recognition target and the center of the scanning window is referred to as "translational shift".

The present invention has been made in view of the problem described above, and its object is to suppress misrecognition of the orientation of the recognition target caused by the misalignment (translational shift) between the center of the recognition target and the center of the scanning window.

An object recognition device according to a first aspect of the present invention estimates the position and orientation of a recognition target using an input image captured and supplied by imaging means. The device has: a storage unit that stores a template dictionary describing the feature vectors of a plurality of rotation template images and the feature vectors of a plurality of translation template images for matching against the input image; a feature vector calculation unit that calculates a feature vector representing the features of a partial image selected from the input image as a processing target region by a scanning window, which is scanned over the input image and designates the processing target region; and an estimation unit that estimates the position and orientation of the recognition target contained in the input image based on the template dictionary and the feature vector of the partial image. The plurality of rotation template images are a set of images of the recognition target viewed from a plurality of directions, and the plurality of translation template images are a set of images that partially capture the recognition target from a plurality of viewpoints along a linear trajectory. In Embodiment 1 of the invention described later, the template matching unit 15 and the result integration unit 17 correspond to the estimation unit of the object recognition device according to the first aspect.

Here, the rotation template images may be a set of images of the recognition target viewed from a plurality of viewpoints located on a circular trajectory centered on the recognition target, and the translation template images may be a set of images that partially capture the recognition target from a plurality of viewpoints located on a linear trajectory.

Alternatively, the plurality of rotation template images may be a set of images cut out, so as to contain the whole of the recognition target, from images of the recognition target photographed from a plurality of viewpoints along a circular trajectory centered on the recognition target, and the plurality of translation template images may be a set of images cut out, so as to contain only part of the recognition target, from images in which the recognition target was photographed.

With this configuration, the variation that the feature vector of the partial image cut out from the input image exhibits as the scanning window moves can be matched against the variation that the feature vectors of the translation template images exhibit with translational shift. Therefore, even if matching against the rotation template images yields an erroneous orientation because of translational shift, matching against the translation template images reveals a mismatch between the translation manifold of the erroneously selected rotation template image and the translation manifold of the partial image under evaluation, so the erroneously estimated orientation of the recognition target can be rejected. Misrecognition of the orientation of the recognition target caused by translational shift can thus be suppressed.

In the object recognition device according to the first aspect of the present invention described above, the plurality of translation template images may include, for each of the plurality of rotation template images, a group of images in which the recognition target was photographed with the center of the captured image shifted relative to that rotation template image.

Further, the template matching unit may be configured to estimate, as the orientation of the recognition target, the orientation specified by a rotation template image when both of the following conditions hold. First, the distance in the feature space between the rotation manifold formed by the feature vectors of the plurality of rotation template images and the feature vector of the partial image is smaller than a first threshold. Second, the distance in the feature space between the feature vector of the partial image and the translation manifold formed by the translation template images corresponding to the rotation template image that gives that distance to the rotation manifold is smaller than a second threshold. Misrecognition of the orientation of the recognition target caused by translational shift can thereby be suppressed efficiently.

An object recognition method according to a second aspect of the present invention includes: scanning an input image with a scanning window that defines a processing target region; calculating a feature vector representing the features of the partial image selected from the input image as the processing target region by the scanning window; and estimating the position and orientation of the recognition target captured in the input image by matching the feature vector of the partial image against a template dictionary. The template dictionary describes the feature vectors of a plurality of rotation template images, which are a set of images of the recognition target viewed from a plurality of directions, and the feature vectors of a plurality of translation template images, which are a set of images that partially capture the recognition target from a plurality of viewpoints along a linear trajectory.

With this method, the variation that the feature vector of the partial image cut out from the input image exhibits as the scanning window moves can be matched against the variation that the feature vectors of the translation template images exhibit with translational shift. Therefore, even if matching against the rotation template images yields an erroneous orientation because of translational shift, matching against the translation template images reveals a mismatch between the translation manifold of the erroneously selected rotation template image and the translation manifold of the partial image under evaluation, so the erroneously estimated orientation of the recognition target can be rejected. Misrecognition of the orientation of the recognition target caused by translational shift can thus be suppressed.

Further, in estimating the orientation of the recognition target in the object recognition method according to the second aspect, the following procedure may be used. First, it is determined whether the distance in the feature space between the rotation manifold formed by the feature vectors of the plurality of rotation template images and the feature vector of the partial image is smaller than a first threshold. If it is, it is then determined whether the distance in the feature space between the feature vector of the partial image and the translation manifold formed by the translation template images corresponding to the rotation template image that gives that distance is smaller than a second threshold. If that is also the case, the orientation specified by that rotation template image is taken as the orientation estimation result for the recognition target. This reduces the amount of computation required to match the partial image against the translation template images, so that object recognition can be performed efficiently.
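The two-threshold procedure can be illustrated by the following minimal sketch. It is not the claimed implementation: it assumes that the distance to a manifold is approximated by the Euclidean distance to the nearest sampled template, and the names `distance`, `estimate_pose`, `th1`, and `th2` are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def estimate_pose(d, rotation_templates, translation_templates, th1, th2):
    """Return the index i of the accepted rotation template, or None if rejected.

    rotation_templates:    {i: feature vector of rotation template i}
    translation_templates: {i: list of feature vectors of the translation
                               templates associated with rotation template i}
    Assumption: each manifold is represented only by its sampled templates.
    """
    # Stage 1: nearest sample of the rotation manifold, tested against th1.
    best_i = min(rotation_templates,
                 key=lambda i: distance(d, rotation_templates[i]))
    if distance(d, rotation_templates[best_i]) >= th1:
        return None  # no rotation template is close enough
    # Stage 2: only the translation templates tied to best_i are examined,
    # which is what keeps the amount of computation small.
    s_trans = min(distance(d, t) for t in translation_templates[best_i])
    if s_trans >= th2:
        return None  # inconsistent with the translation manifold: reject
    return best_i
```

Because the second stage visits only the translation templates of the single rotation template selected in the first stage, the number of distance evaluations grows with the number of rotation templates plus one translation set, not with the size of the whole dictionary.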

The present invention can provide an object recognition device and an object recognition method capable of suppressing misrecognition of the orientation of a recognition target caused by the misalignment (translational shift) between the center of the recognition target and the center of the scanning window.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. In the drawings, identical elements are denoted by identical reference numerals, and duplicate description is omitted where appropriate for clarity.

Embodiment 1 of the Invention
FIG. 1 shows the configuration of an object recognition device 1 according to this embodiment. The object recognition device 1 outputs an estimate of the position and orientation of a recognition target contained in an input image. More specifically, the object recognition device 1 scans the input image sequentially in units of a scanning window and matches each partial image selected by the scanning window against a plurality of template images representing various orientations of the recognition target, thereby estimating the position and orientation of the recognition target contained in the input image. The components of the object recognition device 1 are described below with reference to FIG. 1.

In FIG. 1, the imaging unit 11 includes an image sensor (not shown) such as a CCD (Charge Coupled Device) and a signal processing unit (not shown) that converts the analog image data obtained by the image sensor into digital image data and outputs it.

The image feature calculation unit 12 performs image processing on the input image supplied from the imaging unit 11 and calculates feature quantities of the input image. More specifically, it calculates feature quantities for each partial image selected from the input image by the scanning window. For example, the input image data may be converted to grayscale image data, and differentiation applied to the converted data to obtain edge strength and edge direction as feature quantities. Alternatively, the normalized luminance value of each pixel of the input image data may be used as a feature quantity. The feature extraction technique used by the image feature calculation unit 12 is not limited to those described above; various techniques, including conventionally known ones, can be applied. The p feature quantities (p is a positive integer) calculated from the input image (specifically, from the partial image selected by the scanning window) are represented as a p-dimensional feature vector D.
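As one hedged illustration of the edge-based feature quantities mentioned above, the sketch below differentiates a grayscale partial image with central differences and returns edge strength and edge direction for each interior pixel. The choice of central differences and the function name `edge_features` are assumptions; the text does not specify the differentiation operator.

```python
import math

def edge_features(gray):
    """Return (strength, direction) lists for the interior pixels of a
    2-D grayscale image given as a list of rows of numeric intensities.

    Assumption: central differences are used as the differentiation step.
    """
    h, w = len(gray), len(gray[0])
    strength, direction = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0  # horizontal gradient
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0  # vertical gradient
            strength.append(math.hypot(gx, gy))           # edge strength
            direction.append(math.atan2(gy, gx))          # edge direction, radians
    return strength, direction
```

Concatenating the two returned lists (`strength + direction`) would give one possible p-dimensional feature vector D for the partial image.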

The feature vector calculation unit 13 reduces the dimensionality of the feature vector D calculated by the image feature calculation unit 12, using a group of basis vectors E stored in advance in the basis vector storage unit 14. Let n (a positive integer with n < p) be the number of dimensions of the feature vector D after compression by the feature vector calculation unit 13. The basis vector group E can be obtained, for example, by applying the conventionally known eigenspace method. Specifically, it may be the set of eigenvectors determined by performing principal component analysis (PCA) on the plurality of template images used for matching against the input image (specifically, the rotation template images and translation template images described later). The basis vector group E can then be expressed as a set of n vectors by Equation (1). The feature vector D of the dimensionally compressed partial image is given by Equation (2).
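The dimensional compression step can be sketched as a projection onto the basis vectors. The sketch assumes that the n basis vectors are already available (for example from PCA of the template images) and that the template mean is subtracted before projection, which is the usual PCA convention but is not stated in the text; `project`, `basis`, and `mean` are hypothetical names.

```python
def project(d, basis, mean):
    """Compress a p-dimensional feature vector d to n coefficients.

    basis: list of n basis vectors, each of length p (assumed orthonormal)
    mean:  p-dimensional mean vector of the template features (assumption)
    """
    centered = [x - m for x, m in zip(d, mean)]
    # Each compressed element is the inner product with one basis vector.
    return [sum(e_k * c for e_k, c in zip(e, centered)) for e in basis]
```

With p = 3 and n = 2, for example, `project([3.0, 4.0, 5.0], [[1, 0, 0], [0, 1, 0]], [0, 0, 0])` keeps only the components along the first two basis vectors.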

The template matching unit 15 estimates the position and orientation of the recognition target captured in the input image by matching the partial image selected from the input image by the scanning window against a plurality of template images. Specifically, from among the feature vectors of the template images stored in the template storage unit 16, it selects the template image whose distance s in the feature space to the feature vector D of the partial image is smallest, and takes the orientation of the recognition target corresponding to the selected template image as the estimate. It also estimates the position of the recognition target within the input image from the position of the partial image for which the orientation was estimated. If the minimum value of the distance s is larger than a predetermined threshold, the template matching unit 15 determines that no template image is close to the partial image under evaluation, that is, that the recognition target is not present, and rejects that partial image.

The template matching unit 15 repeats the computation described above for estimating the orientation of the recognition target on each of the partial images cut out by the scanning window from the input image captured by the imaging unit 11. In this embodiment, as shown in FIG. 2, the scanning window 22 is moved sequentially in the horizontal direction over the input image 21, that is, a raster scan is performed, and the series of processes from selection of a partial image to template matching is repeated. The way the input image is scanned with the scanning window is of course not limited to the method shown in FIG. 2. For example, the scanning window may be moved along the vertical direction of the input image 21.
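A raster scan of this kind can be sketched as a generator that yields each window position together with the partial image it selects. The window size, the step width, and the name `raster_scan` are illustrative assumptions.

```python
def raster_scan(image, win_w, win_h, step=1):
    """Yield (x, y, partial_image) for each scanning-window position.

    image is a list of rows; the window moves left to right along a row,
    then down to the next row. (x, y) is the window's top-left corner.
    """
    h, w = len(image), len(image[0])
    for y in range(0, h - win_h + 1, step):
        for x in range(0, w - win_w + 1, step):
            yield x, y, [row[x:x + win_w] for row in image[y:y + win_h]]
```

Swapping the two loops would give the vertical scan mentioned as an alternative.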

The template storage unit 16 holds a set {T_i^j} of feature vectors for the template images of the recognition target. A characteristic of the template storage unit 16 in this embodiment is that it holds both a set of feature vectors of rotation template images and a set of feature vectors of translation template images. As described above, a rotation template image is a template image obtained by photographing the recognition target while rotating it. In other words, the group of rotation template images is a set of images of the recognition target viewed from a plurality of directions. A translation template image, by contrast, is a template image obtained by photographing continuously, following the scanning direction of the scanning window, with the center of the captured image shifted from the center of the recognition target. In other words, the group of translation template images is a set of images that partially capture the recognition target from a plurality of viewpoints along a linear trajectory. Holding the translation template images in the template storage unit 16 thus lets the object recognition device 1 learn how the recognition target appears when a "translational shift" occurs during recognition.

FIG. 3 shows an example of rotation template images and translation template images of a recognition target. FIG. 3(a) shows an example of a recognition target 23, a cylinder with the letter "A" on its surface. FIG. 3(b) shows four rotation template images T_1^0, T_2^0, T_3^0, and T_4^0 obtained by photographing the recognition target 23 from four directions that differ from one another by 90 degrees. Specifically, rotation template image T_1^0 is an image taken from a position facing the part of the recognition target 23 bearing the letter A. Rotation template image T_2^0 is an image taken with the recognition target 23 rotated 90 degrees clockwise relative to T_1^0. Rotation template image T_3^0 is an image taken with the recognition target 23 rotated 90 degrees clockwise relative to T_2^0. Rotation template image T_4^0 is an image taken with the recognition target 23 rotated 90 degrees clockwise relative to T_3^0.

FIG. 3(c) shows a series of translation template images captured in association with rotation template image T_1^0. In addition to rotation template image T_1^0 itself, the set of translation template images {T_1^-2, T_1^-1, T_1^0, T_1^1, T_1^2} in FIG. 3(c) contains a group of images in which the recognition target 23 was photographed with the center of the captured image shifted, along the scanning direction of the scanning window, relative to rotation template image T_1^0.

As described above, all of the rotation template images and all of the translation template images may be obtained by directly photographing the recognition target from a plurality of different viewpoints, but at least some of the template images may instead be generated by image processing of other images. That is, it suffices for the group of rotation template images to be generated as a set of images of the recognition target viewed from a plurality of viewpoints located on a circular trajectory centered on the recognition target, and for the group of translation template images to be generated as a set of images that partially capture the recognition target from a plurality of viewpoints located on a linear trajectory.

The rotation template image group may also be generated as a set of images cut out, so as to include the entire recognition target object, from images that were photographed to include the recognition target object from a plurality of viewpoints along a circular orbit centered on the recognition target object. The translation template image group may be generated as a set of images cut out, so as to include only a portion of the recognition target object, from an image photographed to include the recognition target object.
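The cut-out approach described above can be illustrated with a minimal sketch. It assumes the image is a NumPy array, that the shift is horizontal, and that each rotation template has 2q + 1 shifted windows (offsets -q to q, matching the five images of FIG. 3(c) for q = 2). The function name `crop_translation_templates` and the parameters `center_x`, `width`, and `step` are hypothetical and do not come from the specification.

```python
import numpy as np

def crop_translation_templates(image, center_x, width, q, step):
    """Crop 2*q + 1 windows whose centers are shifted along the x axis.

    Offset j = 0 gives the window centered on the object (the rotation
    template); j != 0 gives windows that capture the object only partially.
    The count 2*q + 1 is an assumption made for this sketch.
    """
    half = width // 2
    templates = {}
    for j in range(-q, q + 1):
        left = center_x + j * step - half
        if left < 0 or left + width > image.shape[1]:
            raise ValueError("shifted window falls outside the image")
        templates[j] = image[:, left:left + width].copy()
    return templates

# Toy 4 x 12 image whose pixel value equals its column index.
image = np.tile(np.arange(12), (4, 1))
templates = crop_translation_templates(image, center_x=6, width=4, q=2, step=1)
```

Each cropped window would then be converted to a feature vector in the same way as a partial image of the input image before being stored in the template dictionary.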

The feature vectors of the template images held in the template storage unit 16 are expressed in generalized form by the following equation (3). In equation (3), m is the total number of rotation templates, and q is a parameter that defines the number of translation templates photographed with a translational shift relative to a rotation template.

Returning to FIG. 1, the result integration unit 17 integrates the position and orientation estimates produced by the template matching unit 15 and outputs a final estimation result. Specifically, when, among the plurality of partial images selected by scanning the input image with the scanning window, there is only one partial image for which the template matching unit 15 has estimated the position and orientation of the recognition target object, the position of that partial image is taken as the final estimated position of the recognition target object, and the posture estimated by the template matching unit 15 is output as the final estimation result. On the other hand, when there are a plurality of partial images for which the template matching unit 15 has estimated the position and orientation of the object, these candidates may be integrated by weighting them as shown in the following equation (4).

Equation (4) shows the case in which the final object position and orientation estimation result C3 = (x3, y3, θ3) is obtained by weighted integration of two candidates C1 = (x1, y1, θ1) and C2 = (x2, y2, θ2). In equation (4), x1 and y1 are the position coordinates of the first candidate C1 in the input image, and θ1 is a posture parameter representing the posture of the first candidate C1. Similarly, x2 and y2 are the position coordinates of the second candidate C2 in the input image, and θ2 is a posture parameter representing the posture of the second candidate C2. Further, x3 and y3 are the position coordinates of the estimation result C3 in the input image, and θ3 is a posture parameter representing the posture of the estimation result C3.

Further, w1 and w2 are the weights used when integrating the first candidate C1 and the second candidate C2, and they are determined as functions of the distances s1 and s2 to the template images calculated by the template matching unit 15. Here, s1 is the minimum distance to a template image obtained when the first candidate C1 was estimated, and s2 is the minimum distance to a template image obtained when the second candidate C2 was estimated. The weights w1 and w2 can be defined in various ways, the simplest example being a linear approximation.
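As an illustration of such a weighted integration, the following sketch blends two candidates so that the candidate with the smaller template distance receives the larger weight. The specific weights w1 = s2 / (s1 + s2) and w2 = s1 / (s1 + s2) are an assumption chosen for this example and are not taken from equation (4); the function name `integrate_candidates` is likewise hypothetical.

```python
def integrate_candidates(c1, s1, c2, s2):
    """Blend two (x, y, theta) candidates; the closer match gets more weight."""
    total = s1 + s2
    if total == 0:
        w1 = w2 = 0.5  # both candidates match their templates exactly
    else:
        w1 = s2 / total  # assumed linear weighting, not the patent's equation (4)
        w2 = s1 / total
    return tuple(w1 * a + w2 * b for a, b in zip(c1, c2))

# C1 is the better match (s1 = 1.0), so C3 lies closer to C1 than to C2.
c3 = integrate_candidates((10.0, 20.0, 90.0), 1.0, (14.0, 20.0, 90.0), 3.0)
```

Averaging θ in this plain way is only meaningful when the two posture parameters do not straddle the wrap-around point of the angle range; a real implementation would need to handle that case separately.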

Next, the object recognition processing performed by the object recognition apparatus 1 according to the present embodiment is described in detail with reference to the flowchart of FIG. 4. FIG. 4 is a flowchart showing the object recognition processing procedure performed by the object recognition apparatus 1. In step S101, photographed image data is input from the imaging unit 11 to the image feature calculation unit 12. In step S102, the image feature calculation unit 12 calculates feature amounts from the partial image selected from the input image by the scanning window. In step S103, the feature vector calculation unit 13 calculates the feature vector of the input image projected onto the n-dimensional feature space, specifically the feature vector D of the partial image selected by the scanning window.

Next, in step S104, the template matching unit 15 calculates the minimum distance L_sc between the rotation manifold formed in the feature space by the feature vectors {T_i^0} of the plurality of rotation template images and the feature vector D of the input image. Specifically, the distance s_c between the feature vector of a rotation template image and the feature vector D of the input image is calculated for each rotation template image, and the minimum of the distances s_c is taken as the minimum distance L_sc. The distance s_c can be calculated by the following equation (5), and the minimum distance L_sc is then given by equation (6). Here, M is a reliability weight, and Conf represents the rotation manifold.

In step S105, the template matching unit 15 determines whether the minimum distance L_sc calculated in step S104 is equal to or less than a predetermined first threshold Th1. If the minimum distance L_sc is greater than the first threshold Th1, the scanning window is moved to the next pixel and the processing from step S102 onward is repeated (step S111). If the minimum distance L_sc is determined to be equal to or less than the first threshold Th1, the processing from step S106 onward is performed.

In step S106, the template matching unit 15 calculates the minimum distance L_st between the translation manifold of the rotation template image that gives the minimum distance L_sc and the translation manifold of the partial image currently under evaluation. Here, the translation manifold of a rotation template image is the manifold obtained by projecting into the feature space the feature vector group {T_i^j} of the series of translation template images photographed in association with that rotation template image, as shown in FIG. 3(c). The translation manifold of a partial image is the manifold obtained by projecting into the feature space the feature vector of the partial image currently under evaluation together with the feature vectors of the plurality of partial images selected by the scanning window before and after that partial image.

FIG. 5 is a conceptual diagram showing an example of the feature space for the rotation template images and the translation template images. The rotation manifold 300 is the manifold formed by the plurality of rotation template images. The translation manifold 311 is the manifold formed by the series of translation template images photographed in association with a certain rotation template image 31. The translation manifold 321 is the manifold formed by the partial image 32 currently under evaluation and the plurality of partial images selected by the scanning window before and after the partial image 32.

The minimum distance L_st can be calculated by the following equation (7). Here, M is a reliability weight, and Trans is the translation manifold of the rotation template image.
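One simple way to approximate a distance between two translation manifolds, each represented by a finite set of feature vectors, is the smallest pairwise Euclidean distance between the two sets. The sketch below uses that stand-in; it is an assumption for illustration and is not equation (7), and it omits the reliability weight M. The name `min_set_distance` and the toy vectors are hypothetical.

```python
import numpy as np

def min_set_distance(template_vectors, window_vectors):
    """Smallest Euclidean distance between any pair of points of two sets."""
    a = np.asarray(template_vectors, dtype=float)
    b = np.asarray(window_vectors, dtype=float)
    # Pairwise differences via broadcasting: shape (len(a), len(b), n).
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=2)).min())

# Feature vectors of the translation templates of one rotation template ...
translation_templates = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
# ... and of the partial images before, at, and after the current window.
window_sequence = [[0.0, 3.0], [1.0, 0.5], [2.0, 3.0]]
l_st = min_set_distance(translation_templates, window_sequence)
```

A stricter comparison could match the sets element by element in scanning order, which would capture how the feature vector varies with the shift rather than only how close the two sets come.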

In step S107, the template matching unit 15 determines whether the minimum distance L_st calculated in step S106 is equal to or less than a predetermined second threshold Th2. If the minimum distance L_st is greater than the second threshold Th2, the scanning window is moved to the next pixel and the processing from step S102 onward is repeated (step S111). If the minimum distance L_st is determined to be equal to or less than the second threshold Th2, the processing from step S108 onward is performed.

In step S108, the position and orientation of the recognition target object identified from the partial image currently under evaluation are selected as candidate values. In the determination of step S109, if scanning of the entire input image by the scanning window has not been completed, the scanning window is moved to the next pixel and the processing from step S102 onward is repeated (step S111). When scanning of the entire input image has been completed, the result integration unit 17 integrates the position and orientation candidate values estimated at the different scanning window positions and outputs the estimation result for the position and orientation of the recognition target object.
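The control flow of steps S104 to S111 can be summarized in a short sketch. The two distance computations are passed in as caller-supplied functions, so the sketch only shows the two-stage thresholding; the function name `scan_for_candidates`, the one-dimensional window position `p`, and the toy distance tables are assumptions for illustration.

```python
def scan_for_candidates(num_positions, rotation_distance, translation_distance, th1, th2):
    """Run the two-stage test at each window position and collect candidates.

    rotation_distance(p) -> (template_index, L_sc) and
    translation_distance(p, template_index) -> L_st are caller-supplied.
    """
    candidates = []
    for p in range(num_positions):                 # S111: move the window
        index, l_sc = rotation_distance(p)         # S104
        if l_sc > th1:                             # S105: reject early
            continue
        l_st = translation_distance(p, index)      # S106
        if l_st > th2:                             # S107: reject
            continue
        candidates.append((p, index, l_st))        # S108
    return candidates

# Toy distances: position 2 passes both tests, position 4 only the first.
l_sc_table = [9.0, 9.0, 0.2, 9.0, 0.3]
l_st_table = [9.0, 9.0, 0.1, 9.0, 5.0]
found = scan_for_candidates(
    5,
    lambda p: (0, l_sc_table[p]),
    lambda p, i: l_st_table[p],
    th1=1.0,
    th2=1.0,
)
```

Position 4 shows the intended effect of the second stage: a window that resembles a rotation template is still rejected when its translation manifold does not agree with that template's translation manifold.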

The processing flow of the object recognition processing shown in FIG. 4 is only an example. In the flow of FIG. 4, the partial image is first matched against the rotation template images, and it is matched against the translation template images only when a rotation template feature vector close to the feature vector of the partial image has been found. Alternatively, instead of this two-stage determination, the step of matching the partial image against the rotation template images may be omitted and the partial image may be matched against the translation template images directly. However, matching the partial image against the translation template images in every case increases the amount of computation, so it is preferable to use the two-stage search described above and to perform matching against the translation templates only when the result of matching against the rotation template images is good.

As described above, the object recognition apparatus 1 according to the present embodiment holds in the template storage unit 16, in addition to the feature vectors of the rotation template images obtained by photographing the recognition target object while rotating it, the feature vectors of translation template images obtained by photographing the recognition target object with the center of the photographed image shifted from the center of the recognition target object along the scanning direction of the scanning window. In other words, by holding the translation template images, the object recognition apparatus 1 learns in advance how the recognition target object appears when there is a shift (translational shift) between the center of the recognition target object and the center of the scanning window.

Further, as described for steps S106 to S108 of FIG. 4, the object recognition apparatus 1 calculates the distance in the feature space between the translation manifold of a rotation template image and the translation manifold of the partial image currently under evaluation, and compares the calculated distance with the predetermined threshold Th2. In this way, the variation exhibited by the feature vector of the partial image cut out from the input image as the scanning window moves is matched against the variation exhibited by the feature vectors of the translation template images as the translational shift changes. Therefore, even if an erroneous posture determination caused by a translational shift occurs in the matching against the rotation template images, a mismatch arises in the matching against the translation template images between the translation manifold of the erroneously selected rotation template image and the translation manifold of the partial image under evaluation, so the erroneously estimated posture of the recognition target object can be rejected. This suppresses erroneous recognition of the posture of the recognition target object caused by a translational shift.

The components of the object recognition apparatus 1 according to the present embodiment other than the imaging unit 11 can be implemented using a computer system having a CPU (Central Processing Unit) capable of executing a program. Specifically, a program for causing the computer system to execute the processing shown in the flowchart of FIG. 4, which is performed by the image feature calculation unit 12, the feature vector calculation unit 13, the template matching unit 15, and the result integration unit 17, is stored in a storage unit such as a ROM or flash memory that is built into the CPU or connected externally to the CPU, and the program is executed by the CPU. The basis vector storage unit 14 and the template storage unit 16 may be non-volatile storage units of the computer system, such as a hard disk or flash memory.

The program for causing the computer system to execute the processing shown in the flowchart of FIG. 4 is not limited to storage in a ROM or flash memory; it can be stored in various types of storage media and can also be transmitted via a communication medium. Here, the storage media include, for example, a flexible disk, a hard disk, a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD, and a RAM memory cartridge with battery backup. The communication media include wired communication media such as telephone lines and wireless communication media such as microwave links, and also include the Internet.

Other embodiments.
In the object recognition apparatus 1 according to the first embodiment of the invention, the result integration unit 17 performs weighted integration of the object position and orientation estimates obtained at a plurality of scanning window positions. However, instead of performing such weighted integration, the position and orientation with the smallest minimum distance L_st among the plurality of position and orientation candidates may simply be selected as the final estimation result. Further, the object recognition apparatus 1 does not necessarily have to include the imaging unit 11; image data stored in a memory after photographing may be input to the image feature calculation unit 12.
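The simpler selection rule mentioned above can be sketched in a few lines. The candidate layout (x, y, theta, l_st) and the function name `select_best_candidate` are assumptions for illustration.

```python
def select_best_candidate(candidates):
    """Return the (x, y, theta, l_st) candidate with the smallest l_st."""
    if not candidates:
        return None  # no window passed both threshold tests
    return min(candidates, key=lambda c: c[3])

best = select_best_candidate(
    [(10, 20, 90, 0.4), (11, 20, 90, 0.1), (30, 5, 180, 0.7)]
)
```

Compared with weighted integration, this rule never produces a position or posture that lies between two candidates, which may be preferable when the candidates correspond to clearly different postures.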

In the object recognition processing performed by the object recognition apparatus 1 according to the first embodiment of the invention, specifically in step S104 of the flowchart of FIG. 4, the posture of the object is found as the posture corresponding to the rotation template image that minimizes the distance s_c. However, the method of identifying the posture of the recognition target object in step S104 is not limited to selecting the rotation template image with the minimum distance, that is, the so-called nearest neighbor rule (NN method). For example, the rotation manifold may be expressed analytically by interpolating between the projected points of the plurality of rotation template images in the feature space using polynomial interpolation, spline interpolation, or the like, and the shortest distance between the rotation manifold and the feature vector of the partial image, together with the projected point that gives that shortest distance, may be calculated analytically.
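To illustrate the idea of interpolating the rotation manifold, the sketch below swaps in the simplest possible interpolant: straight line segments joining neighbouring template points into a closed loop, rather than the polynomial or spline interpolation named in the text. With linear segments the closest point has a closed-form solution. The function name `distance_to_closed_polyline` and the toy points are hypothetical.

```python
import numpy as np

def distance_to_closed_polyline(points, d):
    """Shortest distance from d to the closed polyline through points.

    Returns (distance, segment_index, t), where the closest point is
    points[i] + t * (points[(i + 1) % len(points)] - points[i]).
    """
    pts = np.asarray(points, dtype=float)
    d = np.asarray(d, dtype=float)
    best = (np.inf, -1, 0.0)
    for i in range(len(pts)):
        a, b = pts[i], pts[(i + 1) % len(pts)]
        ab = b - a
        denom = float(ab @ ab)
        # Closed-form projection of d onto segment a-b, clamped to [0, 1].
        t = 0.0 if denom == 0.0 else float(np.clip((d - a) @ ab / denom, 0.0, 1.0))
        dist = float(np.linalg.norm(d - (a + t * ab)))
        if dist < best[0]:
            best = (dist, i, t)
    return best

# Four toy template points forming a square loop in a 2-D feature space.
square = [[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
dist, seg, t = distance_to_closed_polyline(square, [0.5, 2.0])
```

The pair (segment index, t) locates the closest point between two neighbouring templates, so under this assumption it could be mapped to a posture lying between the two template postures instead of snapping to the nearest template.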

The template storage unit 16 does not have to store the feature vectors of the template images themselves. For example, the feature vectors of the template images may be converted into other parameters for identifying the posture of the recognition target object included in the input image, and the converted parameters may be stored in the template storage unit 16. Alternatively, conversion information, such as a projection matrix for converting the feature vector of the input image into other parameters for identifying the posture of the recognition target object included in the input image, may be stored in the template storage unit 16.

Furthermore, the present invention is not limited to the embodiments described above, and various modifications are of course possible without departing from the gist of the present invention described above.

FIG. 1 is a block diagram of an object recognition apparatus according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram for explaining scanning of an input image by a scanning window.
FIG. 3 is a diagram showing an example of template images held by the object recognition apparatus according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the procedure of recognition processing of a recognition target object by the object recognition apparatus according to the embodiment of the present invention.
FIG. 5 is a diagram showing an example of a feature space.

Explanation of symbols

1 Object recognition apparatus
11 Imaging unit
12 Image feature calculation unit
13 Feature vector calculation unit
14 Basis vector calculation unit
15 Template matching unit
16 Template storage unit
17 Result integration unit
21 Input image
22 Scanning window
23 Recognition target object
31 Rotation template image
32 Input image
300 Rotation manifold
311 Translation manifold of rotation template image
321 Translation manifold of input image

Claims

An object recognition device that estimates the position and orientation of a recognition target object using an input image captured and input by an imaging unit, the device comprising:
a storage unit that stores a template dictionary describing feature vectors of a plurality of rotation template images and feature vectors of a plurality of translation template images for matching against the input image;
a feature vector calculation unit that calculates a feature vector indicating a feature of a partial image partially selected from the input image as a processing target region by a scanning window that is scanned over the input image and defines the processing target region; and
an estimation unit that estimates the position and orientation of the recognition target object included in the input image based on the template dictionary and the feature vector of the partial image,
wherein the plurality of rotation template images are a set of images obtained by viewing the recognition target object from a plurality of directions, and
the plurality of translation template images are a set of images obtained by partially capturing the recognition target object from a plurality of viewpoints along a linear trajectory.

The object recognition apparatus according to claim 1, wherein the rotation template images are a set of images obtained by viewing the recognition target object from a plurality of viewpoints located on a circular orbit centered on the recognition target object, and
the translation template images are a set of images obtained by partially capturing the recognition target object from a plurality of viewpoints located on a linear trajectory.

The object recognition apparatus according to claim 1, wherein the plurality of rotation template images are a set of images cut out, so as to include the entirety of the recognition target object, from images photographed to include the recognition target object from a plurality of viewpoints along a circular orbit centered on the recognition target object, and
the plurality of translation template images are a set of images cut out, so as to include only a portion of the recognition target object, from an image photographed to include the recognition target object.

The object recognition apparatus according to any one of claims 1 to 3, wherein the plurality of translation template images include an image group obtained by photographing the recognition target object with the center of the photographed image shifted from the center of the recognition target object as compared with each of the plurality of rotation template images.

The object recognition apparatus according to claim 4, wherein, when the distance in a feature space between a rotation manifold formed by the feature vectors of the plurality of rotation template images and the feature vector of the partial image is smaller than a first threshold, and the distance in the feature space between a translation manifold formed by the translation template images corresponding to the rotation template image that gives the distance to the rotation manifold smaller than the first threshold and the feature vector of the partial image is smaller than a second threshold, the template matching unit estimates the posture specified by the rotation template image that gives the distance to the rotation manifold smaller than the first threshold as the posture of the recognition target object.

An object recognition method comprising: scanning an input image with a scanning window that defines a processing target region, and calculating a feature vector indicating a feature of a partial image partially selected from the input image as the processing target region by the scanning window; and
estimating the position and orientation of a recognition target object captured in the input image by comparing the feature vector of the partial image with a template dictionary describing feature vectors of a plurality of rotation template images, which are a set of images obtained by viewing the recognition target object from a plurality of directions, and feature vectors of a plurality of translation template images, which are a set of images obtained by partially capturing the recognition target object from a plurality of viewpoints along a linear trajectory.

The object recognition method according to claim 6, wherein the plurality of translation template images include an image group obtained by photographing the recognition target object with a center of the photographed image shifted from each of the plurality of rotation template images.

The object recognition method according to claim 7, wherein, when the distance in a feature space between a rotation manifold formed by the feature vectors of the plurality of rotation template images and the feature vector of the partial image is smaller than a first threshold, and the distance in the feature space between a translation manifold formed by the translation template images corresponding to the rotation template image that gives the distance to the rotation manifold smaller than the first threshold and the feature vector of the partial image is smaller than a second threshold, the posture specified by the rotation template image that gives the distance to the rotation manifold smaller than the first threshold is taken as the posture estimation result of the recognition target object.

The object recognition method according to claim 7, comprising: determining whether the distance in a feature space between a rotation manifold formed by the feature vectors of the plurality of rotation template images and the feature vector of the partial image is smaller than a first threshold;
determining, when the distance is determined to be smaller than the first threshold, whether the distance in the feature space between a translation manifold formed by the translation template images corresponding to the rotation template image that gives the distance to the rotation manifold smaller than the first threshold and the feature vector of the partial image is smaller than a second threshold; and
taking, when the distance is determined to be smaller than the second threshold, the posture specified by the rotation template image that gives the distance to the rotation manifold smaller than the first threshold as the posture estimation result of the recognition target object.