JP5305031B2

JP5305031B2 - Feature amount extraction apparatus and method, and position estimation apparatus and method

Info

Publication number: JP5305031B2
Application number: JP2009200764A
Authority: JP
Inventors: 修長谷川; カーウィーウォンアラム; タンロアムサプシリナート; 祐人服部; 将慶土永
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2009-08-31
Filing date: 2009-08-31
Publication date: 2013-10-02
Anticipated expiration: 2029-08-31
Also published as: JP2011053823A

Description

The present invention relates to a feature amount extraction apparatus and method for extracting local feature amounts from input images, and to a position estimation apparatus and method that use them.

The ability to identify one's own position is essential for both humans and machines. Knowing where one currently is has always been important in robotics and computer vision. For mobile robots in particular, knowing where they are in the world is a basic requirement of a navigation system.

In such self-localization, the key issue is how accurately local features can be extracted. Conventional local features include affine-invariant features (MSER, Harris-Affine, Hessian-Affine, Salient Region, and the like) and scale-invariant features such as SIFT (Non-Patent Document 1) and SURF (Non-Patent Document 2).

Non-Patent Document 1: D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l Jour. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004
Non-Patent Document 2: H. Bay, T. Tuytelaars and L. V. Gool, "SURF: Speeded Up Robust Features," Proc. European Conf. Computer Vision, 2006

However, these feature amounts have the following problems. First, when the same object is photographed from different positions, extracting conventional local features as they are also yields a large number of differing local features that cannot be matched. Second, even when views are meant to be recognized as the same object, any appreciable change in appearance means that all local features must be learned for each appearance, which uses storage capacity inefficiently. Third, features that appear in only one particular view can cause misrecognition, so conventional local features used as they are make for very noisy feature amounts.

Moreover, feature amounts that handle both affine changes and scale changes are still under study, and no useful method has been reported.

The present invention has been made to solve these problems, and its object is to provide a feature amount extraction apparatus that extracts feature amounts robust to changes in position, and a position estimation apparatus equipped with it.

A position estimation apparatus according to the present invention includes feature amount extraction means for extracting invariant feature amounts from input images consisting of continuously captured images, and position estimation means for estimating a position based on the invariant feature amounts extracted by the feature amount extraction means. The feature amount extraction means includes: local feature amount extraction means for extracting local feature amounts from each of the input images; feature amount matching means for matching the local feature amounts extracted by the local feature amount extraction means between consecutive input images; continuous feature amount selection means for selecting, as continuous feature amounts, the local feature amounts that the feature amount matching means matched across a predetermined number of consecutive images; and invariant feature amount calculation means for obtaining the average of each of the continuous feature amounts as an invariant feature amount.

In the present invention, continuously captured images are used. Feature amounts are matched between each pair of consecutive images, only the matched feature amounts that keep appearing across consecutive images are extracted, and their average is taken as a local feature amount. This yields features that are robust to changes in the shooting position, and because the position is estimated from these features, it can be identified accurately. Here, continuous images are a set of images captured continuously and obtained from video.

The position estimation means may include storage means and estimation result output means. The storage means stores an area database in which, with teacher images as the input images, the invariant feature amounts extracted from a plurality of teacher images are associated with area identification information indicating the area from which they were extracted. The estimation result output means takes an image of the surroundings as the input image, matches each of the local feature amounts extracted from it by the local feature amount extraction means against the invariant feature amounts registered in the area database, votes for the area corresponding to the invariant feature amount that matches each local feature amount, and estimates the area with the most votes as the current position. Building the area database by learning or the like makes it possible to identify a variety of positions.

When the number of votes is below a predetermined threshold, the position estimation means can recognize the area as an unlearned area. Because no area is reported when the position is uncertain, inaccurate estimates are avoided, and marking the area as unlearned allows it to be learned again properly.
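The threshold rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `decide_area`, the `votes` mapping, and the `min_votes` parameter are hypothetical names chosen for the example, and the text does not give a value for the threshold.

```python
from typing import Dict, Optional


def decide_area(votes: Dict[str, int], min_votes: int) -> Optional[str]:
    """Return the area with the most votes, or None for an unlearned area.

    `votes` maps an area identifier to its vote count. `min_votes` plays the
    role of the predetermined threshold (its value is an assumption here).
    """
    if not votes:
        return None
    best_area = max(votes, key=votes.get)
    # Below the threshold the position is considered uncertain, so no area
    # is reported and the caller can treat the location as unlearned.
    if votes[best_area] < min_votes:
        return None
    return best_area
```

Returning `None` rather than a low-confidence guess mirrors the stated intent of avoiding inaccurate estimates; a caller could use that signal to trigger learning of the new area.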

Furthermore, the local feature amounts may be SIFT (Scale-Invariant Feature Transform) and/or SURF (Speeded Up Robust Features) feature amounts. Other local feature amounts that are robust to scale, rotation, noise, and the like may also be used instead of SIFT or SURF. Using these existing local feature amounts carries over their performance as it is, so the extracted and described features are also robust to illumination changes and the like.

The apparatus may further include optimization means for optimizing the database, and the optimization means can delete redundant individual invariant feature amounts from among the individual invariant feature amounts that make up the invariant feature amounts. This reduces the size of the database.

Furthermore, the optimization means can reuse the teacher images, vote for the individual invariant feature amounts that were used to identify each teacher image, and delete redundant individual invariant feature amounts according to the voting result. This removes individual invariant feature amounts with low discriminative power, so the loss of discriminative power that would otherwise accompany the size reduction is suppressed.
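One way to read this vote-based pruning is sketched below. The text does not specify how votes are turned into a deletion decision, so the `min_votes` cutoff, the one-vote-per-image rule, and all names (`prune_by_votes`, `hits_per_image`) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def prune_by_votes(
    dictionary: Dict[int, List[float]],
    hits_per_image: Iterable[Iterable[int]],
    min_votes: int,
) -> Dict[int, List[float]]:
    """Keep only the invariant features that collected enough votes.

    `dictionary` maps a feature id to its descriptor. `hits_per_image` lists,
    for each reused teacher image, the ids of the dictionary features that
    were used to identify that image.
    """
    votes: Counter = Counter()
    for hits in hits_per_image:
        # Assumption: a feature receives at most one vote per teacher image.
        votes.update(set(hits))
    # Counter returns 0 for ids that never received a vote.
    return {fid: vec for fid, vec in dictionary.items() if votes[fid] >= min_votes}
```

Features that never help identify any teacher image receive zero votes and are dropped first, which matches the stated goal of removing entries with low discriminative power.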

The optimization means can perform the optimization either offline or online. Which of the two to use can be chosen according to the optimization method.

The optimization means may also delete redundant data from the invariant feature amounts and area identification information registered in the area database. This reduces the size of the database.

The redundant data may be data for which the probability of moving again to the area indicated by the area identification information is low. This allows data that has not been used for some time to be deleted.

A feature amount extraction apparatus according to the present invention includes: local feature amount extraction means for extracting local feature amounts from each of a set of input images consisting of continuously captured images; feature amount matching means for matching the local feature amounts extracted by the local feature amount extraction means between consecutive input images; continuous feature amount selection means for selecting, as continuous feature amounts, the local feature amounts that the feature amount matching means matched across a predetermined number of consecutive images; and invariant feature amount calculation means for obtaining the average of each of the continuous feature amounts as an invariant feature amount.

According to the present invention, it is possible to provide a feature amount extraction apparatus and method capable of extracting local feature amounts that are robust to changes in the shooting position, and a position estimation apparatus and method using them.

A diagram showing the position estimation apparatus according to an embodiment of the present invention.
A diagram explaining the method of extracting the feature amount PIRF according to the embodiment.
A flowchart showing the invariant feature amount extraction method of the embodiment of the present invention.
A diagram explaining the method of estimating which place an input image belongs to.
A diagram showing the images used in Example 1 of the present invention.
A diagram showing the results of Example 1 of the present invention.
A diagram showing the results of Example 2 of the present invention.
A diagram showing the results of Example 3 of the present invention.
A diagram showing the results of Example 4 of the present invention.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. In this embodiment, the present invention is applied to a position estimation apparatus equipped with a feature amount extraction apparatus capable of extracting local feature amounts that are robust to changes in the shooting position. The present invention can also be applied to, for example, scene recognition and object recognition on images of real environments.

FIG. 1 is a diagram showing the position estimation apparatus according to the embodiment of the present invention. The position estimation apparatus 1 according to this embodiment includes a feature amount extraction unit 20 that extracts invariant feature amounts from input images consisting of continuously captured images, and a position estimation unit 10 that estimates the target position based on the invariant feature amounts extracted by the feature amount extraction unit 20.

The feature amount extraction unit 20 includes a local feature amount extraction unit 21 that extracts local feature amounts from each input image, a feature amount matching unit 22 that matches the local feature amounts extracted by the local feature amount extraction unit 21 between consecutive input images, a continuous feature amount selection unit 23 that selects, as continuous feature amounts, the feature amounts that the feature amount matching unit 22 matched across a predetermined number of consecutive images, and an invariant feature amount calculation unit 24 that obtains the average of each of the continuous feature amounts as an invariant feature amount.

The position estimation unit 10 includes a storage unit 12 that stores an area database 11, an estimation result output unit 13, and a DB optimization unit 14 that optimizes the database. With teacher images as the input images, the area database 11 stores the invariant feature amounts extracted from a plurality of teacher images in association with area identification information indicating the area from which they were extracted. The estimation result output unit 13 takes an image whose position is to be identified as the input image, matches each of the local feature amounts extracted from it by the local feature amount extraction unit 21 against the invariant feature amounts registered in the area database 11, votes for the area corresponding to the invariant feature amount that matches each local feature amount, and estimates the area with the most votes as the position to be identified.

In the following description, the invariant feature amount extracted by the invariant feature amount calculation unit 24 is called the feature amount PIRF (Position-Invariant Robust Features). The feature amount extraction unit 20, which extracts this feature amount PIRF, is described in detail first. The feature amount extraction unit 20 extracts the feature amount PIRF as a (local) feature amount that is not easily affected by changes in the shooting position.

The inventors arrived at this method of extracting the feature amount PIRF through intensive experimental research on the self-localization problem of mobile robots in real environments. They found that the appearance, and hence the feature amounts, of nearby objects changes greatly with the shooting position and the time of day, whereas distant objects change little, so the feature amounts of landmarks hardly change.

Put simply, the feature amount extraction unit 20 according to this embodiment matches local features between consecutive images, selects the features that are matched continuously across a predetermined number of images, and, for each selected feature, extracts and describes the average of all the features matched with it as the feature amount PIRF.

The feature amount extraction unit 20 makes it possible to extract and describe only those local features that are robust to changes in the shooting position. Any local feature can be used as the descriptor. As noted above, using an existing local feature amount carries over its performance as it is, so the extracted and described features are also robust to illumination changes and the like.

Each block is described in detail below. FIG. 2 is a diagram explaining the method of extracting the feature amount PIRF according to this embodiment. Continuously captured images are input to the local feature amount extraction unit 21 as the input images. The continuous images required by PIRF are an image set taken from video captured continuously at a constant frame rate, for example two frames per second. Images captured from video are generally continuous, and the continuous images used for PIRF must come from video. The image acquisition rate is set according to the speed of the camera. For example, when the camera is mounted on a car, the camera moves at about 1000 m/min and the continuous images captured from the video amount to roughly 50 to 100 frames/sec.

In this embodiment, omnidirectional images are used as the input images, and the continuous images are images of a single place. As described later, a place is, for example, the region between one intersection and the next. The place is divided into a plurality of sub-places, each consisting of several consecutive images. In other words, one place is made up of a plurality of sub-places. One or more invariant feature amounts are extracted for each sub-place, and this set of invariant feature amounts is called a sub-place PIRF dictionary. The set of all invariant feature amounts extracted from one place, that is, the set of its sub-place PIRF dictionaries, constitutes a PIRF dictionary. One PIRF dictionary corresponds to one place. The identification information of the place from which the sub-place PIRF dictionaries and the PIRF dictionary were extracted is stored together with the PIRF dictionary in the area database 11 described above.
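The hierarchy described above (area database, one PIRF dictionary per place, several sub-place PIRF dictionaries per place) can be pictured with a small data-structure sketch. The class and field names below are hypothetical and chosen only for illustration; the text does not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]  # one invariant feature (PIRF) descriptor


@dataclass
class PirfDictionary:
    """PIRF dictionary of one place: a list of sub-place PIRF dictionaries."""

    place_id: str
    sub_places: List[List[Vector]] = field(default_factory=list)

    def all_features(self) -> List[Vector]:
        # The place-level dictionary is the union of its sub-place dictionaries.
        return [vec for sub in self.sub_places for vec in sub]


@dataclass
class AreaDatabase:
    """Associates place identification information with PIRF dictionaries."""

    places: Dict[str, PirfDictionary] = field(default_factory=dict)

    def register(self, dictionary: PirfDictionary) -> None:
        self.places[dictionary.place_id] = dictionary
```

Keeping the sub-place grouping, rather than flattening everything into one list, preserves the information about which window of images each invariant feature came from.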

First, the local feature amount extraction unit 21 extracts local feature amounts using an existing local feature extraction method. For example, it can use SIFT (Scale-Invariant Feature Transform) or SURF (Speeded Up Robust Features) feature amounts. Other local feature amounts can of course be used instead of SIFT or SURF, and it is particularly preferable to use ones that are robust to scale, rotation, noise, and the like. Using these local feature amounts carries over the performance of the existing features as it is, so the extracted and described features are also robust to illumination changes and the like. Here, let $n_i$ be the number of omnidirectional images in the $i$-th area, and let $U_j^i$ be the set of local feature amounts extracted from the $j$-th of those images. A local feature amount is written $\vec{u}$.

Next, the feature amount matching unit 22 matches the feature amounts that make up the local feature amounts $\vec{u}$ between every pair of consecutive images. That is, all the local feature amounts of the $j = q$-th image are matched against all the local feature amounts of the $j = q+1$-th image. The indices of the matched features are obtained as a matching result vector $\vec{m}_q^i$.

An example of the matching method is described here using SIFT. Let $v$ be a feature extracted from image $I_a$. Whether $v$ matches a feature $v'$ of the next image $I_{a+1}$ is determined as follows. First, the dot product between $v$ and every feature extracted from image $I_{a+1}$ is computed. The results are then sorted to find the most similar feature $v_{\mathrm{first}}$ and the second most similar feature $v_{\mathrm{second}}$. If
$$\frac{v_{\mathrm{first}} \cdot v}{v_{\mathrm{second}} \cdot v} > \theta$$
holds, the matched feature is taken to be $v' = v_{\mathrm{first}}$. The threshold $\theta$ can be set to 0.6, for example. If the inequality does not hold, it is determined that image $I_{a+1}$ contains no feature matching the feature $v$ of image $I_a$.

The case in which six local feature amounts are extracted from each input image, as shown in FIG. 2, is described here. Matching is performed among these six local feature amounts, and only when a match is found is the index of the matched feature amount recorded. For example, the first local feature amount of $\vec{m}_1^i$ is matched with the third local feature amount of $\vec{m}_2^i$, and the third feature amount of $\vec{m}_2^i$ is matched with the sixth feature amount of $\vec{m}_3^i$.

Next, the continuous feature amount selection unit 23 selects the continuous feature amounts. First, it is decided how many vectors $\vec{m}_q^i$ are used to obtain the continuous feature amounts. In this specification, this number of $\vec{m}_q^i$ is called the window size $w$, and the set of $\vec{m}_q^i$ covered by the window size $w$ is called a sub-place. The larger the window size $w$, the more robust and discriminative the extracted continuous feature amounts, but if it is too large the number of features becomes extremely small. If it is too small, feature amounts that are neither robust nor discriminative are also extracted. The window size must therefore be set to an optimal value according to the purpose.

In this embodiment, the window size $w$ is 3, so the continuous feature amounts are obtained from four consecutive input images. That is, as shown in FIG. 2, the first sub-place contains $\vec{m}_1^i$, $\vec{m}_2^i$, and $\vec{m}_3^i$, and corresponds to the input images $I_1$, $I_2$, $I_3$, and $I_4$. An index of 0 indicates that there is no feature amount to match next. In the case of FIG. 2, the first sub-place therefore contains three continuous feature amounts.

Once the window size $w$ has been set, the continuous feature amount selection unit 23 shifts the window one position at a time and extracts, as continuous feature amounts, the features that appear in all four omnidirectional images covered by the window. With the window size $w$ set, the function used to extract the continuous feature amounts is defined as follows, where $b$ is the number of the index vector of interest. Then $f(m_{x,y}^i)$ is calculated for every matching result vector $\vec{m}_q^i$, and only the local feature amounts $\vec{u}_{x,y}^i$ for which $f(m_{x,y}^i) > 0$ are extracted. When the number of input images is $n_i$ and the window size is $w$, the number of sub-places is $n_i - w + 1$.

The invariant feature amount calculation unit 24 obtains the average of the local feature amounts that were matched within each sub-place, that is, within the same window group. The vectors of these averages make up the PIRF dictionary. The $(n_i - w + 1)$ sub-place PIRF dictionaries ($D_j^i$, $j \le n_i - w + 1$) extracted from all $(n_i - w + 1)$ sub-places are registered in the PIRF dictionary ($D^i$). Each average of matched local feature amounts that makes up this PIRF dictionary is a PIRF.
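The averaging step can be illustrated as an element-wise mean over the descriptors linked by one chain of matches. This is a minimal sketch with hypothetical names (`average_descriptor`, `pirf_for_chain`); it assumes descriptors are equal-length lists of floats and that a chain lists one 1-based feature index per image in the window.

```python
from typing import List, Sequence

Vector = List[float]


def average_descriptor(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length descriptors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[k] for vec in vectors) / n for k in range(dim)]


def pirf_for_chain(
    descriptors_per_image: Sequence[Sequence[Vector]], chain: Sequence[int]
) -> Vector:
    """Average the matched descriptors of one continuous feature.

    descriptors_per_image[t][k - 1] is the descriptor of feature k in the
    t-th image of the window; chain[t] is the 1-based index of the matched
    feature in that image.
    """
    matched = [descriptors_per_image[t][idx - 1] for t, idx in enumerate(chain)]
    return average_descriptor(matched)
```

Collecting `pirf_for_chain` over every chain in a window would give the entries of one sub-place PIRF dictionary.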

Next, the method by which the feature amount extraction unit extracts the invariant feature amount PIRF is described. FIG. 3 is a flowchart showing the invariant feature amount extraction method. First, $i$ and $j$ are initialized to $i = 1$ and $j = 1$ (step S1). Here, $n_j$ is the number of images belonging to place $A_j$, and $w$ is the window size used to extract PIRFs.

Next, image $I_i$ of area $A_j$ is input (step S2). It is then determined whether $i$ is larger than the window size $w$ (step S3). If $i$ is less than or equal to $w$, $i$ is incremented (step S10) and the process returns to step S2.

If, on the other hand, $i$ is larger than the window size $w$, image $I_i$ is matched against image $I_{i-1}$ (step S4). Stable local feature amounts (SIFT) are then extracted from the $w + 1$ consecutive input images ($I_{i-w}, \ldots, I_i$) (step S5). In this embodiment, the window size $w$ is set to 3. The window size $w$ determines how many consecutive images a PIRF is extracted from. For example, with $w$ set to 3, a PIRF is obtained only for a feature that appears in four consecutive images. Consequently, if a place A has fewer than four images, there are not enough images to find features that match across three consecutive image pairs. Extraction of PIRFs from place A can therefore start once there are at least four images. For example, when the current image is $I_6$, PIRFs can be obtained from the four consecutive images $I_3$, $I_4$, $I_5$, and $I_6$.

Next, the average of the extracted stable local feature amounts is calculated as a PIRF (step S6). The PIRFs contained in area $A_j$ are then collected and registered in the PIRF dictionary $D_j$ (step S7). It is then determined whether $i = n_j$. If so, $i$ is set to 1 and $j$ to $j + 1$, and the processing from step S2 is repeated (step S9). If $i \ne n_j$, the process proceeds to step S10, $i$ is incremented, and the process returns to step S2.
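The control flow of the flowchart can be sketched as two nested loops over places and images. The sketch below only mirrors the loop structure and windowing; the actual matching and averaging (steps S4 to S6) are delegated to a caller-supplied function, `pirf_from_window`, which is a hypothetical placeholder, as are the other names.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Image = TypeVar("Image")
Feature = TypeVar("Feature")


def build_dictionaries(
    places: Dict[str, Sequence[Image]],
    w: int,
    pirf_from_window: Callable[[Sequence[Image]], List[Feature]],
) -> Dict[str, List[Feature]]:
    """Build one PIRF dictionary per place by sliding a window of w + 1 images."""
    dictionaries: Dict[str, List[Feature]] = {}
    for place_id, images in places.items():      # loop over places (index j)
        entries: List[Feature] = []
        for i in range(1, len(images) + 1):      # loop over images (1-based i)
            if i <= w:                           # S3: too few images so far
                continue
            window = images[i - w - 1:i]         # I_{i-w} .. I_i, w + 1 images
            entries.extend(pirf_from_window(window))  # S4 to S6
        dictionaries[place_id] = entries         # S7: register for this place
    return dictionaries
```

With $w = 3$ and six images, the last window is $I_3$ to $I_6$, matching the example in the text; a place with fewer than four images yields an empty dictionary.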

Next, the position estimation unit 10 is described. As described above, the storage unit 12 of the position estimation unit 10 holds the area database 11. In the area database 11, area identification information for identifying an area (place) is registered in association with the invariant feature amounts PIRF (PIRF dictionary) contained in that area. The area database 11 is generated by learning, in which the feature amount extraction unit 20 extracts invariant feature amounts PIRF from teacher images that have been associated with area identification information in advance, and registers them.

In this embodiment, a case is described in which the position estimation apparatus does not estimate one specific point as the position to be identified, but instead recognizes a region of a certain extent as an area. The position estimation apparatus may also be configured to estimate the current position.

In the present embodiment, the position estimation device 1 generates the area database 11 by itself using the feature amount extraction unit 20. For example, when the position estimation device 1 is mounted on a robot apparatus, the robot apparatus first uses its imaging means to acquire continuous images of the area to be recognized while moving through it, and associates those images with identification information for identifying the area. The continuous images associated with the identification information are then used as teacher images, and the feature amount extraction unit 20 extracts the invariant feature amounts PIRF (the PIRF dictionary) from them. The area database 11 is built by registering these in the area database 11 together with the identification information. In this case, when the robot enters an unknown area, for example, it can learn that area and update the area database 11 as needed. It is of course also possible to prepare the area database 11 in advance and use it.

When the robot apparatus estimates its own position, an image of its surroundings is taken as the input image, and the local feature amount extraction unit 21 extracts a plurality of local feature amounts from it. The estimation result output unit 13 matches the extracted local feature amounts against the PIRFs contained in the PIRF dictionaries registered in the area database 11, casts a vote for the area whose invariant feature amount matches each local feature amount, estimates the area with the most votes as the current self-position, and outputs the result. FIG. 4 illustrates how the place to which an input image belongs is estimated. As shown in FIG. 4, suppose that an environment contains five places. Each place has its own PIRF dictionary (D_1 to D_5).

First, local feature amounts are extracted from the input image (test image). Each extracted local feature amount is matched against the individual PIRFs contained in each PIRF dictionary. The place corresponding to the PIRF dictionary with the largest number of matched PIRFs is the place to which the input image belongs. In FIG. 4, both the first and second input images match the first PIRF dictionary D_1 most strongly. The place corresponding to PIRF dictionary D_1 is therefore the place of the input image.

Here, when the number of votes is less than a predetermined threshold, the local feature amount extraction unit 21 recognizes the area as an unlearned area. As a result, when an area cannot be estimated accurately it is not identified, so only areas that can be identified reliably are output. In addition to avoiding uncertain estimates, this makes it possible to carry out processing such as appropriate additional learning.
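The voting scheme and the unlearned-area threshold can be illustrated with a short sketch. The function name `estimate_place`, the distance threshold `max_dist`, and the vote threshold `min_votes` are assumptions made for illustration. The sketch also assumes that each query feature casts at most one vote, for the place holding its nearest PIRF, which is only one possible reading of the description.

```python
import math

def estimate_place(query_descs, dictionaries, max_dist=0.5, min_votes=2):
    """Estimate the place of an input image by voting.

    query_descs:  local descriptors of the input image (lists of floats).
    dictionaries: {place_id: [pirf, ...]}, one PIRF dictionary per place.
    Returns (place_id, votes); place_id is None for an unlearned area.
    """
    votes = {place: 0 for place in dictionaries}
    for q in query_descs:
        best_place, best_d = None, max_dist
        for place, pirfs in dictionaries.items():
            for p in pirfs:
                d = math.dist(q, p)
                if d < best_d:
                    best_place, best_d = place, d
        if best_place is not None:
            votes[best_place] += 1  # one vote per matched query feature
    if not votes:
        return None, votes
    winner = max(votes, key=votes.get)
    if votes[winner] < min_votes:
        return None, votes  # too few votes: treat as unlearned area
    return winner, votes
```

Returning `None` instead of a weak best guess mirrors the behavior described above: the caller can then decide whether to trigger additional learning for the current surroundings.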

Next, the DB optimization unit 14 will be described. One PIRF dictionary corresponds to one place, so as many PIRF dictionaries are generated as there are recognized places. The more places are recognized, the more PIRF dictionaries there are and the more memory they occupy. When the position estimation device 1 is mounted on a robot apparatus, for example, it learns one PIRF dictionary for each place visited. However, the storage capacity of the robot apparatus is finite, so the size of the area database 11 needs to be optimized.

In other words, it may be necessary to reduce the data size of the area database 11. Two data reduction methods are conceivable in that case. One is to reduce the number of individual PIRFs that make up a PIRF dictionary, and the other is to reduce the number of PIRF dictionaries themselves.

First, the method for reducing the individual PIRFs that make up a PIRF dictionary will be described. The DB optimization unit 14 optimizes the number of individual PIRFs contained in a PIRF dictionary by deleting redundant ones from among the plurality of PIRFs (individual invariant feature amounts) that make up the dictionary.

The DB optimization unit 14 reuses the teacher images that were used to build the area database 11. It casts votes for the individual invariant feature amounts that were used to identify each teacher image, and deletes redundant individual feature amounts according to the voting result. That is, PIRFs with low discriminative ability, which are rarely used for identification, are deleted. For example, a PIRF with zero votes, which was never used for identification, can be deleted without any effect on performance. Note that this kind of optimization requires a reasonably large set of teacher images and is a batch test, so when the device is mounted on a robot apparatus it is performed offline.

For example, tall or iconic objects such as Tokyo Tower or Mt. Fuji may yield PIRFs that are extracted as feature amounts for multiple places. Such PIRFs have low ability to discriminate between places and can be deleted.
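The vote-based pruning of individual PIRFs can be sketched as below. This is a simplified illustration, not the patented procedure: it looks at a single dictionary in isolation, counts a vote whenever a teacher-image feature has that PIRF as its nearest neighbor within an assumed distance `max_dist`, and drops PIRFs with fewer than `min_votes` votes. The function name and both thresholds are hypothetical.

```python
import math

def prune_dictionary(pirfs, teacher_descs_per_image, max_dist=0.5, min_votes=1):
    """Remove PIRFs that are rarely used when re-identifying teacher images.

    pirfs:                   list of PIRF vectors for one place.
    teacher_descs_per_image: list of per-image descriptor lists.
    Returns (kept_pirfs, vote_counts).
    """
    counts = [0] * len(pirfs)
    for descs in teacher_descs_per_image:
        for q in descs:
            best_i, best_d = None, max_dist
            for i, p in enumerate(pirfs):
                d = math.dist(q, p)
                if d < best_d:
                    best_i, best_d = i, d
            if best_i is not None:
                counts[best_i] += 1  # this PIRF was used for identification
    kept = [p for p, c in zip(pirfs, counts) if c >= min_votes]
    return kept, counts
```

With `min_votes=1` only PIRFs that were never used are removed, which corresponds to the zero-vote case mentioned above. Raising the threshold trades dictionary size against recognition accuracy. Detecting PIRFs shared by several places, as in the landmark example, would additionally require comparing dictionaries with each other, which this sketch does not attempt.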

Next, as the other method, optimization that reduces the number of PIRF dictionaries themselves will be described. In this case, the DB optimization unit 14 deletes inappropriate or redundant PIRF dictionaries from among those registered in the area database 11 for each place. For a robot apparatus, for example, data on places it is unlikely to visit again, that is, PIRF dictionaries that are unlikely to be used again, can be deleted. This can be done, for example, by deleting PIRF dictionaries that have not been used for a certain period of time or longer. This behavior corresponds to the human faculty of forgetting. Because the robot apparatus can carry out this kind of optimization as it goes, online optimization is possible.
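The "forgetting" of unused dictionaries amounts to a simple age check. The sketch below assumes that the caller keeps a last-use timestamp for each place. The function name `forget_unused`, the `last_used` bookkeeping, and the `max_idle` parameter are all hypothetical, since the text only says that dictionaries unused for a certain period may be deleted.

```python
def forget_unused(database, last_used, now, max_idle):
    """Drop PIRF dictionaries that have not been used for longer than max_idle.

    database:  {place_id: [pirf, ...]}
    last_used: {place_id: time of last successful recognition}
    now, max_idle: in the same time unit as the values in last_used.
    Places without a recorded time are kept (an assumption of this sketch).
    """
    return {place: pirfs
            for place, pirfs in database.items()
            if now - last_used.get(place, now) <= max_idle}
```

Because the check only needs a timestamp per dictionary, it can run each time the database is updated, which is consistent with the online optimization described above.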

The present invention is not limited to the embodiments described above, and various modifications can of course be made without departing from the gist of the present invention.

For example, although the above embodiment has been described as a hardware configuration, the invention is not limited to this, and any of the processing can also be realized by having a CPU (Central Processing Unit) execute a computer program. In this case, the computer program can be provided recorded on a recording medium, or provided by transmission via the Internet or another transmission medium.

[Example 1]

The experimental data were continuous images taken on the Suzukakedai and Ookayama campuses. The image size was 640 × 428. For the Suzukakedai campus, 580 images were used for training and 489 for testing. For the Ookayama campus, 450 images were used for training and 493 for testing.

The left part of FIG. 5(a) shows the Suzukakedai campus, which was manually divided into 23 places. The right part of FIG. 5(b) shows the Ookayama campus, which was manually divided into 13 places. For the Suzukakedai campus, the teacher images were acquired during the daytime on a holiday with fine weather, and images acquired on weekdays under various weather conditions were used as test images. For the Ookayama campus, both the teacher images and the test images were acquired at random over three months, on weekdays under various weather conditions.

The upper-left part of FIG. 5(b) shows place A21, and the lower-left part shows place A1. The teacher images were collected on a holiday (upper left), whereas the test images were collected on weekdays (upper right). Some teacher images were taken in the daytime (lower left), while the test images were acquired in the evening (lower right).

FIG. 5(c) shows sample images from the "Ljubljana lab." portion of the COLD dataset. Four locations were used (printer area, corridor, shared room, and bathroom), photographed under cloudy, sunny, and night conditions, with two image sequences used for each condition. There are about 6,000 sequential images, each 640 × 480 in size. FIG. 5(d) shows locations corresponding to FIG. 5(c), photographed at night.

A total of four comparison methods were used: the features GIST (A. Torralba, K. P. Murphy, W. T. Freeman and M. A. Rubin, "Context-Based Vision System for Place and Object Recognition," Proc. IEEE Int'l Conf. Computer Vision, pp. 1023-1029, 2003) and sPACT (J. Wu and J. M. Rehg, "Where am I: Place instance and category recognition using spatial PACT," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition, 2008), each combined with 1-NN (1-nearest neighbor) and SVM (Support Vector Machine) as the classifier.

FIG. 6(a) shows the experimental results and the accuracy of each method. The PIRF result is the rightmost one, and its performance was confirmed to be more than twice as good as that of the other methods. The left side shows the results for the Suzukakedai campus and the right side those for the Ookayama campus. FIG. 6(b) shows the confusion matrix of the recognition results for the Suzukakedai campus. Rows indicate the correct answers and columns indicate the estimated results. FIG. 6(c) is a graph showing the confidence value for each test image. The average line is a third-order polynomial fit. The confidence value is high on average when the recognition result is correct and low on average when it is wrong, so the confidence of the recognition result correlates correctly with its correctness. FIG. 6(d) shows the recognition time for each test image. PIRF-n denotes the result when the PIRFs are reduced to n%. The solid line at the bottom shows the result for PIRF-50 with the voting process executed in parallel. Recognition time can be reduced by reducing the number of PIRFs, and adding parallel processing achieves a recognition time comparable to that of the existing methods. At every reduction level, recognition accuracy remains better than that of the existing methods.

This example confirmed that, in outdoor place recognition experiments, the accuracy is higher than that of the other classification methods.

[Example 2]

An experiment was performed using part of the COLD dataset (M. M. Ullah, A. Pronobis, B. Caputo, J. Luo, P. Jensfelt and H. I. Christensen, "Towards Robust Place Recognition for Robot Localization," Proc. IEEE Int'l Conf. Robotics and Automation, pp. 530-537, 2008) (FIGS. 6(c) and (d)). For four indoor locations, there are continuous images taken under three conditions (sunny, cloudy, and night), with two image sequences for each condition. There are therefore six image sequences in total, and they were evaluated by 6-fold cross-validation. FIG. 7 shows the results of this example. FIG. 7(a) shows the case of training on sunny images, FIG. 7(b) the case of training on cloudy images, and FIG. 7(c) the case of training on night images.

FIGS. 7(a) to 7(c) each show, from the left, the results for sunny, cloudy, and night test images and the average. Within each group, the results are shown from the left for PIRF, Harris-Laplace+SVM, sPACT+NN, and Gist+NN.

This example confirmed that accuracy equivalent to that of the other methods can be obtained.

[Example 3]

A place recognition experiment was performed on a mixture of the datasets used in Example 1 and Example 2. The results are shown in FIG. 8. Rows represent the correct classes and columns represent the estimated classes. The average recognition accuracy for each dataset was 95.12% for the COLD dataset, 86.56% for the Suzukakedai campus data, and 76.06% for the Ookayama campus data.

This example confirmed that the method remains usable even when indoor and outdoor data are mixed, and even when the amount of training data differs greatly from place to place.

[Example 4]

Using the dataset of Example 1, PIRF was applied to incremental appearance-based topological mapping of the kind performed by ISC (C. Valgren, T. Duckett and A. Lilienthal, "Incremental Spectral Clustering and Its Application to Topological Mapping," Proc. IEEE Int'l Conf. Robotics and Automation, 2007; C. Valgren and A. Lilienthal, "Incremental Spectral Clustering and Seasons: Appearance-Based Localization in Outdoor Environments," Proc. IEEE Int'l Conf. Robotics and Automation, 2008). FIG. 9 shows the results. FIG. 9(a) shows the computation time for each image: the upper graph (longer times) is ISC and the lower graph is PIRF. FIG. 9(b) shows the cumulative time, where ISC again takes longer than PIRF. FIG. 9(c) shows the recognition rate and the time taken to test the first 100 images. In that figure, the top graph is ISC and the lower three graphs show the results for PIRF-75, PIRF-50, and PIRF-25, in which the PIRFs were reduced to 75%, 50%, and 25%, respectively.

FIGS. 9(a) and 9(b) confirm that the method runs clearly faster than the conventional ISC method. The accuracy was 40.29% for conventional ISC versus 93.48% with PIRF, confirming high performance in accuracy as well.

This example confirmed that PIRF can also be used for incremental appearance-based topological mapping such as ISC, and that it runs faster than ISC.

Description of symbols
1 Position estimation device
10 Position estimation unit
11 Area database
12 Storage unit
13 Estimation result output unit
14 Optimization unit
20 Feature amount extraction unit
21 Local feature amount extraction unit
22 Feature amount matching unit
23 Continuous feature amount selection unit
24 Invariant feature amount calculation unit

Claims

Feature amount extraction means for extracting an invariant feature amount from an input image consisting of continuous images taken continuously;
Position estimation means for estimating a position based on the invariant feature quantity extracted by the feature quantity extraction means;
The feature amount extraction means includes:
Local feature extraction means for extracting a local feature from each of the input images;
Feature amount matching means for matching the local feature amounts extracted by the local feature amount extraction means between the continuous input images;
Continuous feature quantity selecting means for selecting a local feature quantity matched between a predetermined number of consecutive images by the feature quantity matching means as a continuous feature quantity;
A position estimation device comprising: an invariant feature amount calculating means for obtaining an average of the continuous feature amounts as an invariant feature amount.

The position estimating means stores an area database in which a teacher image is used as an input image and the invariant feature amount extracted from a plurality of the teacher images is associated with area identification information indicating an area from which the invariant feature amount is extracted. Means,
Taking an image of the surroundings as an input image, matching each of a plurality of local feature amounts of the input image extracted by the local feature amount extraction means with the invariant feature amount registered in the area database, The position estimation apparatus according to claim 1, further comprising estimation result output means for voting on an area corresponding to an invariant feature amount that matches a local feature amount and estimating an area having the maximum number of votes as a current position.

The position estimation device according to claim 1, wherein the position estimation unit recognizes the area as an unlearned area when the number of votes is less than a predetermined threshold.

The position estimation device according to any one of claims 1 to 3, wherein the local feature amount is a feature amount of SIFT (Scale-Invariant Feature Transform) and / or SURF (Speeded Up Robust Features).

Having optimization means for optimizing the database;
The position estimation device according to claim 1, wherein the optimization unit deletes redundant individual invariant feature amounts among a plurality of individual invariant feature amounts constituting the invariant feature amount.

The position estimation device according to claim 5, wherein the optimization unit re-uses the teacher image, votes for the individual invariant feature amounts used to identify the teacher image, and deletes the redundant individual feature amounts according to the voting result.

The robot apparatus according to claim 5, wherein the optimization unit performs the optimization offline.

The robot apparatus according to claim 5, wherein the optimization unit performs the optimization online.

Having optimization means for optimizing the database;
The position estimation device according to claim 1, wherein the optimization unit deletes redundant data for the invariant feature amount and the area identification information registered in the area database.

The position estimation device according to claim 9, wherein the redundant data has a low probability of moving again to the area indicated by the area identification information.

A feature amount extraction step of extracting an invariant feature amount from an input image composed of continuous images taken continuously;
A position estimation step of estimating a position based on the invariant feature amount extracted by the feature amount extraction unit;
The feature amount extraction step includes:
A local feature amount extracting step for extracting a local feature amount from each of the input images;
A feature amount matching step for matching between the continuous input images for the local feature amount extracted by the local feature amount extraction step;
A continuous feature amount selection step of selecting a local feature amount matched between a predetermined number of consecutive images by the feature amount matching step as a continuous feature amount; and
A position estimation method comprising: an invariant feature amount calculating step of obtaining an average of the continuous feature amounts as an invariant feature amount.

A local feature amount extracting means for extracting a local feature amount from each of the input images composed of continuous images taken continuously;
Feature amount matching means for matching the local feature amounts extracted by the local feature amount extraction means between the continuous input images;
Continuous feature quantity selecting means for selecting a local feature quantity matched between a predetermined number of consecutive images by the feature quantity matching means as a continuous feature quantity;
An invariant feature amount calculating means for obtaining an average of the continuous feature amounts as an invariant feature amount.

The feature quantity extraction device according to claim 12, wherein the local feature quantity is a feature quantity of SIFT (Scale-Invariant Feature Transform) and / or SURF (Speeded Up Robust Features).

A local feature extraction step of extracting a local feature from each of the input images consisting of continuous images taken continuously;
A feature amount matching step for matching between the continuous input images for the local feature amount extracted by the local feature amount extraction step;
A continuous feature quantity selection step of selecting a local feature quantity matched between a predetermined number of consecutive images by the feature quantity matching means as a continuous feature quantity;
An invariant feature amount calculating step of obtaining an average of the continuous feature amounts as an invariant feature amount.

A program for causing a computer to execute a position estimation operation for estimating a position based on a feature amount,
A local feature extraction step of extracting a local feature from each of the input images consisting of continuous images taken continuously;
A feature amount matching step for matching between the continuous input images for the local feature amount extracted by the local feature amount extraction step;
A continuous feature quantity selection step of selecting a local feature quantity matched between a predetermined number of consecutive images by the feature quantity matching means as a continuous feature quantity;
And an invariant feature amount calculating step for obtaining an average of the continuous feature amounts as an invariant feature amount.