JP2012178048A - Program and image processor

Info

Publication number: JP2012178048A
Application number: JP2011040712A
Authority: JP
Inventors: Akira Chin; 彬陳
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-02-25
Filing date: 2011-02-25
Publication date: 2012-09-13
Anticipated expiration: 2031-02-25
Also published as: JP5626011B2

Abstract

PROBLEM TO BE SOLVED: To provide a program and an image processor that perform accurate image collation regardless of differences between the photography conditions of an input image to be collated and those of a registered image. SOLUTION: The image processor includes: a pre-processing part that extracts feature points from each frame of input image data and calculates input feature vectors on the basis of the extracted feature points; a storage part in which a tree structure is registered, the tree structure having the feature vectors of the feature points of detection object images as nodes, being connected by sub-trees whose members are, for each category, a representative feature vector representing the category and samples of the feature vectors, and being clustered into a class for each detection object; and a recognition part that, in a first stage, matches the input feature vectors against the classes in the storage part and, in a second stage, matches them against each member of the matched class, and outputs data of the detection object recognized as matching.

Description

The present invention relates to a program and an image processing apparatus.

An image of a landmark present in a certain environment can be used, for example, for self-position estimation and environment recognition by a device such as a robot that moves in that environment. In such a case, a landmark database is created in advance, and during self-position estimation it is determined whether a landmark registered in the database is present in an image newly captured by the camera of the device. To create the database, for example, a large number of images are captured in a given environment, and feature points such as corners and colors are detected in each image. If a detected feature point is a new feature point not yet registered in the database, it is newly registered. If the detected feature point is already registered in the database, then, for example, the representative feature vector of the category to which the detected feature point belongs (hereinafter referred to as the representative feature vector) is updated. A plurality of classes can be generated by clustering the feature vectors obtained from the many captured images, and the feature vector obtained by averaging the feature vectors of the sample images belonging to each class can be used as the representative feature vector of that class.
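The averaging step described above can be illustrated with a minimal Python sketch. The function name `representative_vector` and the three-element sample vectors are hypothetical and chosen only for illustration; the text itself only states that the feature vectors of the samples in a class are averaged.

```python
from statistics import fmean


def representative_vector(samples):
    """Return the element-wise mean of equally sized feature vectors.

    Hypothetical helper: the source only says that the sample feature
    vectors of a class are averaged to obtain its representative.
    """
    if not samples:
        raise ValueError("a class needs at least one sample")
    dim = len(samples[0])
    if any(len(s) != dim for s in samples):
        raise ValueError("all feature vectors must have the same length")
    return [fmean(s[i] for s in samples) for i in range(dim)]


# Three made-up sample vectors that belong to one class.
class_samples = [[0, 12, 54], [2, 10, 50], [4, 14, 52]]
rep = representative_vector(class_samples)
```

Because the representative is a plain mean, samples of the same object taken under different illumination or viewpoints pull it away from any individual sample, which is the blurring effect discussed in the paragraphs that follow.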

Collating a captured image with every one of a large number of sample images takes an enormous amount of time. In contrast, collating the feature vector of the captured image with each representative feature vector registered in the database shortens the time required for collation.

However, even if a feature vector is designed to be as robust as possible to changes in the camera position and orientation, illumination, and the like at the time of image capture, it cannot cope with these changes completely. For example, when an attempt is made to detect a landmark in an image captured at a different time of day from the image samples used to create the database, or in an image captured from a different viewpoint from those image samples, landmarks are erroneously detected. That is, because of illumination changes, viewpoint differences, and the like relative to the image samples used to create the database, errors arise in the result of collating the captured image with the image samples registered in the database: an image that is not a registered landmark may be erroneously detected as a landmark, or an image of a registered landmark may fail to be detected as a landmark. This is because, even when the same object is photographed, the captured images differ slightly under the influence of illumination changes, viewpoint differences, shadow differences, and the like, so that the feature vectors for the same object differ.

If the feature vectors detected from images of the same object differ because of illumination changes, viewpoint differences, shadow differences, and the like, the representative feature vector becomes blurred. Consequently, as the number of image samples increases, the representative feature vector may no longer be able to represent the feature vectors that inherently belong to the same category.

The problem of erroneous detection during image collation caused by illumination changes, viewpoint differences, shadow differences, and the like is not limited to landmark detection; it arises in the same way in various image processing apparatuses that collate an input image with registered images.

JP 2000-285141 A
JP 2009-53842 A
JP-A-9-294277
JP 2009-245304 A

In conventional image processing apparatuses, when the shooting conditions of the input image to be collated differ from those of the registered images, erroneous image detection occurs, which makes accurate image collation difficult.

Accordingly, an object of the present invention is to provide a program and an image processing apparatus capable of performing accurate image collation regardless of differences between the shooting conditions of the registered images and those of the input image to be collated.

According to one aspect of the present invention, there is provided a program that causes a computer to detect a detection target from image data, the program causing the computer to execute: a preprocessing procedure of extracting feature points from each frame of input image data and calculating feature vectors based on the extracted feature points; and a recognition procedure of accessing a storage unit in which a tree structure is registered, the tree structure having the feature vectors of the feature points of detection target images as nodes, being connected by subtrees whose members are, for each category, a representative feature vector representing the category and samples of the feature vectors, and being clustered into a class for each detection target, the recognition procedure matching, in a first stage, the feature vectors calculated in the preprocessing procedure against the classes in the storage unit, matching, in a second stage, against each member of the matched class, and outputting data of the detection target recognized as matching.

According to another aspect of the present invention, there is provided an image processing apparatus including: a preprocessing unit that extracts feature points from each frame of input image data and calculates input feature vectors based on the extracted feature points; a storage unit in which a tree structure is registered, the tree structure having the feature vectors of the feature points of detection target images as nodes, being connected by subtrees whose members are, for each category, a representative feature vector representing the category and samples of the feature vectors, and being clustered into a class for each detection target; and a recognition unit that, in a first stage, matches the input feature vectors against the classes in the storage unit and, in a second stage, matches them against each member of the matched class, and outputs data of the detection target recognized as matching.

According to the disclosed program and image processing apparatus, accurate image collation can be performed regardless of differences between the shooting conditions of the registered images and those of the input image to be collated.

FIG. 1 is a diagram showing an example of the configuration of an autonomously traveling robot in one embodiment of the present invention.
FIG. 2 is a diagram explaining remote operation of the robot.
FIG. 3 is a functional block diagram explaining a database creation method.
FIG. 4 is a flowchart explaining a landmark DB creation method.
FIG. 5 is a diagram explaining the detection of feature points.
FIG. 6 is a diagram explaining the association between feature vectors and detected feature points.
FIG. 7 is a diagram explaining the creation of a corresponding point list.
FIG. 8 is a diagram explaining an example of the tree structure stored in the landmark DB 25.
FIG. 9 is a functional block diagram explaining a landmark detection method.
FIG. 10 is a diagram explaining the collection of feature vector samples in a comparative example.
FIG. 11 is a diagram explaining the clustering of the collected feature vectors in the comparative example.
FIG. 12 is a diagram explaining the calculation of an average vector in the comparative example.
FIG. 13 is a diagram explaining the generation of a KD tree in the comparative example.
FIG. 14 is a diagram explaining the influence of illumination changes and shooting location on detection performance in the comparative example.
FIG. 15 is a diagram explaining the influence of illumination changes and shooting location on detection performance in the embodiment.
FIG. 16 is a diagram explaining the number of landmarks detected in each frame in the comparative example.
FIG. 17 is a diagram explaining the number of landmarks detected in each frame in the embodiment.
FIG. 18 is a histogram showing the landmark detection results of the comparative example of FIG. 16.
FIG. 19 is a histogram showing the landmark detection results of the embodiment of FIG. 17.
FIG. 20 is a diagram showing the histograms of FIGS. 18 and 19 in table form.

In the disclosed program and image processing apparatus, preprocessing extracts feature points from each frame of the input image data and calculates input feature vectors based on the extracted feature points. A tree structure is registered in the storage unit; the tree structure has the feature vectors of the feature points of detection target images as nodes, is connected by subtrees whose members are, for each category, a representative feature vector representing the category and samples of the feature vectors, and is clustered into a class for each detection target. In the first stage of the recognition process, the input feature vectors are matched against the classes in the storage unit; in the second stage, they are matched against each member of the matched class, and the data of the detection target recognized as matching is output.

Detecting the detection target in two stages, the first stage and the second stage, reduces both erroneous detections and missed detections of the detection target.
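The two-stage recognition can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: Euclidean distance, the two thresholds, the flat dictionary standing in for the registered tree structure, and all names (`two_stage_match`, `class_threshold`, `member_threshold`, the sample classes) are hypothetical.

```python
import math


def two_stage_match(query, classes, class_threshold, member_threshold):
    """Return the ID of the matched class, or None if nothing matches.

    classes maps a class ID to {"representative": vector, "members": [vectors]}.
    A linear scan replaces the tree search purely for brevity (assumption).
    """
    # Stage 1: find the class whose representative vector is nearest.
    best_id, best_dist = None, math.inf
    for class_id, cls in classes.items():
        d = math.dist(query, cls["representative"])
        if d < best_dist:
            best_id, best_dist = class_id, d
    if best_id is None or best_dist > class_threshold:
        return None

    # Stage 2: compare the query with every member sample of that class.
    nearest = min(
        (math.dist(query, m) for m in classes[best_id]["members"]),
        default=math.inf,
    )
    return best_id if nearest <= member_threshold else None


# Made-up two-dimensional data; real feature vectors would be much longer.
classes = {
    "door": {"representative": [10.0, 10.0],
             "members": [[8.0, 9.0], [12.0, 11.0]]},
    "sign": {"representative": [50.0, 50.0],
             "members": [[49.0, 51.0]]},
}
hit = two_stage_match([12.0, 11.5], classes, 5.0, 1.0)
near_miss = two_stage_match([10.0, 13.0], classes, 5.0, 1.0)
far_miss = two_stage_match([30.0, 30.0], classes, 5.0, 1.0)
```

In this sketch the second query is close to the class representative but not to any stored sample, so it is rejected in the second stage; that is one way the member-level check could suppress false detections that a representative-only comparison would accept.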

Embodiments of the disclosed program and image processing apparatus are described below with reference to the drawings.

(Robot system configuration)
FIG. 1 is a diagram showing an example of the configuration of an autonomously traveling robot in one embodiment of the present invention. The robot 1 includes a navigation CPU (Central Processing Unit) 11, a travel control CPU 12, a carriage 13, a sensor unit 14, an input/output unit 15, and a storage unit 16. The input/output unit 15 includes an input unit (not shown) through which a user inputs information and commands to the robot 1, and an output unit (not shown) through which the robot 1 outputs information to the user. The input unit includes, for example, an operation unit such as a keyboard, a microphone, and the like. The output unit includes a display unit, a speaker, and the like. The CPUs 11 and 12 may form a single computer or separate computers. The robot 1 may further include a robot arm (not shown) that has a known configuration and performs known operations, and a communication unit (not shown) that includes an antenna and a transmission/reception unit for communicating with an external server (not shown) or the like.

The storage unit 16 stores various programs, including the programs executed by the CPUs 11 and 12, and various data, including intermediate data of the operations executed by the CPUs 11 and 12 and data of static and non-static maps. The storage unit 16 can be formed by a computer-readable storage medium. Examples of the computer-readable storage medium include a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, a disk device that uses a disk as a recording medium, and semiconductor storage devices including RAM (Random Access Memory) and ROM (Read Only Memory). For example, an HDD (Hard Disk Drive) can be used as the disk device that uses a disk as a recording medium. The storage unit 16 may be formed of a plurality of storage devices, in which case it may include storage devices having different access speeds.

The carriage 13 includes a gyro sensor 131, a sensor encoder 132, a motor 133, and wheels 134. The gyro sensor 131 measures the amount of rotation of the wheels 134 and outputs it to the travel control CPU 12, and the sensor encoder 132 detects the number of rotations of the wheels 134 and outputs it to the travel control CPU 12. The gyro sensor 131 and the sensor encoder 132 form internal sensors. The motor 133 rotates the wheels 134, directly or via a gear mechanism (not shown), based on commands from the travel control CPU 12. A plurality of motors 133 may be provided, and a motor 133 may drive a steering unit (not shown) that determines the moving direction of the carriage 13. The motor 133, the gear mechanism, the steering unit, and the like form a travel control system that controls the travel of the robot 1.

The travel control CPU 12 controls the movement of the carriage 13 so that, for example, it follows a target route instructed by the navigation CPU 11, and estimates the posture and current position of the carriage 13, that is, of the robot 1, based on the output information of the gyro sensor 131 and the sensor encoder 132 in the carriage 13.

The sensor unit 14 includes a camera 141 and a distance sensor 142. The camera 141 can be formed, for example, by a stereo camera that extracts visual landmarks from captured images by a known method and measures the three-dimensional position of the robot 1. The distance sensor 142 can be formed by a measuring device, such as an LRF (laser range finder), that measures the distance to the surrounding environment by a known method. The camera 141 and the distance sensor 142 form external sensors.

The navigation CPU 11 estimates the current position of the robot 1 based on the output information of the internal sensors (the gyro sensor 131 and the sensor encoder 132) and the external sensors (the camera 141 and the distance sensor 142). The navigation CPU 11 also plans the movement of the robot 1 based on the estimated current position of the robot 1.

The position estimation device may be mounted as part of the navigation CPU 11, that is, the navigation unit, of the robot 1 having the hardware configuration shown in FIG. 1, and performs self-position estimation when the robot 1 moves autonomously.

FIG. 2 is a diagram explaining remote operation of the robot. As shown in FIG. 2, the robot 1 may be configured to communicate with a server (or center) 901, with the timing of service provision and the like controlled remotely from the server 901. The server 901 includes a memory 902, a communication unit 903, and a CPU 904. The server 901 may include an input unit (not shown) through which an operator inputs information and commands to the server 901, and an output unit (not shown) through which the server 901 outputs information to the operator. In this case, the input unit can be formed by an operation unit such as a keyboard, and the output unit can be formed by a display unit or the like. In FIG. 2, for convenience of explanation, the parts of the robot 1 other than the communication unit 801 are not shown; the communication unit 801 is connected, for example, to at least one of the navigation CPU 11 and the travel control CPU 12 shown in FIG. 1.

In the above example, the various data used by the robot 1 for self-position estimation are stored in the storage unit 16 in the robot 1, but at least part of the data may be stored in the storage unit 902 in the server 901, which controls and manages the robot 1. In this case, the communication unit 801 of the robot 1 may acquire the various data used for self-position estimation by communicating with the communication unit 903 of the server 901 via, for example, a wireless network 911. The data that can be stored in the storage unit 902 in the server 901 may include, in addition to the data registered in the landmark database (hereinafter referred to as the landmark DB (Data-Base)), the data stored in the storage unit 16 of the robot 1. At least some of the functions of the navigation CPU 11 shown in FIG. 1, or at least some of the functions of the travel control CPU 12, may also be implemented on the server 901 side. Storing at least part of the various data used for self-position estimation on the server 901 side reduces the storage capacity required in the robot 1 and the data management load on the robot 1.

The server 901 is an example of an image processing apparatus. However, when the processing of the server 901 is executed in the robot 1, the robot 1 forms the image processing apparatus. In this case, the robot 1 is not limited to an autonomously traveling type and may be a fixed type. This is because, even if the robot is a fixed type, the shooting conditions change over time if, for example, it is installed outdoors.

(Database creation)
First, a method for creating the database used in the image processing apparatus according to one embodiment of the present invention is described with reference to FIG. 3. FIG. 3 is a functional block diagram explaining the database creation method, that is, the operation performed when registering data in the database. The processing of the functional blocks shown in FIG. 3 can be executed by a general-purpose computer including a processor such as a CPU and a storage unit. In this example, for convenience of explanation, the processing of the functional blocks shown in FIG. 3 is assumed to be executed by the CPU 904 of the server 901 shown in FIG. 2.

In FIG. 3, the input unit 21 receives, via the communication unit 903, image data (or an image series) captured, for example, by the camera 141 of the robot 1. The image data is, for example, moving image data. The preprocessing unit 22 includes a feature point extraction unit 221 that extracts feature points from each frame of the image data by a known method, and a calculation unit 222 that calculates feature vectors from the extracted feature points by a known method.

SIFT (Scale-Invariant Feature Transform) is a well-known algorithm that detects feature points and describes, for each detected feature point, a feature quantity that is robust to image rotation, scale changes, illumination changes, and the like. In the following description, for convenience of explanation, the detected feature points are assumed to be SIFT feature points detected according to SIFT, but it goes without saying that the feature points are not limited to SIFT feature points. When SIFT is used for the feature vectors, the length of a feature vector may be set to, for example, 128 dimensions. In this case, if each value of the feature vector is normalized to the range 0 to 255, the feature vector V1 can be expressed as, for example, [0,12,53,2,3,12,54,…] and the feature vector V2 as [76,4,2,6,22,12,67,34,123,…].
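As a rough illustration of a 128-dimensional descriptor with values in the range 0 to 255, the following Python sketch rescales a non-negative raw descriptor so that its largest element maps to 255. The text does not say how the normalization is performed, so the peak-scaling rule, the function name `quantize_descriptor`, and the sample values are assumptions made only for this example.

```python
def quantize_descriptor(values, dim=128):
    """Scale a non-negative descriptor so that its values lie in 0..255.

    Assumption: peak scaling with rounding. The source only states that
    each value is normalized to the range 0 to 255.
    """
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    if any(v < 0 for v in values):
        raise ValueError("descriptor values must be non-negative")
    peak = max(values)
    if peak == 0:
        return [0] * dim
    return [round(255 * v / peak) for v in values]


# Made-up raw descriptor: three distinct values padded to 128 dimensions.
raw = [0.0, 0.5, 1.0] + [0.25] * 125
v = quantize_descriptor(raw)
```

Storing each element as a small integer keeps a 128-dimensional vector compact, which matters when many samples per class are kept in the database.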

The feature vector processing unit 23 includes a feature vector buffer unit 231, an inter-frame feature point matching (or collation) unit 232, and an ID acquisition unit 233. The feature vector buffer unit 231 buffers the feature vectors calculated by the preprocessing unit 22; for example, the feature vectors of the frames at times t-2, t-1, and t are buffered and stored, for example, in the storage unit 902. It is sufficient for the buffered feature vectors to include at least the feature vectors of the frames at times t-1 and t. The matching unit 232 determines whether each feature point of the latest (that is, current) frame at time t matches a feature point of the immediately preceding frame at time t-1, and classifies the feature points into matched feature points and unmatched feature points. The ID acquisition unit 233 causes each matched feature point to inherit the ID number (that is, the value of the ID identifying the landmark) of the corresponding feature point of the immediately preceding frame at time t-1, and further classifies the unmatched feature points into those that match a feature point registered in the landmark DB 25 and those that do not. The landmark DB 25 may be stored, for example, in the storage unit 902 of the server 901.
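The inter-frame matching and ID inheritance can be sketched in Python as follows. The nearest-neighbour rule, the distance threshold `max_dist`, the function name `match_frames`, and the two-dimensional sample vectors are assumptions for illustration; the text does not specify the matching criterion.

```python
import math


def match_frames(prev_points, curr_vectors, max_dist):
    """Match current-frame feature vectors against the previous frame.

    prev_points: list of (id_or_None, vector) for the frame at time t-1.
    curr_vectors: list of feature vectors for the frame at time t.
    Returns one (inherited_id, matched) pair per current feature vector.
    """
    results = []
    for vec in curr_vectors:
        # Nearest previous feature point (assumed matching criterion).
        best = min(prev_points,
                   key=lambda p: math.dist(vec, p[1]),
                   default=None)
        if best is not None and math.dist(vec, best[1]) <= max_dist:
            # Matched: inherit the ID of the previous-frame point, if any.
            results.append((best[0], True))
        else:
            # Unmatched: would next be checked against the landmark DB.
            results.append((None, False))
    return results


prev_frame = [(7, [0.0, 0.0]), (None, [10.0, 10.0])]
curr_frame = [[0.5, 0.0], [10.0, 9.5], [50.0, 50.0]]
matches = match_frames(prev_frame, curr_frame, 1.0)
```

The third current point matches nothing in the previous frame, so in the scheme described above it would go on to the comparison with the feature points registered in the landmark DB 25.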

The DB processing unit (or DB updating unit) 24 registers, as a sample in a subtree of the landmark tree structure (hereinafter also referred to as the landmark tree), the feature vector of each feature point that matched a feature point of the immediately preceding frame at time t-1 or a feature point registered in the landmark DB 25. The landmark tree manages a plurality of classes hierarchically, and a subtree manages the members of a class (that is, the members forming the class) hierarchically. A tree structure is well suited to shortening processing time by making member searches and operations on members efficient. For a feature point that did not match any feature point registered in the landmark DB 25, on the other hand, the DB processing unit 24 generates a new class with the feature vector of that feature point as its representative and, at the same time, creates a subtree of the landmark tree with that feature vector as a sample. The DB processing unit 24 then updates or optimizes the tree structure of the landmark tree based on the subtree in which the sample was registered or the newly created subtree, and the landmark tree with the updated or optimized tree structure is stored in the landmark DB 25.
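The registration rule described above (add a sample to an existing class, or create a new class whose first sample is also its representative) can be sketched in Python. The class `LandmarkDB`, the function `register`, and the flat dictionary and list that stand in for the landmark tree and its subtrees are hypothetical simplifications; the tree update and optimization step is omitted.

```python
class LandmarkDB:
    """Toy stand-in for the landmark DB: a dict of classes, each holding a
    representative vector and a list of member samples (assumed layout)."""

    def __init__(self):
        self.classes = {}
        self._next_id = 0

    def add_sample(self, class_id, vector):
        # Register the vector as one more sample of an existing class.
        self.classes[class_id]["members"].append(list(vector))

    def create_class(self, vector):
        # A new class starts with the vector as representative and sample.
        class_id = self._next_id
        self._next_id += 1
        self.classes[class_id] = {
            "representative": list(vector),
            "members": [list(vector)],
        }
        return class_id


def register(db, vector, matched_id):
    """Add to the matched class if there is one, else create a new class."""
    if matched_id is not None and matched_id in db.classes:
        db.add_sample(matched_id, vector)
        return matched_id
    return db.create_class(vector)


db = LandmarkDB()
first = register(db, [1, 2], None)    # no match: new class
second = register(db, [1, 3], first)  # matched: sample added to that class
third = register(db, [9, 9], None)    # no match: another new class
```

Keeping every sample as a member, instead of folding it into an average, is what allows a later comparison against individual members of a class.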

FIG. 4 is a flowchart explaining the landmark DB creation method. In FIG. 4, step S1 determines whether processing of all the images in the image series from the stereo camera 141 has been completed; if the determination result is NO, the process proceeds to step S2, and if the determination result is YES, the process proceeds to step S6. In step S2, the image series from the stereo camera 141 is input to the input unit 21. In step S3, the feature point extraction unit 221 of the preprocessing unit 22 extracts feature points from, for example, the right camera image (or the left camera image) of the image series from the stereo camera 141 input via the input unit 21, and the feature vector calculation unit 222 calculates feature vectors from the luminance distribution features of the region surrounding each extracted feature point. In step S4, the feature vector buffer unit 231 of the feature vector processing unit 23 buffers the feature vectors supplied from the feature vector calculation unit 222, and the matching unit 232 tracks, that is, matches, feature points between frames based on the buffered feature vectors. In step S5, the matching unit 232 sets the number of unprocessed feature points to the number of feature points of the frame being processed. In step S7, the matching unit 232 determines whether the number of unprocessed feature points is 0; if the determination result is YES, the process proceeds to step S16 described later, and if the determination result is NO, the process proceeds to step S8.

In step S8, the matching unit 232 selects an unprocessed feature point. In step S9, the ID acquisition unit 233 of the feature vector processing unit 23 determines whether the tracking length (or distance) is at least a fixed threshold; if YES the process proceeds to step S10, and if NO it proceeds to step S16 described later. In step S10, the ID acquisition unit 233 determines whether the corresponding feature point in the immediately preceding frame has an ID number; if YES the process proceeds to step S13, and if NO it proceeds to step S11. In step S11, the ID acquisition unit 233 searches the landmark DB 25 for an ID number that matches the ID number that the corresponding feature point in the immediately preceding frame was determined to have in step S10. In step S12, the ID acquisition unit 233 determines whether a matching ID number was found. If a matching ID number was found (YES in step S12), the process proceeds to step S14; if no matching ID number was found (NULL; NO in step S12), the process proceeds to step S13.

In step S13, the DB processing unit 24 adds the feature vector of a feature point whose matching ID number is not registered in the landmark DB 25 to the subtree of the corresponding category in the landmark tree of the corresponding class in the landmark DB 25, and the process proceeds to step S15. For example, a class corresponds to a landmark to be detected, and each category corresponds to a characteristic part forming the class to which the category belongs, that is, the landmark. The characteristic parts forming a landmark include, for example, information such as a right corner, a left corner, and a color. In step S14, the DB processing unit 24 additionally registers the feature vector of the feature point having the matching ID number in the subtree of the corresponding category in the landmark tree of the corresponding class in the landmark DB 25. In step S15, the matching unit 232 of the feature vector processing unit 23 decrements the number of unprocessed feature points by 1, and the process returns to step S7.

In step S16, the DB processing unit 24 updates or optimizes the tree structure of the landmark tree on the basis of the subtree in which samples were registered or the newly created subtree, and the process returns to step S1. When the determination result in step S7 becomes YES, in step S6 the DB processing unit 24 stores the updated or optimized landmark tree in the landmark DB 25 in order to save it, and the process ends.

The procedure for registration in the landmark DB 25 in this embodiment is described in more detail as follows. In step S31, as shown in FIG. 5, feature points p are detected in the frame image I_0 of frame 0 and feature vectors are created. In step S32, as shown in FIG. 6, feature points p_m are detected in frame m (m > 0) and feature vectors are created; these are associated with the feature points p_{m-1} detected in frame m-1 to create a corresponding-point list as shown in FIG. 7, and the length c of the corresponding-point list is stored. In step S33, the processing of step S32 is continued up to frame M (M > m > 0). In step S34, all feature points p_M of frame M are scanned.

In step S35, after all feature points p_M of frame M have been scanned, the landmark DB 25 is optimized. In step S36, the same processing as in step S32 is performed for frame n (n = M + j, j > 0), with m replaced by n. In step S37, all feature point lists of frame n are scanned, and processes P1 and P2 described below are performed.

In step S38, after all feature points p_n of frame n have been scanned, the landmark DB 25 is optimized. In step S39, steps S36, S37, and S38 are performed up to the final frame N. In step S40, the contents of the landmark DB 25 are saved in the storage unit 902, and the process ends.

The optimization of the landmark DB 25 in step S38 can be performed, for example, as follows. First, steps S38-1 to S38-4 are executed for each node i of the landmark tree stored in the landmark DB 25. Step S38-1 calculates the feature vector w_i of node i from the following equation. Here, K_i denotes the number of nodes in the subtree (SD)_i.

Step S38-2 deletes those members of the subtree (SD)_i whose distance L_2 from the feature vector w_i is at least a predetermined value. Step S38-3 executes step S38-1 again, and step S38-4 optimizes the structure of the subtree (SD)_i, that is, reconstructs its tree structure. Next, the tree structure of the landmark tree stored in the landmark DB 25 is optimized, that is, reconstructed.
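Steps S38-1 to S38-3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the equation for w_i is not reproduced in the text, so the sketch assumes w_i is the mean of the K_i member vectors, in line with the background description of a representative feature vector as an average. The function names, the threshold `max_dist`, and the fallback for an empty result are hypothetical, and the tree reconstruction of step S38-4 is not modeled.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    """Component-wise mean of K vectors (assumed form of w_i)."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]

def prune_subtree(members, max_dist):
    """Return (representative, kept_members) for one subtree."""
    rep = mean_vector(members)                              # S38-1
    kept = [m for m in members if l2(m, rep) < max_dist]    # S38-2
    if not kept:
        # Assumption: keep the subtree unchanged rather than empty it.
        return rep, members
    return mean_vector(kept), kept                          # S38-3

members = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [1.0, 9.0]]
rep, kept = prune_subtree(members, max_dist=3.0)
print(rep, len(kept))
```

In this toy data the fourth vector lies far from the initial mean, so it is removed and the representative is recomputed from the remaining three members.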

The ID number search in step S12 can be performed, for example, as follows.

FIG. 8 is a diagram illustrating an example of the tree structure of the landmark tree stored in the landmark DB 25. In FIG. 8, the right side shows the tree structure of the landmark tree of the entire landmark DB 25, and the left side shows an enlarged portion of this tree structure. The portion of the tree structure shown on the left side of FIG. 8 has a subtree LMC1 for the first category C1 of the landmark, a subtree LMC2 for the second category C2, and a subtree LMC3 for the third category C3. ST denotes the feature vector samples forming the subtree LMC1, and FR denotes the representative feature vector in the subtree LMC1, shown enclosed in a thick line. The representative feature vectors in the subtrees LMC2 and LMC3 are likewise shown enclosed in thick lines.
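One way to picture the structure of FIG. 8 in code is a three-level container: the DB holds classes (landmarks), each class holds category subtrees, and each subtree holds its samples (ST) and exposes a representative vector (FR). The type and field names below are hypothetical, and computing FR as the sample mean is an assumption; the sketch also uses flat lists rather than an actual search tree.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]

@dataclass
class CategorySubtree:
    """One category subtree such as LMC1: samples ST plus representative FR."""
    name: str
    samples: List[Vector] = field(default_factory=list)

    @property
    def representative(self) -> Vector:
        # Assumption: FR is the component-wise mean of the samples.
        if not self.samples:
            raise ValueError("subtree has no samples")
        k = len(self.samples)
        return [sum(col) / k for col in zip(*self.samples)]

@dataclass
class LandmarkClass:
    """One class, that is, one landmark to be detected."""
    landmark_id: int
    categories: List[CategorySubtree] = field(default_factory=list)

@dataclass
class LandmarkDB:
    """The whole landmark tree, flattened to a list of classes."""
    classes: List[LandmarkClass] = field(default_factory=list)

lmc1 = CategorySubtree("LMC1", [[1.0, 2.0], [3.0, 4.0]])
db = LandmarkDB([LandmarkClass(0, [lmc1, CategorySubtree("LMC2"), CategorySubtree("LMC3")])])
print(db.classes[0].categories[0].representative)
```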

(Landmark detection)
Next, the landmark detection method performed by the image processing apparatus in one embodiment of the present invention is described with reference to FIG. 9. FIG. 9 is a functional block diagram illustrating the landmark detection method. In FIG. 9, parts identical to those in FIG. 3 are given the same reference numerals, and their description is omitted. The processing of the functional blocks shown in FIG. 9 can be executed by the server 901 or the robot 1. In this example, for convenience of explanation, the processing of the input unit 21, the preprocessing unit 22, the landmark ID recognition unit 27, and the spatial three-dimensional position measurement unit 28 shown in FIG. 9 is assumed to be executed by the CPU 904 of the server 901 shown in FIG. 2.

As shown in FIG. 9, the preprocessing unit 22A has a feature point coordinate storage unit 223 in addition to the feature point extraction unit 221 and the feature vector calculation unit 222. The landmark ID recognition unit 27 has the feature vector buffer unit 231, the inter-frame feature point matching unit 232, the ID acquisition unit 233, and an ID list output unit 234. In this example the landmark DB 25 is provided inside the landmark ID recognition unit 27, but, as can be seen from FIG. 3, its placement is not particularly limited; it need only be accessible from the CPU 904 of the server 901.

The feature point coordinate storage unit 223 stores the coordinates of feature points on the image from the camera 141 in the storage unit 902 and supplies them to the spatial three-dimensional position measurement unit 28. The spatial three-dimensional position measurement unit 28 measures the spatial three-dimensional positions of the feature points in the coordinate system of the camera 141.

Meanwhile, the navigation CPU 11 of the robot 1 has a spatial position search unit 111, a measurement value prediction unit 112, a hypothesis evaluation unit 113, a hypothesis group generation unit 114, a selection unit 115, a robot position and orientation estimation unit 116, and a movement path planning unit 117. The storage unit 16 stores a database of the spatial positions of landmarks, that is, a landmark map.

The spatial position search unit 111 uses an ID number to look up the spatial position of a landmark in the world coordinate system (World Coordinate System) or global coordinate system (Global Coordinate System). The measurement value prediction unit 112 converts the spatial position into the coordinate system of the camera 141, that is, the local coordinate system (Local Coordinate System) or body coordinate system (Body Coordinate System), and predicts the measurement value that would be obtained if the landmark with that ID number were measured by the camera 141. The hypothesis evaluation unit 113 evaluates the goodness of each hypothesis and calculates a likelihood from the difference between the measured and predicted values for the landmarks. The hypothesis group generation unit 114 generates a group of hypotheses for the robot position (or camera position and orientation). The selection unit 115 selects an arbitrary hypothesis (for example, a camera position and orientation) to be evaluated. The robot position and orientation estimation unit 116 obtains the best hypothesis, that is, the estimated true position and orientation of the robot 1. The movement path planning unit 117 plans the path along which the robot 1 should move.

The landmark detection process can be executed, for example, by the following steps S901 to S921. First, in step S901, the image sequence from the stereo camera 141 is input to the input unit 21. In step S902, the feature point extraction unit 221 extracts feature points from, for example, the right camera image (or the left camera image) of the image sequence input from the stereo camera 141 via the input unit 21. Whether the right or the left camera image is used may be chosen according to whether the landmarks registered in the landmark DB 25 were captured by the right camera or the left camera. In step S903, the feature vector calculation unit 222 calculates a feature vector from the luminance distribution features of the region surrounding each extracted feature point. In step S904, the feature point coordinate storage unit 223 stores the coordinates indicating the position of each feature point on the camera image in the storage unit 902.

In step S905, the feature vector buffer unit 231 buffers the feature vectors supplied from the feature vector calculation unit 222 in the storage unit 902. In step S906, on the basis of the feature vectors buffered in the storage unit 902, the feature vector buffer unit 231 extracts, for example, the feature point list of the latest frame at time t and the feature point list of the immediately preceding frame at time t-1 and supplies them to the matching unit 232. In step S907, the matching unit 232 matches the feature point lists of the two consecutive frames at times t and t-1 against each other by brute force. In step S908, if the matching score is at least a threshold, the ID acquisition unit 233 increments the consecutive matching length and, at the same time, if the corresponding feature point in the immediately preceding frame has an ID number, lets the current feature point inherit that ID number. If the corresponding feature point in the immediately preceding frame has no ID number, then in step S909 the ID acquisition unit 233 checks the feature point against the landmark DB 25, provided the consecutive matching length is at least a fixed value. In step S910, the ID acquisition unit 233 discards feature points that fail to match the landmark DB 25. In step S911, the ID list output unit 234 outputs the landmarks recognized as matching together with their ID numbers, that is, the data of the landmarks recognized as matching.
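The tracking logic of steps S907 to S909 can be sketched as below. This is an illustrative sketch under several assumptions: the matching score is replaced by an L2 distance that must not exceed a hypothetical `max_dist`, each feature point is a plain dictionary with hypothetical keys `vec`, `length`, and `id`, and one-to-one assignment between frames is not enforced.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_frames(prev_pts, curr_vecs, max_dist):
    """Brute-force match frame t against frame t-1 (S907, S908)."""
    curr_pts = []
    for vec in curr_vecs:
        # Compare against every feature point of the previous frame.
        best = min(prev_pts, key=lambda p: l2(p["vec"], vec), default=None)
        if best is not None and l2(best["vec"], vec) <= max_dist:
            # Matched: extend the consecutive length and inherit the ID.
            curr_pts.append({"vec": vec, "length": best["length"] + 1, "id": best["id"]})
        else:
            # Unmatched: start a new track with no ID.
            curr_pts.append({"vec": vec, "length": 0, "id": None})
    return curr_pts

def needs_db_lookup(pt, min_length):
    """S909: only long enough tracks without an ID are checked against the DB."""
    return pt["id"] is None and pt["length"] >= min_length

prev = [{"vec": [0.0, 0.0], "length": 3, "id": 7},
        {"vec": [5.0, 5.0], "length": 2, "id": None}]
curr = match_frames(prev, [[0.1, 0.0], [5.0, 5.1], [9.0, 9.0]], max_dist=0.5)
print([(p["id"], p["length"], needs_db_lookup(p, 3)) for p in curr])
```

Here the first point inherits ID 7, the second has been tracked long enough to be checked against the DB, and the third starts a new track.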

In this example, it is assumed that the landmark spatial position database (DB) has been acquired in advance and stored in the storage unit 16. In step S912, the spatial position search unit 111 uses the ID number to look up and output the spatial position of the landmark in the world coordinate system. In step S912, the same search is performed for all IDs.

In step S913, in order to estimate the position and orientation of the robot 1, the hypothesis group generation unit 114 generates a group of hypotheses about the position and orientation of the robot 1 on the basis of the likelihood supplied from the evaluation unit 113 described later. In step S914, the selection unit 115 selects an arbitrary hypothesis from the hypothesis group, and the selected hypothesis is evaluated. In this example, evaluation is an iterative process over the preceding hypotheses. In step S915, the measurement value prediction unit 112 converts the spatial position into the local coordinate system of the camera 141 and predicts the measurement value that would be obtained if the landmark with that ID number were measured by the camera 141. As a result, the world coordinates of the landmark retrieved by the spatial position search unit 111 are converted into the local coordinate system of the camera 141.

In step S916, the spatial three-dimensional position measurement unit 28 supplies the evaluation unit 113 with the actual measurement values obtained when the landmarks are measured by the camera 141. In step S917, the evaluation unit 113 compares the actual measurement values supplied from the spatial three-dimensional position measurement unit 28 with the local coordinates converted by the measurement value prediction unit 112, and assigns a score representing the degree of agreement between the two compared values. In this example, the goodness of each hypothesis is evaluated, and the likelihood is calculated from the difference between the measured and predicted values for the landmarks. In step S918, the likelihood (that is, the score) calculated by the evaluation unit 113 is fed back to the hypothesis group generation unit 114 for the robot position. In step S919, the position and orientation estimation unit 116 determines the best-scoring hypothesis to be the estimated true position and orientation of the robot 1. In step S920, the position and orientation estimation unit 116 supplies the estimated true position and orientation of the robot 1 to the movement planning unit 117. In step S921, the travel control CPU 12 is controlled so as to move the robot 1 in accordance with the movement instruction planned by the movement planning unit 117. As a result, the motor 133 shown in FIG. 1 rotates the wheels 134 directly or via a gear mechanism on the basis of commands from the travel control CPU 12, and the robot 1 moves according to the plan.
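The evaluation loop of steps S914 to S919 can be sketched in a simplified planar form. The sketch assumes a 2D pose (x, y, theta) instead of the full camera position and orientation, and it assumes a Gaussian likelihood of the measured-minus-predicted error with a hypothetical spread `sigma`; the text only states that the likelihood is calculated from that difference. All function names are illustrative.

```python
import math

def world_to_camera(pose, landmark_xy):
    """Predict the local-frame position of a world landmark (S915)."""
    x, y, theta = pose
    dx, dy = landmark_xy[0] - x, landmark_xy[1] - y
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)

def likelihood(pose, landmarks, measurements, sigma=0.1):
    """Score one hypothesis from measured vs. predicted values (S917)."""
    total = 1.0
    for lm_id, measured in measurements.items():
        predicted = world_to_camera(pose, landmarks[lm_id])
        err2 = (measured[0] - predicted[0]) ** 2 + (measured[1] - predicted[1]) ** 2
        total *= math.exp(-err2 / (2.0 * sigma ** 2))  # assumed Gaussian model
    return total

def best_hypothesis(hypotheses, landmarks, measurements):
    """Pick the hypothesis with the highest likelihood (S919)."""
    return max(hypotheses, key=lambda h: likelihood(h, landmarks, measurements))

landmarks = {1: (2.0, 0.0), 2: (0.0, 3.0)}        # landmark map, world frame
measurements = {1: (1.0, 0.0), 2: (-1.0, 3.0)}    # measured in the camera frame
hypotheses = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2)]
print(best_hypothesis(hypotheses, landmarks, measurements))
```

In the actual apparatus the likelihood is also fed back to the hypothesis group generation unit 114 (step S918); that resampling step is omitted here.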

It goes without saying that the DB processing unit 24 shown in FIG. 3 may be connected to the landmark ID recognition unit 27 so that the landmark DB 25 can be updated or optimized.

(Comparison of the comparative example and the embodiment)
Next, the procedure for registration in the landmark DB in a comparative example, which is an example of a conventional data processing apparatus, is described. In step S501, as shown in FIG. 10, feature points w are detected in each of the frame images I_0 to I_3, and feature vector samples are collected. In step S502, the collected feature vectors are clustered as shown in FIG. 11. Here it is assumed, for example, that the number of categories is known to be 3. Accordingly, in step S502 the collected feature vectors are clustered into three categories C1, C2, and C3. In step S503, the mean vectors w_1, w_2, and w_3 of the categories C1, C2, and C3 are calculated, as shown enclosed in thick lines in FIG. 12. In step S504, as shown in FIG. 13, a KD tree (K-Dimensional Tree) is generated whose nodes are the mean vectors w_1, w_2, and w_3 of the categories C1, C2, and C3, that is, the representative feature vectors shown enclosed in thick lines, and the landmark DB is thereby created.

However, if the feature vectors detected from images of the same object differ because of illumination changes, viewpoint differences, shadow differences, and the like, the representative feature vector becomes blurred. Consequently, as the number of image samples grows, the representative feature vector may no longer be able to represent feature vectors that properly belong to the same category.
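A small numeric illustration of this blurring, using made-up vectors rather than any data from the experiments: two samples of the same category taken under different conditions are averaged, and a new input close to one of the samples ends up far from the averaged representative. The vectors and the threshold `0.3` are purely hypothetical.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    """Component-wise mean, as used for a representative feature vector."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]

bright = [1.0, 0.0, 0.0, 0.0]   # same feature, bright illumination (made up)
dark = [0.0, 0.0, 0.0, 1.0]     # same feature, dark illumination (made up)
rep = mean_vector([bright, dark])

query = [0.9, 0.1, 0.0, 0.0]    # new observation under bright illumination
threshold = 0.3
print("matches representative:", l2(query, rep) <= threshold)
print("matches stored sample: ", l2(query, bright) <= threshold)
```

With only the averaged vector stored, the query is rejected; with the individual samples retained, it is accepted, which is the motivation for keeping samples in each subtree.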

In contrast, in the above embodiment, landmarks are detected in two stages, a first and a second, which reduces both false detections and missed detections of landmarks.

In the first stage, rough (or coarse) matching is performed by matching against classes. First, the input feature vector is matched against the nodes of the landmark tree in the landmark DB, and the distance L_2 between the input feature vector and each node is calculated. Next, the nodes of the landmark tree are narrowed down using the calculated distance L_2, and a similarity is calculated from the correlation value between the input feature vector and each remaining node. In this way, through video tracking, only relatively stable feature points that have been tracked continuously for at least a fixed time survive, and the surviving feature points are matched relatively loosely against the representative vector of each class. This reduces missed landmark detections caused by setting the threshold too strictly.

In the second stage, precise matching is performed by matching against each member of the classes matched in the first stage. First, the subtree of the corresponding category is extracted from the landmark tree nodes narrowed down in the first stage. Next, the distance L_2 between the input feature vector and each node of the extracted subtree is calculated, the ID numbers of the nodes at the maximum and minimum distances are extracted, and an overall similarity is calculated from the correlation values between the input feature vector and these extracted nodes. In this way, the feature points that survived matching against the representative feature vector of each class are matched against the sample feature points managed in the subtree. These samples are data acquired under different conditions, such as different locations and illumination, and retain diversity, so the matching score can be made relatively high. By setting relatively high thresholds for both the maximum score and the minimum score at this stage, false landmark detections can be reduced.
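The two stages described above can be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation: the "correlation value" is taken to be a zero-mean normalized correlation, each class is flattened to one representative vector plus a list of members, and the threshold names `coarse_dist`, `coarse_sim`, `best_thresh`, and `worst_thresh` are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation(a, b):
    """Zero-mean normalized correlation in [-1, 1] (assumed similarity)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0

def two_stage_match(query, db, coarse_dist, coarse_sim, best_thresh, worst_thresh):
    """Return the IDs of classes that pass both stages."""
    matched = []
    for lm_id, entry in db.items():
        # Stage 1: loose test against the representative vector.
        if l2(query, entry["rep"]) > coarse_dist:
            continue
        if correlation(query, entry["rep"]) < coarse_sim:
            continue
        # Stage 2: strict test against the nearest and farthest members.
        ranked = sorted(entry["members"], key=lambda m: l2(query, m))
        best = correlation(query, ranked[0])     # minimum-distance member
        worst = correlation(query, ranked[-1])   # maximum-distance member
        if best >= best_thresh and worst >= worst_thresh:
            matched.append(lm_id)
    return matched

db = {
    1: {"rep": [1.0, 2.0, 3.0, 4.0],
        "members": [[1.0, 2.0, 3.0, 4.2], [0.9, 2.1, 3.0, 3.8]]},
    2: {"rep": [4.0, 3.0, 2.0, 1.0],
        "members": [[4.0, 3.0, 2.0, 1.0]]},
}
print(two_stage_match([1.0, 2.0, 3.0, 4.1], db, 1.0, 0.8, 0.95, 0.95))
```

The loose first-stage thresholds keep candidate classes from being dropped early, while requiring a high score for both the nearest and the farthest member is what filters out false matches.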

Next, experimental data for the comparative example and the embodiment are described with reference to FIGS. 14 to 20. In the experiment, at the entrance of a building, a camera installed at a predetermined position was rotated, for example about the vertical axis, three times through a predetermined angular range, an image sequence was acquired at 5 frames per second, and a landmark DB was created. Then, several hours after the landmark DB was created, the camera installed at the same predetermined position in the same entrance was similarly rotated twice to acquire an image sequence (more than 3000 images), and landmarks were detected from the image sequence using the landmark DB 25.

FIG. 14 is a diagram illustrating how changes in illumination and shooting location affect detection performance in the comparative example, and FIG. 15 is a diagram illustrating the same for the embodiment. In FIGS. 14 and 15, (a) shows, for example, the image of the 4th frame acquired after the landmark DB was created, and (b) shows, for example, the image of the 21st frame acquired after the landmark DB was created. The illumination differs between the 4th-frame image and the 21st-frame image because of the difference in shooting time. In FIGS. 14 and 15, circles mark locations that were matched to landmarks registered in the landmark DB. As can be seen by comparing FIG. 14 with FIG. 15, it was confirmed that, according to this embodiment, more landmarks can be detected than in the comparative example regardless of differences between the shooting conditions of the registered images and those of the input images to be matched.

FIG. 16 is a diagram illustrating the number of landmarks detected in each frame in the comparative example, and FIG. 17 is a diagram illustrating the number of landmarks detected in each frame in the embodiment. In FIGS. 16 and 17, the vertical axis indicates the number of landmarks detected, and the horizontal axis indicates the image frame number (corresponding to the time axis).

FIG. 18 is a histogram showing the landmark detection results of the comparative example of FIG. 16, and FIG. 19 is a histogram showing the landmark detection results of the embodiment of FIG. 17. In FIGS. 18 and 19, the finest hatching indicates frames with 0 landmark detections, the second finest hatching indicates 1 detection, the third finest indicates 2 detections, the fourth finest indicates 3 detections, and the fifth finest (that is, the coarsest) hatching indicates 4 or more detections. FIG. 20 is a diagram showing the histograms of FIGS. 18 and 19 in table form.
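The binning used for these histograms (0, 1, 2, 3, and 4 or more detections per frame) can be reproduced with a few lines of code. The per-frame counts below are made-up placeholders, not the experimental data, and the function name is hypothetical.

```python
from collections import Counter

def detection_histogram(counts_per_frame):
    """Bin per-frame landmark detection counts into 0, 1, 2, 3 and '4+'."""
    bins = Counter("4+" if c >= 4 else str(c) for c in counts_per_frame)
    return {label: bins.get(label, 0) for label in ("0", "1", "2", "3", "4+")}

counts = [0, 2, 5, 1, 0, 4, 3, 7]   # placeholder values for illustration only
print(detection_histogram(counts))
```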

As can be seen by comparing FIGS. 14, 16, and 18 with FIGS. 15, 17, and 19, and also from FIG. 20, it was confirmed that, according to this embodiment, accurate image matching can be performed regardless of differences between the shooting conditions of the registered images and those of the input images to be matched, so that more landmarks can be detected than in the comparative example.

In the above embodiment the detection target is a landmark, but the detection target is not limited to landmarks and may be, for example, a person or a person's face. If the detection target is, for example, the face of a specific person, a class corresponds to the face of that specific person, and each category corresponds to a characteristic part forming the class to which the category belongs, that is, the face. The characteristic parts forming a face include, for example, information such as the right eye, the left eye, the nose, and the mouth.

It goes without saying that application of the disclosed image processing apparatus and program is not limited to autonomous mobile robots such as that of the above embodiment; they are also applicable to various autonomous mobile devices, various stationary devices, and portable electronic devices such as mobile phones, mobile terminals, and portable personal computers.

The following appendices are further disclosed with respect to the embodiments including the above examples.
(Appendix 1)
A program for causing a computer to detect a detection target from image data, the program causing the computer to execute:
a preprocessing procedure of extracting feature points from each frame of input image data and calculating a feature vector based on the extracted feature points; and
a recognition procedure of accessing a storage unit in which a tree structure is registered, the tree structure having the feature vectors of the feature points of detection target images as nodes, being connected by subtrees whose members are, for each category, a representative feature vector representing that category and samples of feature vectors, and being clustered into a class for each detection target, the recognition procedure matching, in a first stage, the feature vector calculated in the preprocessing procedure against the classes in the storage unit, matching, in a second stage, against each member in the matched class, and outputting data of the detection target recognized as matching.
(Appendix 2)
The program according to appendix 1, wherein the recognition procedure:
in the first stage, calculates distances between the input feature vector and the nodes of the tree structure, narrows down the nodes of the tree structure by the distances, and calculates a similarity from correlation values between the input feature vector and the nodes of the tree structure; and
in the second stage, extracts the subtree of the corresponding category from the nodes of the tree structure narrowed down in the first stage, calculates the distance between the input feature vector and each node of the extracted subtree, extracts the nodes at the maximum distance and the minimum distance, and comprehensively calculates a similarity from the correlation values between the input feature vector and the extracted nodes.
(Appendix 3)
The program according to appendix 1 or 2, further causing the computer to execute an update procedure of updating the tree structure based on a subtree in which the sample has been registered or a newly created subtree, and storing the updated tree structure in the storage unit,
wherein the update procedure registers, as a sample in a subtree of the tree structure of the detection target, the input feature vector of a feature point of the latest frame that matched a feature point of the immediately preceding frame, or of a feature point that matched a feature point registered in the storage unit, and, for a feature point that did not match any feature point registered in the storage unit, generates a class with its input feature vector as the representative of the new class and at the same time creates a subtree of the tree structure with that input feature vector as a sample.
(Appendix 4)
The program according to any one of appendices 1 to 3, wherein the preprocessing procedure receives the image data from the output of a camera,
the program further causing the computer to execute an estimation procedure of estimating the position of the camera based on the data of the detection target recognized by the recognition procedure.
(Appendix 5)
A computer-readable storage medium in which the program according to any one of appendices 1 to 4 is stored.
(Appendix 6)
An image processing apparatus comprising:
a preprocessing unit that extracts feature points from each frame of input image data and calculates an input feature vector based on the extracted feature points;
a storage unit in which a tree structure is registered, the tree structure having the feature vectors of the feature points of detection target images as nodes, being connected by subtrees whose members are, for each category, a representative feature vector representing that category and samples of feature vectors, and being clustered into a class for each detection target; and
a recognition unit that, in a first stage, matches the input feature vector against the classes in the storage unit and, in a second stage, matches it against each member in the matched class, and outputs data of the detection target recognized as matching.
(Appendix 7)
The image processing apparatus according to appendix 6, wherein the recognition unit:
in the first stage, calculates distances between the input feature vector and the nodes of the tree structure, narrows down the nodes of the tree structure by the distances, and calculates a similarity from correlation values between the input feature vector and the nodes of the tree structure; and
in the second stage, extracts the subtree of the corresponding category from the nodes of the tree structure narrowed down in the first stage, calculates the distance between the input feature vector and each node of the extracted subtree, extracts the nodes at the maximum distance and the minimum distance, and comprehensively calculates a similarity from the correlation values between the input feature vector and the extracted nodes.
(Appendix 8)
The image processing apparatus according to appendix 6 or 7, further comprising an update unit that updates the tree structure based on a subtree in which the sample has been registered or a newly created subtree, and stores the updated tree structure in the storage unit,
wherein the update unit registers, as a sample in a subtree of the tree structure of the detection target, the input feature vector of a feature point of the latest frame that matched a feature point of the immediately preceding frame, or of a feature point that matched a feature point registered in the storage unit, and, for a feature point that did not match any feature point registered in the storage unit, generates a class with its input feature vector as the representative of the new class and at the same time creates a subtree of the tree structure with that input feature vector as a sample.
(Appendix 9)
The image processing apparatus according to any one of appendices 6 to 8, further comprising:
a camera that outputs the image data; and
an estimation unit that estimates the position of the camera based on the data of the detection target recognized by the recognition unit.
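The two-stage recognition recited in the appendices above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: the Euclidean distance, the cosine-style normalized correlation, the thresholds `max_dist` and `min_sim`, the averaging of the two second-stage correlation values, and the flat `registry` dictionary (one representative vector plus member samples per label) are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
# Hypothetical registry: label -> (representative vector, member sample vectors)
Registry = Dict[str, Tuple[Vector, List[Vector]]]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation(a: Vector, b: Vector) -> float:
    """Normalized correlation (cosine), an assumed stand-in for the correlation value."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def recognize(query: Vector, registry: Registry,
              max_dist: float = 1.0, min_sim: float = 0.8) -> Optional[str]:
    # Stage 1: narrow the candidates by distance to each representative,
    # then keep those whose correlation-based similarity is high enough.
    candidates = [
        label for label, (rep, _) in registry.items()
        if distance(query, rep) <= max_dist and correlation(query, rep) >= min_sim
    ]
    best_label, best_score = None, min_sim
    for label in candidates:
        members = registry[label][1]
        if not members:
            continue
        # Stage 2: extract the nearest and the farthest member of the subtree.
        nearest = min(members, key=lambda m: distance(query, m))
        farthest = max(members, key=lambda m: distance(query, m))
        # Assumed combination rule: average the two correlation values.
        score = 0.5 * (correlation(query, nearest) + correlation(query, farthest))
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


registry: Registry = {
    "landmark_A": ([1.0, 0.0], [[1.0, 0.1], [0.9, -0.1]]),
    "landmark_B": ([0.0, 1.0], [[0.1, 1.0], [-0.1, 0.9]]),
}
```

Calling `recognize([0.95, 0.0], registry)` selects `landmark_A`, while a query far from every representative yields `None`. Checking the farthest member as well as the nearest is one way to require that the input agrees with the whole spread of registered samples, not just a single lucky neighbour.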

The disclosed program and image processing apparatus have been described above by way of embodiments. However, the present invention is not limited to these embodiments, and it goes without saying that various modifications and improvements can be made within the scope of the present invention.

1 Robot
11 Navigation CPU
12 Travel control CPU
13 Carriage
14 Sensor unit
15 Input/output unit
16, 902 Storage unit
141 Camera
801, 903 Communication unit
901 Server
904 CPU

Claims

A program for causing a computer to detect a detection target from image data, the program causing the computer to execute:
a preprocessing procedure of extracting feature points from each frame of input image data and calculating a feature vector based on the extracted feature points; and
a recognition procedure of accessing a storage unit in which a tree structure is registered, the tree structure having the feature vectors of the feature points of detection target images as nodes, being connected by subtrees whose members are, for each category, a representative feature vector representing that category and samples of feature vectors, and being clustered into a class for each detection target, the recognition procedure matching, in a first stage, the feature vector calculated in the preprocessing procedure against the classes in the storage unit, matching, in a second stage, against each member in the matched class, and outputting data of the detection target recognized as matching.

The program according to claim 1, wherein the recognition procedure:
in the first stage, calculates distances between the input feature vector and the nodes of the tree structure, narrows down the nodes of the tree structure by the distances, and calculates a similarity from correlation values between the input feature vector and the nodes of the tree structure; and
in the second stage, extracts the subtree of the corresponding category from the nodes of the tree structure narrowed down in the first stage, calculates the distance between the input feature vector and each node of the extracted subtree, extracts the nodes at the maximum distance and the minimum distance, and comprehensively calculates a similarity from the correlation values between the input feature vector and the extracted nodes.

The program according to claim 1 or 2, further causing the computer to execute an update procedure of updating the tree structure based on a subtree in which the sample has been registered or a newly created subtree, and storing the updated tree structure in the storage unit,
wherein the update procedure registers, as a sample in a subtree of the tree structure of the detection target, the input feature vector of a feature point of the latest frame that matched a feature point of the immediately preceding frame, or of a feature point that matched a feature point registered in the storage unit, and, for a feature point that did not match any feature point registered in the storage unit, generates a class with its input feature vector as the representative of the new class and at the same time creates a subtree of the tree structure with that input feature vector as a sample.

The program according to claim 1, wherein the preprocessing procedure receives the image data from the output of a camera,
the program further causing the computer to execute an estimation procedure of estimating the position of the camera based on the data of the detection target recognized by the recognition procedure.

An image processing apparatus comprising:
a preprocessing unit that extracts feature points from each frame of input image data and calculates an input feature vector based on the extracted feature points;
a storage unit in which a tree structure is registered, the tree structure having the feature vectors of the feature points of detection target images as nodes, being connected by subtrees whose members are, for each category, a representative feature vector representing that category and samples of feature vectors, and being clustered into a class for each detection target; and
a recognition unit that, in a first stage, matches the input feature vector against the classes in the storage unit and, in a second stage, matches it against each member in the matched class, and outputs data of the detection target recognized as matching.
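For illustration, the update procedure recited in the claims above (registering a matched input feature vector as a sample, and creating a new class and subtree for an unmatched one) might be sketched as follows. The names `Subtree` and `update`, the `class_N` labelling scheme, and the rule that the representative is the mean of all samples are assumptions made for this sketch. The claims themselves only require that an unmatched vector become the representative of a new class.

```python
from typing import Dict, List, Optional

Vector = List[float]


class Subtree:
    """Hypothetical subtree: a representative vector plus its sample members."""

    def __init__(self, first_sample: Vector) -> None:
        # A new class starts with the input vector as both representative and sample.
        self.samples: List[Vector] = [list(first_sample)]
        self.representative: Vector = list(first_sample)

    def add_sample(self, vec: Vector) -> None:
        self.samples.append(list(vec))
        n = len(self.samples)
        # Assumed rule: the representative is the mean of all samples.
        self.representative = [sum(column) / n for column in zip(*self.samples)]


def update(tree: Dict[str, Subtree], vec: Vector, matched: Optional[str]) -> str:
    """Register vec under the matched label, or create a new class for it."""
    if matched is not None and matched in tree:
        tree[matched].add_sample(vec)
        return matched
    new_label = f"class_{len(tree)}"  # hypothetical naming scheme
    tree[new_label] = Subtree(vec)
    return new_label


tree: Dict[str, Subtree] = {}
first = update(tree, [1.0, 0.0], matched=None)    # unmatched: new class
update(tree, [0.0, 1.0], matched=first)           # matched: new sample
second = update(tree, [4.0, 4.0], matched=None)   # unmatched: another class
```

The `matched` argument stands in for the result of the recognition procedure (or of frame-to-frame matching). Passing `None` models a feature point that matched nothing in the storage unit.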