JP6188345B2

JP6188345B2 - Information processing apparatus and information processing method

Info

Publication number: JP6188345B2
Application number: JP2013028346A
Authority: JP
Inventors: 青葉 雅人 (Masato Aoba); 奥野 泰弘 (Yasuhiro Okuno); 猿田 貴之 (Takayuki Saruta)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-02-15
Filing date: 2013-02-15
Publication date: 2017-08-30
Anticipated expiration: 2033-02-15
Also published as: JP2014157509A

Description

The present invention relates to a technique for estimating the position and orientation of a target object in an image.

Much research and development has been devoted to image recognition, where the task is to detect a target object in a captured image of that object. Image recognition technology has been applied in many fields and is used for many practical problems, such as face recognition and part recognition in factories.

From the standpoint of image pattern recognition, research has addressed how to classify input information into classes. Various methods have been proposed, including neural networks, SVMs, Randomized Trees, and the Ferns of Non-Patent Document 1. Training images are required to generate a classifier with any of these methods.

Recent industrial vision applications also need to recognize target objects with a high degree of freedom in three-dimensional orientation, for example when detecting parts piled in a heap. Detecting three-dimensional orientation requires training images corresponding to many orientations of the target object. In recognition tasks such as robotic part picking, the orientation of the target object is extremely important.

The orientation associated with a training image is expressed by parameters such as Euler angles or a quaternion, but it is difficult to prepare photographed training images of a target object whose orientation is known. Patent Document 1 proposes a method in which images are captured with a handheld camera while the relative position and orientation of the object and the camera are measured, and the camera movement direction, rotation angle, and so on are displayed for positions and orientations with too few captures, so that the required omnidirectional image data set can be acquired.

Alternatively, CG images of arbitrary orientations are often generated from three-dimensional model data such as CAD data and used as training images. Because the viewpoint can be set freely when rendering a three-dimensional model, it is easier to generate a large number of training images than with photography. In particular, when the position and orientation of the target object are determined from range information such as a depth map, range information obtained from the three-dimensional model can itself be used as training images.

Patent Document 1: JP 2011-198349 A

Non-Patent Document 1: M. Ozuysal, et al., "Fast Keypoint Recognition using Random Ferns", IEEE Trans. on Pattern Analysis and Machine Intelligence, (2010).

When generating CG images from a three-dimensional model, the viewpoint must be set, that is, the position from which the model is viewed. Viewpoints for generating training images are set discretely, and the training image for each orientation is obtained by rendering the target object as seen from each set viewpoint.

When the target object has a high degree of orientation freedom, the orientation of the target object in the input image at detection time almost never exactly matches any of the discretely defined training orientations. The classifier therefore determines which of the orientations used in training is most similar. For this to work, the features used by the classifier should vary smoothly between neighboring orientations.

However, depending on the shape of the target object, the features may change abruptly in the vicinity of a certain orientation. Near such an orientation the range the classifier can cover is extremely narrow, and orientation recognition accuracy drops.

The present invention has been made in view of this problem, and its object is to provide a technique for suppressing the drop in orientation recognition accuracy near a specific orientation class when estimating the orientation of a target object in an input image.

To achieve this object, an information processing apparatus of the present invention is, for example, an information processing apparatus that generates training images to be supplied to a classifier that estimates the orientation of a target object, in order to train that classifier, the apparatus comprising:
setting means for setting at least one viewpoint for a shape model of the target object;
deriving means for deriving a flatness of the shape model from the distribution of the distances from the set viewpoint to the respective regions of the shape model when the shape model is observed from that viewpoint; and
generating means for generating, as a training image, an image of the shape model viewed from the set viewpoint when the flatness derived for that viewpoint is equal to or less than a threshold.

With the configuration of the present invention, the drop in orientation recognition accuracy near a specific orientation class can be suppressed when estimating the orientation of a target object in an input image.

FIG. 1 is a diagram explaining image generation using a geodesic dome.
FIG. 2 is a diagram explaining the process of creating a tree from an input image.
FIG. 3 is a flowchart of the process of recognizing the orientation of a target object.
FIG. 4 is a diagram explaining an abrupt change in features.
FIG. 5 is a diagram explaining the branching state of a tree caused by a viewpoint shift.
FIG. 6 is a diagram showing orientation class scores.
FIG. 7 is a flowchart of the process for training the classifier.
FIG. 8 is a flowchart showing details of the process in step S1300.
FIG. 9 is a diagram explaining a ray frequency map.
FIG. 10 is a diagram explaining neighboring viewpoint positions.
FIG. 11 is a flowchart showing details of the process in step S1300.
FIG. 12 is a diagram explaining the addition of a bias to a training image.
FIG. 13 is a block diagram showing an example hardware configuration of the information processing apparatus.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. The embodiments described below are examples of concrete implementations of the present invention and are specific examples of the configurations recited in the claims.

[First Embodiment]
Before describing this embodiment, the problem addressed by this and the subsequent embodiments is explained. To generate CG images of a three-dimensional model in various orientations, viewpoints from which to observe the model must first be set. Viewpoints are typically set using a geodesic dome.

As shown in FIG. 1, the training images are obtained as the image group A402 by observing the three-dimensional model at various roll angles from each point (each viewpoint A403) on a geodesic dome A401 centered on the object center A404 of the three-dimensional model of the target object. An index is assigned to each combination of a viewpoint and a roll angle at that viewpoint (an orientation class). For example, with 72 viewpoints and the roll angle varied in 30-degree steps at each viewpoint (in-plane rotation), the number of training images generated is 72 × 12 = 864. Accordingly, the dictionary is a classifier trained on 72 × 12 = 864 classes.

In the following, N denotes the number of viewpoints, Nr the number of roll-angle steps (in-plane rotations per viewpoint), n (n = 1…N) the viewpoint index, and r (r = 1…Nr) the in-plane rotation index. The index of the class (orientation class) determined by the combination of viewpoint n and in-plane rotation r is written ν = 1…N × Nr. Because orientation class ν corresponds one-to-one to a combination of viewpoint n and in-plane rotation r, the class index is also written ν[n, r].
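As a small illustration of this indexing, the following sketch maps a viewpoint index n and an in-plane rotation index r to a class index ν and back. The row-major ordering ν = (n − 1)·Nr + r and the function names are assumptions made for illustration; the text only requires that the correspondence be one-to-one.

```python
from typing import Tuple


def class_index(n: int, r: int, num_rolls: int) -> int:
    """Map 1-based (viewpoint n, roll r) to a 1-based class index nu.

    The row-major layout is an assumed convention, not taken from the source.
    """
    return (n - 1) * num_rolls + r


def viewpoint_and_roll(nu: int, num_rolls: int) -> Tuple[int, int]:
    """Inverse of class_index: recover the 1-based pair (n, r) from nu."""
    n, r = divmod(nu - 1, num_rolls)
    return n + 1, r + 1


N = 72                # number of viewpoints in the example above
NR = 360 // 30        # roll angle varied in 30-degree steps
NUM_CLASSES = N * NR  # total number of classes
```

With the example values from the text, `NUM_CLASSES` is 72 × 12 = 864.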

Here, a classification method that recognizes the orientation of a target object from a depth map, using a tree-type classifier that branches on two-point comparisons as typified by Randomized Trees, is described with reference to the flowchart of FIG. 3. The processing of this flowchart is executed by an apparatus such as a PC (personal computer). The apparatus holds a computer program that causes its CPU to execute the processing of the flowchart of FIG. 3, and the processing is carried out by the CPU executing this program.

First, in the input image reading step of step S2100, an image showing the target object is acquired as an input image from an imaging device or storage device (not shown). The input image is assumed to be a so-called range image (depth map), in which the value of each pixel is the distance (range information) from the imaging device that captured the image to the object appearing at that pixel. Range information can be acquired with, for example, a stereo camera or a device based on spatial encoding, but the method is not limited to these, and, as described later, the input image is not limited to a depth map.

Next, in the position and orientation estimation step of step S2200, a window (search position A200) is moved over the input image A500 illustrated in FIG. 2 in raster-scan order, and the image within the window is used to create the trees described below.

Classification is performed using the plurality of trees m (m = 1…M) indicated by A300, where M is the number of trees. The basic operation of each tree is as follows. At each node of a tree, the input image is branched according to which of the depth values at two different points on the image is larger. Let the positions (coordinates relative to the search position A200) of the two points A110 and A120 compared at the i-th node A312 of the m-th tree A310 shown in FIG. 2 be XmiB = (xmiB, ymiB) and XmiL = (xmiL, ymiL). With the depth values at these two points denoted D(XmiB) and D(XmiL), processing proceeds to the lower-right child node if Equation (1) below is satisfied, and to the lower-left child node if Equation (2) below is satisfied.
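The branching rule can be sketched as follows. Conditions (1) and (2) are not reproduced in this text, so the direction of the comparison is an assumption: here the right child is taken when D(XmiB) > D(XmiL) and the left child otherwise. Sending the equal-depth case to the branch of condition (2) follows the later remark that equal depth values always satisfy (2). The `Node` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    # Offsets (dx, dy) of the two compared points relative to the search position.
    offset_b: Tuple[int, int] = (0, 0)
    offset_l: Tuple[int, int] = (0, 0)
    right: Optional["Node"] = None
    left: Optional["Node"] = None
    class_index: Optional[int] = None  # set only on terminal nodes


def classify(depth: Sequence[Sequence[float]], x: int, y: int, root: Node) -> int:
    """Descend one tree from its root and return the class index at the terminal node.

    depth is indexed as depth[row][column]; (x, y) is the search position.
    """
    node = root
    while node.class_index is None:
        d_b = depth[y + node.offset_b[1]][x + node.offset_b[0]]
        d_l = depth[y + node.offset_l[1]][x + node.offset_l[0]]
        # Assumed form of conditions (1)/(2): strict ">" goes right, otherwise left.
        node = node.right if d_b > d_l else node.left
    return node.class_index
```

Under this assumed convention, a region of constant depth sends every comparison to the left child regardless of which two points are chosen.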

For one input image, processing starts at the root node A311, and branching in a tree ends when a terminal node A313 is reached. Each terminal node of each tree m is assigned an orientation class index ν (ν = 1…N × Nr), and the number of votes the trees cast for each orientation class is taken as that class's score.

When an input image I is given and the orientation class index that tree m assigns to it is IDXm(I), the score SCORE(ν, I) for orientation class ν is defined by Equation (3) below.

Here, δ(A) is a function that returns 1 when condition A is true and 0 when it is false. After detection has been performed for all search positions A200, peak detection, thresholding, and similar processing are applied to the scores to determine the estimated position and orientation of the target object.
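Equation (3) is not reproduced in this text. From the definitions of IDXm(I) and δ, a consistent reading is that the score counts the trees that vote for class ν, that is, SCORE(ν, I) = Σ_{m=1…M} δ(IDXm(I) = ν). The sketch below computes this count under that reading; the function name and the example votes are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable


def class_scores(tree_votes: Iterable[int]) -> Dict[int, int]:
    """Return SCORE(nu, I) for every class nu that received at least one vote.

    tree_votes holds IDX_m(I) for m = 1..M, i.e. the class index output by each tree.
    Classes with no votes are omitted and implicitly have score 0.
    """
    return dict(Counter(tree_votes))


# Five hypothetical trees vote for classes 3, 7, 3, 3 and 12.
scores = class_scores([3, 7, 3, 3, 12])
best_class = max(scores, key=scores.get)
```

Taking the maximum is only the simplest decision rule; the text also mentions peak detection and thresholding over all search positions.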

Next, the method of training this tree-based classifier is briefly described. First, training images corresponding to all the orientation indices to be given as classes are prepared. At each node, the positions of the two points to compare are chosen at random, and the same branching as at detection time is performed. If, after branching, only training images of a single orientation index remain at the next node, that node becomes a terminal node and is assigned that orientation index. Training of the tree ends once a terminal node has been determined for every training image.

Depth training images generated from viewpoints distributed at equal intervals by a geodesic dome may exhibit abrupt feature changes depending on the shape of the target object. For example, as in FIG. 4, consider the case where the normal A420 of a large-area face A410 of the three-dimensional model is parallel to the line-of-sight direction A440 connecting the viewpoint A430 (viewpoint A) and the origin of the object coordinate system. In the image A450 obtained from viewpoint A430, every pixel in the region A411 corresponding to face A410 has the same depth value. In this image, any pair of points taken from region A411 always satisfies the condition of Equation (2). Within the large orientation space, this state is a singular point that occurs only when the line-of-sight direction A440 is exactly orthogonal to the large-area face A410.

As with viewpoint A431 (viewpoint A") in FIG. 5, even a slight shift from viewpoint A introduces a gradient into the depth values of the region A412 corresponding to face A410 in the resulting image A451, and the branching behavior in the trees changes greatly.

FIG. 6(a) plots the scores of the orientation classes corresponding to each viewpoint as the viewpoint position moves along viewpoint A430 and the viewpoints A432 (viewpoint Z), A433 (viewpoint B), and A434 (viewpoint C), which are arranged at equal intervals with viewpoint A430. To simplify the explanation, in-plane rotation is not considered here.

B132, B133, and B134 are the score curves of the orientation classes corresponding to viewpoints A432, A433, and A434, respectively. Because no singular point of the kind described above is crossed near these viewpoints, their score curves vary smoothly. In contrast, the score curve of the orientation class corresponding to viewpoint A430 is sharply peaked, as shown by B130. As a result, orientation recognition accuracy drops in regions such as B110.

When the target object is an industrial product, it is often composed of planar faces, and the object coordinate system in CAD data is often aligned with some reference plane, so the situation described above arises easily.

Next, an embodiment of a training method for solving the problem described above is explained. First, an example hardware configuration of the information processing apparatus according to this embodiment is described with reference to the block diagram of FIG. 13.

The CPU 1351 executes processing using computer programs and data stored in the RAM 1352 and the ROM 1353, thereby controlling the operation of the entire information processing apparatus and executing each of the processes described below as being performed by the information processing apparatus.

The RAM 1352 has an area for temporarily storing computer programs and data loaded from the external storage device 1356 and computer programs and data received from external devices via the I/F (interface) 1357. The RAM 1352 also has a work area used by the CPU 1351 when executing various processes. In other words, the RAM 1352 can provide various areas as needed.

The ROM 1353 stores setting data for the information processing apparatus, a boot program, and the like. The operation unit 1354 consists of a keyboard, a mouse, and the like, and allows the operator of the information processing apparatus to input various instructions to the CPU 1351. The display unit 1355 consists of a CRT, a liquid crystal screen, or the like, and can display the results of processing by the CPU 1351 as images, text, and so on.

The external storage device 1356 is a large-capacity information storage device, typified by a hard disk drive. It stores an OS (operating system) and the computer programs and data that cause the CPU 1351 to execute each of the processes described below as being performed by the information processing apparatus. It also stores the information treated as known in the following description. Computer programs and data stored in the external storage device 1356 are loaded into the RAM 1352 as needed under the control of the CPU 1351 and are then processed by the CPU 1351.

External devices can be connected to the I/F 1357, and the information processing apparatus can exchange data with them via the I/F 1357. For example, the input images and other information processed in the following description may be acquired from an external device via the I/F 1357, and the results of processing by the information processing apparatus may be transmitted to an external device via the I/F 1357.

The units described above are connected to a bus 1358. The configuration shown in FIG. 13 is only an example and may be changed or modified as appropriate, provided that processing equivalent to the processes described below can be realized. In the following description, the information processing apparatus according to this embodiment is assumed to be a computer such as a PC, but it may be any other device capable of performing the following processing on an input image, for example a digital camera.

Next, the series of processes for training a classifier that estimates the orientation of a target object is described with reference to FIG. 7(a), which shows a flowchart of these processes. For the purposes of explanation, a classifier that recognizes the orientation of a target object from a depth map is used as an example.

The computer programs and data that cause the CPU 1351 to execute the processing of the flowchart of FIG. 7(a) are stored in the external storage device 1356 and are loaded into the RAM 1352 as needed under the control of the CPU 1351. The information processing apparatus carries out the processing of the flowchart of FIG. 7(a) by having the CPU 1351 execute processing using the loaded programs and data.

First, in the model reading step of step S1100, the CPU 1351 loads the shape model of the target object whose orientation is to be estimated from the external storage device 1356 into the RAM 1352. The shape model is a three-dimensional model, such as a CAD model, that represents the shape of the target object. It contains the information needed to obtain depth maps as training images, including surface information of the target object.

Next, in the viewpoint setting step of step S1200, the CPU 1351 sets the viewpoints from which the shape model is observed. Viewpoints are typically set using a geodesic dome, as in FIG. 1. Setting viewpoints with a geodesic dome has the advantage that the spacing between adjacent viewpoints is uniform, which makes the viewpoints easy to handle as orientation classes. However, the viewpoint setting method is not limited to a geodesic dome, and any method may be used as long as viewpoints can be set at a plurality of positions surrounding the shape model.

Let N be the total number of viewpoints set, n = 1…N the viewpoint index, and Vn = (Xvn, Yvn, Zvn) the position vector of viewpoint n in the object coordinate system (a coordinate system whose origin is a point in the shape model and whose x, y, and z axes are three mutually orthogonal axes through that origin). The origin of the geodesic dome is assumed to coincide with the object center A404 of the shape model. Vn is a unit vector that defines only the direction of the viewpoint, so the norm of Vn is 1 for every viewpoint n.
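A geodesic dome is commonly built by subdividing an icosahedron. As a minimal sketch of viewpoint directions Vn with unit norm, the following code returns the 12 icosahedron vertices normalized to length 1. This is only the coarsest case and is an illustrative assumption; the text does not state how its dome is constructed, and finer domes would need further subdivision, which is omitted here. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def icosahedron_viewpoints() -> List[Vec3]:
    """Return the 12 icosahedron vertices as unit vectors about the object center."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    raw: List[Vec3] = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            # Cyclic permutations of (0, +-1, +-phi).
            raw += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    viewpoints: List[Vec3] = []
    for x, y, z in raw:
        norm = math.sqrt(x * x + y * y + z * z)
        viewpoints.append((x / norm, y / norm, z / norm))  # ||Vn|| = 1
    return viewpoints
```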

The update step of step S1300 is described with reference to the flowchart of FIG. 8(a), which shows its details. In the flatness calculation step of step S1311, the CPU 1351 generates a depth map of the shape model for the orientation class ν given by viewpoint n and in-plane rotation r. The generated depth map is called the depth image Iν. That is, the image of the shape model observed from the viewpoint with index n at the roll angle with index r (the in-plane rotation with index r) is generated as the depth image Iν. Alternatively, the depth image for each orientation class may be created in advance and stored in the external storage device 1356, and the depth image corresponding to a given combination of viewpoint index and in-plane rotation index may be acquired from there.

Here, let Tν be the region of the depth image Iν in which the shape model appears, and Sν the area (number of pixels) of region Tν. For each orientation class, a histogram of the depth values in the corresponding depth image is created. The histogram is the frequency of each depth value in the depth image for that orientation class, normalized by Sν for that depth image. The range of the histogram extends from the minimum to the maximum depth value over all depth images Iν (ν = 1…N × Nr). The bin width is set according to the depth resolution of the range measuring device used when performing detection on depth images. For example, if the depth resolution of the range measuring device is 100 nm, the bin width is likewise set to 100 nm. The number of bins K follows directly from the bin width and the histogram range. With Hν(k) denoting the frequency value of the k-th bin (k = 1…K) in the histogram generated from the depth image Iν for orientation class ν, the flatness Pn of viewpoint n, defined as follows, is calculated (derived).

Here, Νn is the set of classes obtained from the in-plane rotation variations of viewpoint n, and has Nr elements. That is, the flatness Pn is the maximum, over bins, of the frequency values summed bin by bin across the histograms created from the images of the shape model viewed from viewpoint n at the different roll angles, and serves as the evaluation value of viewpoint n.
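Equation (4) is not reproduced in this text. The sketch below follows the verbal definition: each depth image yields a histogram normalized by the object area Sν, the histograms of one viewpoint are summed bin by bin over its roll angles, and the largest bin total is taken as Pn. Representing background pixels as `None`, the function names, and clamping the upper range edge into the last bin are assumptions. Whether the sum is further divided by Nr is not stated here, so no such division is applied.

```python
import math
from typing import List, Optional, Sequence

DepthImage = Sequence[Sequence[Optional[float]]]  # None marks background pixels


def normalized_histogram(image: DepthImage, d_min: float,
                         bin_width: float, num_bins: int) -> List[float]:
    """H_nu(k): per-bin count of object depth values divided by the object area S_nu."""
    counts = [0] * num_bins
    area = 0
    for row in image:
        for d in row:
            if d is None:
                continue
            k = min(int((d - d_min) / bin_width), num_bins - 1)
            counts[k] += 1
            area += 1
    if area == 0:  # object not visible: return an empty histogram
        return [0.0] * num_bins
    return [c / area for c in counts]


def flatness(images_for_viewpoint: Sequence[DepthImage], d_min: float,
             d_max: float, bin_width: float) -> float:
    """P_n: largest bin total after summing the histograms of all roll angles."""
    num_bins = max(1, math.ceil((d_max - d_min) / bin_width))
    hists = [normalized_histogram(img, d_min, bin_width, num_bins)
             for img in images_for_viewpoint]
    return max(sum(h[k] for h in hists) for k in range(num_bins))
```

For a single roll angle, an image whose object pixels all share one depth value gives a flatness of 1, while an image with a depth gradient spreads its mass over several bins and gives a smaller value.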

A large flatness Pn means that a large proportion of the depth image at viewpoint n has the same depth value, that is, a plane occupying a large area of the shape model is orthogonal to the line-of-sight direction.

Next, if the flatness Pn of every viewpoint is equal to or less than the threshold εn, processing proceeds to step S1400. If the flatness Pn of even one viewpoint exceeds the threshold εn, processing proceeds to step S1312.

Here, the threshold εn determines to what extent pixels with the same depth value may exist in region Tn, and is set so that 0 < εn < 1. The threshold may be set to a single value for all viewpoints, such as 0.3. Alternatively, it may be set to a value that depends on the size of the shape-model region in the depth image, such as εn = 1/√Sn.
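The threshold test can be sketched as follows, assuming the flatness values have already been computed for every viewpoint. The function names are hypothetical; the size-dependent rule is the 1/√Sn example given in the text.

```python
import math
from typing import Sequence


def size_dependent_threshold(area_pixels: int) -> float:
    """Example threshold from the text: epsilon_n = 1 / sqrt(S_n)."""
    return 1.0 / math.sqrt(area_pixels)


def all_viewpoints_accepted(flatness_values: Sequence[float],
                            thresholds: Sequence[float]) -> bool:
    """True if P_n <= epsilon_n for every viewpoint, so the viewpoints can be kept.

    False means at least one viewpoint is too flat and the viewpoints must be moved.
    """
    return all(p <= eps for p, eps in zip(flatness_values, thresholds))
```

For example, an object region of 100 pixels gives a threshold of 0.1 under the size-dependent rule, which is stricter than the uniform example value of 0.3.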

次に、ステップＳ１３１２における視点移動工程では、ＣＰＵ１３５１は、ステップＳ１２００で設定した視点とは異なり且つ形状モデルを囲む視点群を新たに設定する。ここで、新たに設定する視点群とは、対象物体座標系における原点を中心にして、ステップＳ１２００で設定した全視点を同一方向に回転させた視点群を意味する。回転変換をＲとすると、対象物体座標系における視点ｎの位置Ｖｎは以下のようにして更新される。 Next, in the viewpoint moving step in step S1312, the CPU 1351 newly sets a viewpoint group that is different from the viewpoint set in step S1200 and surrounds the shape model. Here, the newly set viewpoint group means a viewpoint group obtained by rotating all viewpoints set in step S1200 in the same direction with the origin in the target object coordinate system as the center. If the rotation transformation is R, the position Vn of the viewpoint n in the target object coordinate system is updated as follows.

Here, T denotes the matrix transpose. The rotational transformation R can be expressed as follows when written, for example, in x-y-z Euler angles.

The rotational transformation R used to move the viewpoints may be given by random numbers. For example, the values of θx, θy, and θz in equation (6) are each selected randomly between 0 and 180 deg. Alternatively, they may be given as values proportional to the minimum angle θd between viewpoints on the geodesic dome, for example θx = θy = θz = θd/10.
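As an illustration of rotating all viewpoints by one common transformation, the following sketch builds a rotation matrix from three Euler angles and applies it to every viewpoint. The function names are hypothetical, and the composition order (rotation about x, then y, then z, that is, Rz·Ry·Rx) is an assumption of this sketch, since the matrix of equation (6) is not reproduced in this text.

```python
import math


def rot_xyz(tx, ty, tz):
    """Rotation matrix for angles in radians about x, then y, then z (assumed order Rz*Ry*Rx)."""
    cx, sx = math.cos(tx), math.sin(tx)
    cy, sy = math.cos(ty), math.sin(ty)
    cz, sz = math.cos(tz), math.sin(tz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rotate(R, v):
    """Apply the 3x3 matrix R to the 3-vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def move_viewpoints(viewpoints, tx, ty, tz):
    """Rotate every viewpoint about the origin of the object coordinate system by the same R."""
    R = rot_xyz(tx, ty, tz)
    return [rotate(R, v) for v in viewpoints]
```

For a random move, the three angles could be drawn with `random.uniform(0, math.pi)`; for a small fixed move, each could be set to a fraction of the minimum angle between viewpoints.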

そして、ステップＳ１３１２で視点を再設定した後は、処理はステップＳ１３１１に戻り、ステップＳ１３１１では、この再設定した視点を用いて再度、それぞれの視点に対する平面度Ｐｎを求める。 Then, after resetting the viewpoint in step S1312, the process returns to step S1311. In step S1311, the flatness Pn for each viewpoint is obtained again using the reset viewpoint.
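The loop between the flatness calculation and the viewpoint move can be sketched as below. The callables `flatness_of` and `move` are hypothetical placeholders for the flatness calculation and the rotation of the whole viewpoint group, and the iteration limit `max_iters` is a safeguard added for this sketch that the description itself does not mention.

```python
def settle_viewpoints(viewpoints, flatness_of, move, eps, max_iters=100):
    """Move the whole viewpoint group until every viewpoint has flatness <= eps."""
    for _ in range(max_iters):
        if all(flatness_of(v) <= eps for v in viewpoints):
            return viewpoints  # accepted: proceed to learning image output
        viewpoints = move(viewpoints)  # at least one viewpoint exceeds eps
    raise RuntimeError("no acceptable viewpoint group found within max_iters")
```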

ステップＳ１４００における学習画像出力工程では、ＣＰＵ１３５１は、上記の処理によって確定した、各姿勢クラスに対応するデプス画像を学習画像として外部記憶装置１３５６やＲＡＭ１３５２などのメモリに出力する。出力先については特定の出力先に限るものではない。 In the learning image output step in step S1400, the CPU 1351 outputs the depth image determined by the above processing and corresponding to each posture class as a learning image to a memory such as the external storage device 1356 or the RAM 1352. The output destination is not limited to a specific output destination.

ステップＳ１５００における識別器生成工程では、ＣＰＵ１３５１は、この出力された姿勢クラスごとのデプス画像を学習用画像として用いて、対象物体の姿勢を識別する識別器の学習処理を行う。識別器の学習に関する手続きについては前述したため、ここでの説明は省略する。 In the classifier generation step in step S1500, the CPU 1351 performs a learning process for a classifier that identifies the posture of the target object, using the output depth image for each posture class as a learning image. Since the procedure related to learning of the discriminator has been described above, description thereof is omitted here.

In this and the following embodiments, a tree without pruning is used as the example, but the type of classifier is not limited to this. Pruning may be performed during tree learning, or FERN, described in the aforementioned Non-Patent Document 1, may be used. With these classifiers, a class is not uniquely assigned to each terminal node; instead, the proportion of learning images remaining at the terminal node during learning is obtained as an estimated score for each class. If Pm(ν|I) denotes the score for class ν obtained at the terminal node of tree m when an input image I is given, the score of the entire classifier is given by the following equation (7).

もしくは、下記のように与える場合もある。 Or it may be given as follows.

また、本実施形態を含む以降の各実施形態ではデプスマップによる学習画像を例にして具体的に説明されているが、デプスマップに限定されるものではない。例えば、ＣＧレンダリングによる輝度画像を用いてもよい。 Further, in each of the subsequent embodiments including this embodiment, the learning image by the depth map is specifically described as an example, but the present invention is not limited to the depth map. For example, a luminance image by CG rendering may be used.

このようにして対象物体を構成する平面が視点方向に直交しないように視点を更新することで、特定の姿勢クラス近傍における識別力の低下を抑制することができる。具体的には、以下のように説明される。 In this way, by updating the viewpoint so that the plane constituting the target object is not orthogonal to the viewpoint direction, it is possible to suppress a decrease in discrimination power near a specific posture class. Specifically, it will be described as follows.

By the method described above, the viewpoints A, B, C, and Z in FIG. 6 move to the viewpoints A', B', C', and Z', respectively. As a result, the score curves of the posture classes of the viewpoints A', B', C', and Z' become B140, B143, B144, and B142 in FIG. 6(b), respectively, and the drop in posture recognition accuracy seen in the region B110 near viewpoint A in FIG. 6(a) can be suppressed.

Note that the configuration described in this embodiment is, as stated above, merely one example of the following configuration. That is, the information processing apparatus according to this embodiment generates learning images to be given to a classifier that estimates the posture of a target object, in order to train that classifier. This information processing apparatus sets at least one viewpoint for the shape model of the target object. Then, when the shape model is observed from the set viewpoint and the set viewpoint and the shape model satisfy a predetermined condition, an image of the shape model viewed from the set viewpoint is generated as a learning image.

[Second Embodiment]
In the first embodiment, the viewpoint update ends when the flatness of every viewpoint is equal to or less than the threshold; however, the most suitable of several viewpoint settings may be selected instead. The processing in that case will be described with reference to FIGS. 7(a) and 8(b). From the second embodiment onward, only the differences from the first embodiment are described, and descriptions of points in common with the first embodiment are omitted.

The processes in steps S1100 and S1200 are the same as in the first embodiment. In this embodiment, however, the set of N viewpoints set in step S1200 is additionally called a viewpoint set; t is initialized to 0, and the viewpoint set established here becomes the initial viewpoint set U_0.

In step S1300, the process shown in FIG. 8(b) is executed. In the flatness calculation step of step S1321, the flatness Pn of each viewpoint n included in the viewpoint set U_t is calculated in the same manner as in step S1311 above. This calculation follows equation (9) below.

Here, Νn is the set of classes obtained by the in-plane rotation variations for viewpoint n. Then, as shown in equation (10) below, the largest of the flatness values Pn of the viewpoints is taken as the maximum flatness P_{t,max}.

次に、所定数Ｔの視点集合が生成された場合には処理はステップＳ１３２３における視点選択工程に進み、まだ所定数Ｔの視点集合が生成されていない場合には、処理はステップＳ１３２２における視点移動工程に進む。所定数Ｔの値は、例えば１０などの値で設定すればよい。Ｔは大きいほど良いが、ステップＳ１３２１にかかる処理時間とのトレードオフで設定する。 Next, if a predetermined number T of viewpoint sets are generated, the process proceeds to a viewpoint selection step in step S1323. If a predetermined number T of viewpoint sets have not yet been generated, the process proceeds to step S1322. Proceed to the process. The predetermined number T may be set to a value such as 10, for example. A larger T is better, but is set in a trade-off with the processing time required for step S1321.

In step S1322, a new viewpoint set U_{t+1} is generated by applying a predetermined rotational transformation to the viewpoint set U_t. As in the first embodiment, this rotational transformation may be given randomly, or as a value proportional to the minimum angle θd between viewpoints on the geodesic dome. The process then returns to step S1321, and the maximum flatness is obtained for this new viewpoint set.

When the process reaches step S1323, a predetermined number T of viewpoint sets have been generated, so the viewpoint set U_c that yielded the smallest maximum flatness among these T viewpoint sets is identified (equation (11)).

From step S1400 onward, the depth images corresponding to the posture classes based on the viewpoints in the identified viewpoint set U_c are used as learning images. The subsequent processing is the same as in the first embodiment, and its description is omitted.
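The selection among a fixed number of viewpoint sets can be sketched as follows: each set is scored by its maximum flatness, and the set with the smallest score is kept. The callables `flatness_of` and `move` are hypothetical placeholders for the flatness calculation and the rotation of a viewpoint set; only the best set so far is retained, which is a design choice of this sketch.

```python
def select_viewpoint_set(initial_set, flatness_of, move, num_sets=10):
    """Generate num_sets viewpoint sets and return the one with the smallest maximum flatness."""
    best_set, best_p_max = None, None
    current = initial_set
    for t in range(num_sets):
        p_max = max(flatness_of(v) for v in current)  # maximum flatness of this set
        if best_p_max is None or p_max < best_p_max:
            best_set, best_p_max = current, p_max
        if t < num_sets - 1:
            current = move(current)  # next viewpoint set
    return best_set, best_p_max
```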

[Third Embodiment]
In the first and second embodiments, viewpoints are selected on the basis of flatness, but viewpoints may instead be selected so that the proportion of surfaces orthogonal to the line of sight is reduced. The processing in that case will be described with reference to FIGS. 7(a) and 8(c).

ステップＳ１１００及びステップＳ１２００における処理は第１の実施形態と同様である。ステップＳ１３００では、図８（ｃ）に示した処理を実行する。ステップＳ１３３１における法線頻度マップ生成工程では、形状モデルから、対象物体上の平面がどの割合でどちらを向いているのかを示す法線頻度マップを生成する。具体的には以下のようにして法線頻度マップを生成する。 The processes in step S1100 and step S1200 are the same as in the first embodiment. In step S1300, the process shown in FIG. 8C is executed. In the normal frequency map generation step in step S1331, a normal frequency map is generated from the shape model to indicate which direction the plane on the target object is facing. Specifically, the normal frequency map is generated as follows.

First, for the Np polygons constituting the shape model of the target object, the area Sp of each polygon p (p = 1...Np) and the unit normal vector np (|np| = 1) of polygon p in the target object coordinate system are obtained. The area normalized by the total area Sall of all polygons is Rp = Sp/Sall; this is the ratio of the area of polygon p to the surface area of the whole shape model. The normal frequency F(Ψ) for a point n(Ψ) on the unit sphere, obtained by applying an arbitrary rotational transformation Ψ to the reference axis n0 = (1, 0, 0) on the unit sphere, is defined as follows.

Here, δ(A) is a function that returns 1 when condition A is true and 0 when it is false. Since there are at most Np distinct rotational transformations Ψ, computing F(Ψ) for the rotational transformations that take n0 to each np (p = 1...Np) yields a discrete normal frequency map as the set of F(Ψ) values. FIG. 9 shows a specific example.

基準軸ｎ０は、Ｃ１００で表わされている。基準軸Ｃ１００に対してＣ３００で表わされる回転変換ΨをかけたベクトルがＣ２００である。このベクトルＣ２００と同じ向きの正規化法線ベクトルＣ２１０、Ｃ２２０を持つポリゴンＣ４１０、Ｃ４２０に対する、対象物体全体の表面積における各ポリゴンの面積比の総和が、上記の式（１２）によって算出され、Ｃ５１０で示される頻度Ｆ（Ψ）となる。 The reference axis n0 is represented by C100. A vector obtained by multiplying the reference axis C100 by the rotational transformation Ψ represented by C300 is C200. The sum of the area ratios of the polygons in the surface area of the entire target object with respect to the polygons C410 and C420 having normalized normal vectors C210 and C220 in the same direction as the vector C200 is calculated by the above equation (12). The frequency F (Ψ) shown is obtained.
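A discrete normal frequency map of this kind can be sketched as follows. The sketch assumes that the shape model is given as non-degenerate triangles with counter-clockwise vertex order, indexes the map directly by the unit normal direction instead of by a rotation of the reference axis, and merges normals that agree after rounding to a fixed number of decimals; these choices and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def triangle_area_normal(a, b, c):
    """Area and unit normal of the triangle (a, b, c)."""
    u = [b[i] - a[i] for i in range(3)]
    w = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0])
    length = math.sqrt(sum(x * x for x in cross))
    return 0.5 * length, tuple(x / length for x in cross)


def normal_frequency_map(triangles, decimals=6):
    """Map each normal direction to the share of the total surface area facing that direction."""
    total = 0.0
    area_by_normal = defaultdict(float)
    for a, b, c in triangles:
        area, normal = triangle_area_normal(a, b, c)
        key = tuple(round(x, decimals) for x in normal)  # merge numerically equal normals
        area_by_normal[key] += area
        total += area
    return {n: s / total for n, s in area_by_normal.items()}
```

The values of the returned map sum to 1, so each entry can be read as the area ratio of all polygons sharing that normal.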

この得られた頻度Ｆ（Ψ）から、以下のような評価値Ｅｎを算出する。 The following evaluation value En is calculated from the obtained frequency F (Ψ).

ここで、ａｎｇ（ａ、ｂ）は、ベクトルａとベクトルｂのなす角を表す。Ｖｎおよびｎ（Ψ）はいずれも単位ベクトルであるため、以下の式を満たす。 Here, ang (a, b) represents an angle formed by the vector a and the vector b. Since Vn and n (Ψ) are both unit vectors, the following expression is satisfied.

In the viewpoint optimization step of step S1332, the evaluation value En obtained as described above is used in the same way as the flatness Pn in the first and second embodiments to move or select viewpoints, thereby determining the viewpoint positions. The subsequent processing is the same as in the first and second embodiments, and its description is omitted.

[Fourth Embodiment]
When the proportion of surfaces orthogonal to the line of sight is used as the evaluation value, as in the third embodiment, the viewpoints may be updated iteratively so that this proportion decreases. The processing in that case will be described with reference to FIGS. 7(a) and 8(c).

ステップＳ１１００及びステップＳ１２００における処理は第１の実施形態と同様である。ステップＳ１３００では、図８（ｃ）に示した処理を実行する。ステップＳ１３３１における法線頻度マップ生成工程では、第３の実施形態と同様にして、法線頻度Ｆ（Ψ）を求める。 The processes in step S1100 and step S1200 are the same as in the first embodiment. In step S1300, the process shown in FIG. 8C is executed. In the normal frequency map generation step in step S1331, the normal frequency F (Ψ) is obtained in the same manner as in the third embodiment.

次に、ステップＳ１３３２における視点最適化工程では、得られた法線頻度マップを用いて視点を更新していく。先ず、ステップＳ１２００で設定された視点位置Ｖｎ＝（Ｘｖｎ，Ｙｖｎ，Ｚｖｎ）（ｎ＝１…Ｎ）に対して、回転変換Ｒ（θ）を行ったときのコストＥｎ（θ）を、以下のように定義する。 Next, in the viewpoint optimization step in step S1332, the viewpoint is updated using the obtained normal frequency map. First, the cost En (θ) when the rotation transformation R (θ) is performed on the viewpoint position Vn = (Xvn, Yvn, Zvn) (n = 1... N) set in step S1200 is expressed as follows. Define as follows.

ここでａｎｇ（ａ、ｂ）は、ベクトルａとベクトルｂのなす角を表す。Ｖｎおよびｎ（Ψ）はいずれも単位ベクトルであるため、以下の式（１６）を満たす。 Here, ang (a, b) represents an angle formed by the vector a and the vector b. Since Vn and n (Ψ) are both unit vectors, the following equation (16) is satisfied.

ηはカーネルの幅を決める正の定数で、η＝２などと設定する。このコストＥｎが大きい状態とは、視点ｎから形状モデル中心を結ぶ視線方向に対して、形状モデルを構成する平面のうち、面積の大きなものが直交に近い状態にあることを意味する。すなわち、回転変換Ｒ（θ）を、コストＥｎが小さくなるように設定できれば、視線方向に対して直交する面が減るように視点を更新することができる。 η is a positive constant that determines the width of the kernel, and is set to η = 2 or the like. The state where the cost En is large means that a plane having a large area is nearly orthogonal to a line-of-sight direction connecting the shape model center from the viewpoint n. That is, if the rotational transformation R (θ) can be set so as to reduce the cost En, the viewpoint can be updated so that the number of planes orthogonal to the line-of-sight direction is reduced.

ステップＳ１２００で設定された視点に対して回転変換Ｒ（θ）を行った場合のエネルギー関数Ｅを、以下のように定義する。 The energy function E when the rotation transformation R (θ) is performed on the viewpoint set in step S1200 is defined as follows.

これを勾配法で解くと、θの更新式は以下のようになる。 If this is solved by the gradient method, the update formula of θ is as follows.

ここでζは０＜ζ＜１の範囲で設定される更新係数で、ζ＝０．１などと設定する。回転変換Ｒ（θ）は、初期設定では回転なし（Ｒ＝単位行列）に設定することが常套であるが、その初期値に限るものではない。回転変換Ｒ（θ）の初期値からスタートして、式（１８）を逐次実行して回転変換Ｒ（θ）を更新していく。回転変換Ｒ（θ）の変化が所定の閾値より小さくなった場合に、収束したものと見なし、逐次計算を終了する。そして式（１９）に示す如く、収束時の回転変換Ｒ（θ）をステップＳ１２００で設定されたすべての視点Ｖｎ（ｎ＝１…Ｎ）に対して行い、視点Ｖｎを更新する。 Here, ζ is an update coefficient set in a range of 0 <ζ <1, and ζ = 0.1 is set. The rotation conversion R (θ) is usually set to no rotation (R = unit matrix) by default, but is not limited to the initial value. Starting from the initial value of the rotation conversion R (θ), the rotation conversion R (θ) is updated by sequentially executing Expression (18). When the change of the rotational transformation R (θ) becomes smaller than a predetermined threshold, it is considered that the rotation has converged, and the sequential calculation is terminated. Then, as shown in Expression (19), the rotation conversion R (θ) at the time of convergence is performed for all the viewpoints Vn (n = 1... N) set in step S1200, and the viewpoint Vn is updated.
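The iterative update can be illustrated by a generic gradient-descent loop. This sketch does not reproduce the energy function of the specification: the energy is passed in as a hypothetical callable, its gradient is approximated by central differences, and convergence is judged by the size of the change in the angle vector θ rather than in the matrix R(θ). All of these are assumptions of the sketch.

```python
def minimize_energy(energy, theta0, zeta=0.1, tol=1e-8, h=1e-6, max_iters=10000):
    """Gradient descent on energy(theta) with update coefficient zeta."""
    theta = list(theta0)
    for _ in range(max_iters):
        grad = []
        for i in range(len(theta)):
            up = theta[:]
            up[i] += h
            down = theta[:]
            down[i] -= h
            grad.append((energy(up) - energy(down)) / (2 * h))  # central difference
        step = [zeta * g for g in grad]
        theta = [t - s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:  # change small enough: treat as converged
            break
    return theta
```

Starting from θ = (0, 0, 0) corresponds to starting from no rotation; the converged angles would then be turned into a rotation and applied to all viewpoints.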

視点をすべて更新したら、処理はステップＳ１４００に進む。ステップＳ１４００では、上記の処理によって確定した各姿勢クラスに対応するデプス画像を学習画像として外部記憶装置１３５６やＲＡＭ１３５２などのメモリに出力する。以降の処理については第１の実施形態で説明したとおりであるため、説明は省略する。 When all the viewpoints are updated, the process proceeds to step S1400. In step S1400, the depth image corresponding to each posture class determined by the above processing is output as a learning image to a memory such as the external storage device 1356 or the RAM 1352. Since the subsequent processing is as described in the first embodiment, description thereof is omitted.

[Fifth Embodiment]
The condition for updating the viewpoints may be whether the internal state of the classifier at a set viewpoint changes drastically at nearby viewpoints. The processing in that case will be described with reference to FIGS. 7(b) and 8(d).

ステップＳ１１００及びステップＳ１２００における処理は第１の実施形態と同様である。ステップＳ１７００における学習画像生成工程では、各姿勢クラスに対応するデプス画像を学習画像として取得する。なお、各姿勢クラスに対応するデプス画像の取得方法については第１の実施形態で説明したとおりである。ステップＳ１５００における処理は第１の実施形態で説明した通りである。 The processes in step S1100 and step S1200 are the same as in the first embodiment. In the learning image generation step in step S1700, a depth image corresponding to each posture class is acquired as a learning image. The depth image acquisition method corresponding to each posture class is as described in the first embodiment. The processing in step S1500 is as described in the first embodiment.

次に、ステップＳ１３００で行う処理について、同処理の詳細を示す図８（ｄ）のフローチャートを用いて説明する。ステップＳ１３４１における視点変動画像生成工程では、現在設定されている視点から微小移動させた近傍視点を設定し、その近傍視点から見たときの形状モデルの画像を生成する。ここで近傍視点とは、設定されている視点Ｖｎ（ｎ＝１，，，Ｎ）に対し、Ｖｎからわずかにずらした位置の視点のことである。図１０におけるＡ４６０がＶｎであったとすると、Ａ４６１で表わされる近傍視点位置Ｖｎｊ（ｊ＝１…Ｊ：図１０ではＪ＝６）は、Ａ４６０をＡ４７０で表わされる微小角度δだけずらして得られる。ここで微小角度δは、０に近い小さな値で、視点Ｖｎの隣接視点との角度より小さな値を設定する。例えば、Ｖｎの隣接視点Ｖｎ’とのなす角が７ｄｅｇであれば、δ＝３ｄｅｇなどと設定する。隣接視点Ｖｎ’は、以下のようにして定義される。 Next, the process performed in step S1300 will be described with reference to the flowchart of FIG. 8D showing details of the process. In the viewpoint variation image generation step in step S1341, a near viewpoint that is slightly moved from the currently set viewpoint is set, and an image of a shape model when viewed from the vicinity viewpoint is generated. Here, the near viewpoint is a viewpoint that is slightly shifted from Vn with respect to the set viewpoint Vn (n = 1,..., N). If A460 in FIG. 10 is Vn, the near viewpoint position Vnj (j = 1... J: J = 6 in FIG. 10) represented by A461 is obtained by shifting A460 by a minute angle δ represented by A470. Here, the small angle δ is a small value close to 0, and is set to a value smaller than the angle between the viewpoint Vn and the adjacent viewpoint. For example, if the angle between Vn and the adjacent viewpoint Vn ′ is 7 deg, δ = 3 deg is set. The adjacent viewpoint Vn ′ is defined as follows.

Here, ang(Vn, Vk) is the angle between viewpoint Vn and viewpoint Vk as seen from the center of the geodesic dome. Since Vn and Vk are both expressed as unit vectors here, the following equation (21) holds.
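For unit vectors, the angle between two viewpoints follows from their dot product, and an adjacent viewpoint can be found as the other viewpoint with the smallest such angle. The sketch below assumes that reading of "adjacent viewpoint" (the defining equation is not reproduced in this text); the function names are hypothetical.

```python
import math


def angle_between(a, b):
    """Angle in radians between unit vectors a and b, using cos(angle) = a . b."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def adjacent_viewpoint(n, viewpoints):
    """Index of the viewpoint, other than n, that forms the smallest angle with viewpoint n."""
    others = [k for k in range(len(viewpoints)) if k != n]
    return min(others, key=lambda k: angle_between(viewpoints[n], viewpoints[k]))
```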

It is desirable to set a plurality of neighboring viewpoints around each viewpoint Vn; as in FIG. 10, several points may be set at equal angular intervals centered on Vn, for example six points at 60 deg intervals (J = 6). Images I[nj, r] (n = 1...N, j = 1...J, r = 1...Nr) of the shape model viewed from the set neighboring viewpoint directions are then generated.
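One possible way to place such neighboring viewpoints is to tilt the viewpoint direction by the small angle δ toward evenly spaced directions in the plane perpendicular to it. The construction below, including the choice of helper axis and the function names, is an assumption made for illustration.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)


def neighbor_viewpoints(v, delta, count=6):
    """Return count unit vectors at angle delta (radians) from v, evenly spaced around v."""
    v = _unit(v)
    # Pick a helper axis that is not parallel to v to build a perpendicular basis (e1, e2).
    helper = (1.0, 0.0, 0.0) if abs(v[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _unit(_cross(v, helper))
    e2 = _cross(v, e1)
    result = []
    for j in range(count):
        phi = 2.0 * math.pi * j / count
        result.append(tuple(
            math.cos(delta) * v[i] + math.sin(delta) * (math.cos(phi) * e1[i] + math.sin(phi) * e2[i])
            for i in range(3)
        ))
    return result
```

With `count=6` the neighbors lie at 60 deg intervals around the viewpoint; `delta` would be chosen smaller than the angle to the nearest other viewpoint, for example `math.radians(3)`.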

ステップＳ１３４２における比較工程では、設定されている視点とその近傍視点において識別器の状態を比較して、どの程度変化したかを見る。ステップＳ１７００で生成された視点Ｖｎに対する学習画像と、ステップＳ１３４１で生成された視点Ｖｎの近傍視点に対する画像と、をステップＳ１５００で生成された識別器に入力する。そのときの、それぞれの識別器の状態に関する類似度を算出する。類似度としてはさまざまなものが考えられるが、例えばツリー型識別器による例で考えると以下のようになる。 In the comparison step in step S1342, the state of the discriminator is compared between the set viewpoint and its neighboring viewpoints to see how much has changed. The learning image for the viewpoint Vn generated in step S1700 and the image for the viewpoint near the viewpoint Vn generated in step S1341 are input to the classifier generated in step S1500. At that time, the degree of similarity regarding the state of each classifier is calculated. Various similarities can be considered. For example, when considering an example using a tree type discriminator, it is as follows.

入力画像Ｉに対する視点ｎの面内回転ｒに対応する姿勢クラスν［ｎ，ｒ］に対する、複数ツリーによる投票スコアをＳＣＯＲＥ（ν［ｎ，ｒ］，Ｉ）としたとき、視点Ｖｎと近傍視点Ｖｎｊの類似度Ｓｉｍ（ｎ，ｊ）を以下のように定義する。 When the voting score by a plurality of trees for the posture class ν [n, r] corresponding to the in-plane rotation r of the viewpoint n with respect to the input image I is SCORE (ν [n, r], I), the viewpoint Vn and the neighboring viewpoint The similarity Sim (n, j) of Vnj is defined as follows.

Here, I[n, r] is the learning image of the posture class corresponding to the in-plane rotation r of viewpoint n. Alternatively, if the classifier is a pruned tree or FERN, then with Pm(ν[n, r]|I) denoting the score for class ν[n, r] obtained at the terminal node of tree m, the information content of the entire classifier can be used as the similarity (equation (23)).

視点ｎにおける近傍視点に対する類似度の最小値を、視点ｎの比較尺度Ｌｎとする（式（２４））。 The minimum value of the similarity between the viewpoint n and the near viewpoint is set as the comparison scale Ln of the viewpoint n (Expression (24)).

近傍視点に対する類似度の、全視点に関する尺度Ｌは、Ｌｎの総和（式（２５））、二乗和（式（２６））、最小値（式（２７））の何れかによって算出される。 The scale L for all viewpoints of the similarity to the nearby viewpoint is calculated by any one of the total sum of Ln (Expression (25)), the sum of squares (Expression (26)), and the minimum value (Expression (27)).
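The three aggregation options named here (sum, sum of squares, minimum) can be sketched as follows, assuming the similarities are available as a table `similarity[n][j]` for viewpoint n and neighboring viewpoint j. The table layout and function names are hypothetical.

```python
def viewpoint_scales(similarity):
    """Comparison scale of each viewpoint: minimum similarity over its neighboring viewpoints."""
    return [min(row) for row in similarity]


def overall_scale(scales, mode="sum"):
    """Combine the per-viewpoint scales into one scale for the whole viewpoint set."""
    if mode == "sum":
        return sum(scales)
    if mode == "squares":
        return sum(x * x for x in scales)
    if mode == "min":
        return min(scales)
    raise ValueError("mode must be 'sum', 'squares' or 'min'")
```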

Alternatively, the scale L may be defined using the Kullback-Leibler divergence (relative entropy) (equation (28)).

そしてこのような何れかの方法によって求めた尺度Ｌが閾値θＬ以下であれば、処理はステップＳ１８００に進む。一方、尺度Ｌが閾値θＬより大きい場合は、処理はステップＳ１３４３に進む。ステップＳ１３４３における視点移動工程では、第１の実施形態で説明したステップＳ１３１２における処理と同様の処理を行って、視点の移動を行う。 If the scale L obtained by any of these methods is equal to or smaller than the threshold value θL, the process proceeds to step S1800. On the other hand, when the scale L is larger than the threshold value θL, the process proceeds to step S1343. In the viewpoint moving step in step S1343, the viewpoint is moved by performing the same process as the process in step S1312 described in the first embodiment.

In step S1800, it is determined whether the viewpoints were moved by the processing of step S1343. If the viewpoints were not moved, the processing of FIG. 7(b) ends. If the viewpoints were moved, the process returns to step S1700, and the subsequent steps are executed using the new, moved viewpoints.

Thus, the learning images from the viewpoints that are set when the processing according to the flowchart of FIG. 7(b) ends can be output as the learning images finally given to the classifier.

Note that the above processing may instead generate a predetermined number T of viewpoint sets in advance, compare all of them, and select the one with the smallest scale L. In that case, the details of the processing in step S1300 are as shown in FIG. 8(e).

First, the viewpoint set established in step S1200 is held as U_0, and the iteration count is initialized to t = 0. The processing in step S1341 is as described above. If the predetermined number T of viewpoint sets has not yet been generated, t is incremented by one, the processing of step S1343 is performed as described above, and the resulting viewpoint set U_t is held. The predetermined number T is set in the same manner as in the second embodiment. Once the predetermined number T of viewpoint sets has been generated, the process proceeds to step S1344.

In the viewpoint selection step of step S1344, the scale L described above is calculated for each of the viewpoint sets U_0, U_1, ..., U_T. Then, with Lt denoting the scale for viewpoint set U_t, the minimum value Lc among L0, L1, ..., LT is identified, and the viewpoint set U_c corresponding to the identified minimum value Lc is selected as the final viewpoints (equation (29)).

The classifier trained on the learning images corresponding to the selected viewpoint set U_c is taken as the final classifier. The value of the scale Lt may be obtained in step S1341 rather than in step S1344. In that case, as long as the scale Lt is held so that it is available in step S1344, comparing Lt with the minimum so far makes it sufficient to keep only one of the viewpoint set U_t and the viewpoint set corresponding to the minimum so far. It is therefore unnecessary to retain all of the generated learning images and neighboring viewpoint images.

[Sixth Embodiment]
The state of the classifier obtained by learning from the learning images at the set viewpoints may be used as the condition for updating the viewpoints. The processing in that case will be described with reference to FIGS. 7(b) and 11(a). The processes in steps S1100, S1200, S1700, and S1500 are the same as in the first embodiment.

ステップＳ１３００における処理について、同処理の詳細を示す図１１（ａ）のフローチャートを用いて説明する。ステップＳ１３５１における識別器判定工程では、学習して得られた識別器の内部状態がどの程度偏っているかを判定する。ここで偏りとはさまざまなものが考えられる。例えば、条件分岐における偏りを尺度として考えれば、姿勢クラスνに対する尺度Ｂνは以下のように定義できる。 The process in step S1300 will be described with reference to the flowchart of FIG. 11A showing details of the process. In the discriminator determining step in step S1351, it is determined how much the internal state of the discriminator obtained by learning is biased. Here, various biases can be considered. For example, if the bias in the conditional branch is considered as a scale, the scale Bν for the posture class ν can be defined as follows.

ここでρｍ（Ｉν，ｍ，ｄ）は、ツリーｍ（ｍ＝１，，，Ｍ）に学習画像Ｉνを入力したときの、深さｄ（ｄ＝１，，，Ｄ）におけるノードにおける分岐が、式（１）の条件に従う場合には０を、式（２）の条件に従う場合には１を返す関数であるとする。ただし、Ｄはツリーの根ノードから終端ノードまでの分岐数である。式（３１）で表わされる値Ｃν，ｍは、ツリーｍにおいて式（１）の条件に従う分岐をカウントした値となる。各ノードの分岐が完全に無作為であれば、Ｃν，ｍはＢ（Ｄ，０．５）の二項分布に従う。よって、Ｃν，ｍを正規化した値である尺度Ｂνは、最小で０、最大で１の値を取り得り、分岐に偏りがなければ０．５に近い値を取る。学習画像Ｉν（ν＝１，，，Ｎ×Ｎｒ）を入力したときの尺度Ｂνの最大値を、Ｂｍａｘとする（式（３２））。 Here, ρm (Iν, m, d) is a branch at a node at a depth d (d = 1,, D) when the learning image Iν is input to the tree m (m = 1,, M). Assume that the function returns 0 when the condition of the expression (1) is satisfied, and returns 1 when the condition of the expression (2) is satisfied. However, D is the number of branches from the root node to the end node of the tree. The value Cν, m represented by the expression (31) is a value obtained by counting branches according to the condition of the expression (1) in the tree m. If the branch of each node is completely random, Cν, m follows a binomial distribution of B (D, 0.5). Therefore, the scale Bν, which is a value obtained by normalizing Cν, m, can take a value of 0 at the minimum and 1 at the maximum, and takes a value close to 0.5 if there is no bias in the branch. The maximum value of the scale Bν when the learning image Iν (ν = 1,, N × Nr) is input is defined as Bmax (Formula (32)).
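A branch-bias measure of this kind can be sketched as follows. The sketch assumes that the branch decisions are available as `branch_bits[m][d]` (0 or 1 for tree m at depth d) and that the per-tree counts are normalized by the depth D and averaged over the M trees; this normalization is an assumption chosen so that the value lies between 0 and 1 and is close to 0.5 for unbiased branching, since the defining equations are not reproduced in this text.

```python
def branch_bias(branch_bits):
    """Average fraction of branches with value 1 over all trees, for one learning image."""
    ratios = [sum(bits) / len(bits) for bits in branch_bits]  # count per tree divided by D
    return sum(ratios) / len(ratios)


def max_branch_bias(bits_per_class):
    """Largest bias over all posture classes."""
    return max(branch_bias(bits) for bits in bits_per_class)
```

The resulting maximum would then be compared with a threshold between 0.5 and 1, for example 0.7, to decide whether the viewpoints should be moved.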

得られた尺度Ｂｍａｘが所定の閾値θより大きければ、処理はステップＳ１３５３に進み、閾値以下であれば、処理はステップＳ１８００に進む。閾値は０．５＜θ＜１の範囲で設定され、例えばθ＝０．７などと与える。あるいは、学習画像に対してノイズを加えたときのロバストさを、偏りの尺度としてもよい。学習画像Ｉνに対して、Ｗ通りのホワイトノイズを加えた画像をＩν，ｗ（ｗ＝１…Ｗ）とする。このとき、尺度Ｂνを以下のように定義する。 If the obtained scale Bmax is greater than the predetermined threshold value θ, the process proceeds to step S1353, and if it is equal to or less than the threshold value, the process proceeds to step S1800. The threshold value is set in a range of 0.5 <θ <1, and is given as θ = 0.7, for example. Alternatively, robustness when noise is added to the learning image may be used as a measure of the bias. An image obtained by adding W kinds of white noise to the learning image Iν is denoted by Iν, w (w = 1... W). At this time, the scale Bν is defined as follows.

この値が小さければ、姿勢νはノイズに対してロバストであることになる。式（３２）と同様にして、この尺度Ｂνの最大値を尺度Ｂｍａｘとする。ここではノイズは学習画像に対して与えるとしたが、各ノードの分岐時にノイズを付加してもロバストさを測ることはできる。 If this value is small, the posture ν is robust against noise. In the same manner as in the equation (32), the maximum value of the scale Bν is set as the scale Bmax. Although noise is given to the learning image here, robustness can be measured even if noise is added at the time of branching of each node.

In the viewpoint moving step of step S1353, the viewpoints are moved by the same processing as in step S1312 described in the first embodiment. Thus, the learning images from the viewpoints that are set when the processing according to the flowchart of FIG. 7(b) ends can be output as the learning images finally given to the classifier.

Note that the above processing may instead generate a predetermined number T of viewpoint sets in advance, compare all of them, and select the one with the smallest scale Bmax. In that case, the details of the processing in step S1300 are as shown in FIG. 11(b).

先ず、ステップＳ１２００で設定された視点集合をＵ_０として保持し、繰り返し回数ｔ＝０と初期化しておく。ステップＳ１３５１における処理は上記の通りである。そして、所定数Ｔの視点集合が生成されていなければ、ｔを１つインクリメントして上記と同様にしてステップＳ１３５３の処理を行い、得られた視点集合Ｕ_ｔを保持する。所定数Ｔは、第２の実施形態と同様にして設定する。所定数Ｔの視点集合が生成されたら、処理はステップＳ１３５４に進む。 First, the viewpoint set configured in step S1200 is held as U_0, and the iteration count is initialized to t = 0. The processing in step S1351 is as described above. If the predetermined number T of viewpoint sets has not yet been generated, t is incremented by one, the processing of step S1353 is performed as described above, and the resulting viewpoint set U_t is held. The predetermined number T is set in the same manner as in the second embodiment. Once the predetermined number T of viewpoint sets has been generated, the process proceeds to step S1354.

ステップＳ１３５４における視点選択工程では、視点集合Ｕ_０，Ｕ_１…Ｕ_Ｔのそれぞれに対して、前述の尺度Ｂｍａｘを算出する。そして、視点集合Ｕ_ｔに対する尺度をＢｔ，ｍａｘとしたときに、Ｂ０，ｍａｘ，Ｂ１，ｍａｘ…ＢＬ，ｍａｘのうち最小値Ｂｃ，ｍａｘを特定し、特定した最小値Ｂｃ，ｍａｘに対応する視点集合Ｕ_ｃを、最終的な視点として選択する（式（３４））。 In the viewpoint selection step of step S1354, the measure Bmax described above is calculated for each of the viewpoint sets U_0, U_1 … U_T. Denoting the measure for viewpoint set U_t by Bt,max, the minimum value Bc,max among B0,max, B1,max … BL,max is identified, and the viewpoint set U_c corresponding to that minimum is selected as the final set of viewpoints (equation (34)).

そして、選択された視点集合Ｕ_ｃに対応する学習画像で学習された識別器を、最終的な識別器とする。 The classifier trained on the learning images corresponding to the selected viewpoint set U_c is then used as the final classifier.
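The candidate generation and selection described above can be sketched as follows. This is only an illustration under stated assumptions: `generate_candidates`, `select_viewpoint_set`, `move_viewpoints` and `bias_measure` are hypothetical names, `move_viewpoints` stands in for the viewpoint movement of step S1353, `bias_measure` stands in for the computation of Bmax, and each U_t is assumed to be derived from U_{t-1}. The selection itself is c = argmin_t Bt,max.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")  # a viewpoint set, in whatever representation the caller uses


def generate_candidates(u0: U, move_viewpoints: Callable[[U], U], T: int) -> List[U]:
    """Return [U_0, U_1, ..., U_T].

    Assumption: each U_t is obtained by applying the viewpoint movement
    (a stand-in for step S1353) to the previous set U_{t-1}.
    """
    candidates = [u0]
    for _ in range(T):
        candidates.append(move_viewpoints(candidates[-1]))
    return candidates


def select_viewpoint_set(
    candidates: Sequence[U], bias_measure: Callable[[U], float]
) -> Tuple[int, U, float]:
    """Return (c, U_c, B_{c,max}) for the candidate with the smallest measure."""
    if not candidates:
        raise ValueError("at least one viewpoint set is required")
    scores = [bias_measure(u) for u in candidates]
    c = min(range(len(scores)), key=lambda t: scores[t])
    return c, candidates[c], scores[c]


# Toy data: a "viewpoint set" is a tuple of azimuth angles in degrees, the
# movement shifts every viewpoint by 5 degrees, and the toy measure is
# smallest when the first viewpoint sits at 10 degrees.
toy_candidates = generate_candidates((0.0, 90.0), lambda u: tuple(a + 5.0 for a in u), T=3)
toy_c, toy_best, toy_score = select_viewpoint_set(
    toy_candidates, lambda u: 0.5 + abs(u[0] - 10.0) / 100.0
)
```

With the toy measure above, the third candidate (index 2) is selected. In practice the measure would be computed from the classifier trained on each candidate's learning images, which makes each evaluation expensive.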

［第７の実施形態］ [Seventh Embodiment]
第１〜６の実施形態で得られた視点に対し、さらに特徴が急激に変化する視点を追加してやってもよい。その場合、各実施形態におけるステップＳ１３００の最後に、以下のような処理を行う。 Viewpoints at which the features change abruptly may additionally be added to the viewpoints obtained in the first to sixth embodiments. In that case, the following processing is performed at the end of step S1300 in each embodiment.

ステップＳ１２００若しくはステップＳ１３００で生成された視点すべてに対して、上記の平面度Ｐｎを算出する。そして、算出された平面度Ｐｎが、第１の実施形態で述べられている閾値εｎより大きくなる視点は、ステップＳ１３００で得られた視点の集合に追加する。即ち、更に、それぞれの位置（各視点）のうち評価値が閾値より大きい位置からそれぞれ異なるロール角で形状モデルを見た該形状モデルの画像を、学習用画像として出力する。これにより、図６（ｃ）におけるＢ４４１で表わされる視点Ｙが追加され、そのスコア曲線Ｂ１４１で示されるように、特異姿勢における識別能力が補間されることになる。 The flatness Pn described above is calculated for every viewpoint generated in step S1200 or step S1300. Viewpoints whose calculated flatness Pn is greater than the threshold εn described in the first embodiment are added to the set of viewpoints obtained in step S1300. That is, images of the shape model viewed at different roll angles from each position (viewpoint) whose evaluation value is greater than the threshold are additionally output as learning images. As a result, the viewpoint Y represented by B441 in FIG. 6(c) is added, and, as shown by its score curve B141, the discrimination ability at the singular posture is interpolated.
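As a rough illustration of this augmentation step, the sketch below adds the high-flatness viewpoints to the set obtained in step S1300 and pairs each added viewpoint with several roll angles. The function names, the string viewpoint labels, the flatness values and the roll-angle list are all hypothetical; the flatness Pn is assumed to be computed elsewhere, as defined in the first embodiment.

```python
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple


def add_singular_viewpoints(
    refined: Sequence[Hashable],
    generated: Iterable[Hashable],
    flatness: Callable[[Hashable], float],
    eps_n: float,
) -> List[Hashable]:
    """Append every generated viewpoint with flatness P_n > eps_n to the
    refined set (the output of step S1300), skipping duplicates."""
    result = list(refined)
    for v in generated:
        if flatness(v) > eps_n and v not in result:
            result.append(v)
    return result


def poses_with_rolls(
    viewpoints: Iterable[Hashable], roll_angles_deg: Sequence[float]
) -> List[Tuple[Hashable, float]]:
    """Pair each viewpoint with every roll angle, one pose per learning image."""
    return [(v, r) for v in viewpoints for r in roll_angles_deg]


# Toy data: "v3" was moved to "v3_moved" by an earlier embodiment because its
# flatness was high; the seventh embodiment adds it back.
toy_flatness = {"v1": 0.2, "v2": 0.3, "v3": 0.95, "v3_moved": 0.4}
toy_generated = ["v1", "v2", "v3", "v3_moved"]
toy_refined = ["v1", "v2", "v3_moved"]
toy_final = add_singular_viewpoints(toy_refined, toy_generated, toy_flatness.__getitem__, eps_n=0.8)
toy_added = [v for v in toy_final if v not in toy_refined]
toy_poses = poses_with_rolls(toy_added, [0.0, 120.0, 240.0])
```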

［第８の実施形態］ [Eighth Embodiment]
二点比較による識別器の場合で、設定した視点の中に特異点となる姿勢が含まれていることが分かった場合には、学習画像自体にバイアスを加えてもよい。その場合の処理を、図７（ｃ）を用いて説明する。 For a classifier based on two-point comparison, if the set viewpoints are found to include a posture that is a singular point, a bias may be applied to the learning images themselves. The processing in that case is described with reference to FIG. 7(c).

ステップＳ１１００，ステップＳ１２００，ステップＳ１７００のそれぞれにおける処理は上記の通りである。ステップＳ１６００における学習画像更新工程では、まず、学習画像の中で特異点となる姿勢クラスの学習画像が含まれているかどうかを判断する。判断基準としては、第１の実施形態で説明した平面度Ｐｎを利用する。すべての視点ｎ＝１…Ｎの中でひとつでも平面度Ｐｎが閾値εｎより大きければ、すべての学習画像に対してバイアスを与える。学習画像に与えるバイアスとは、例えば図１２のように、画像座標に対して線形に与えるものである。Ｃ６００は学習画像Ｉνであり、それを切断位置Ｃ６１０で切断した時のデプスがＣ６２０で与えられているとする。このとき、線形のバイアスをかけることで、断面のデプスはＣ６３０のように更新される。学習画像Ｉνの学習画像座標系（学習画像中の１点（例えば左上隅）を原点とする座標系）の位置Ｘ＝（ｘ、ｙ）におけるデプス値がＤν（Ｘ）であったときに、バイアスを付加したデプス値として、Ｄν（Ｘ）は以下のように更新される。 The processing in steps S1100, S1200, and S1700 is as described above. In the learning image update step of step S1600, it is first determined whether the learning images include a learning image of a posture class that is a singular point. The flatness Pn described in the first embodiment is used as the criterion. If the flatness Pn is greater than the threshold εn for even one of the viewpoints n = 1…N, a bias is applied to all learning images. The bias is applied linearly with respect to the image coordinates, as shown for example in FIG. 12. C600 is the learning image Iν, and C620 is its depth profile along the cutting position C610. Applying the linear bias updates the depth of the cross section as shown by C630. When the depth value at position X = (x, y) in the learning image coordinate system of the learning image Iν (a coordinate system whose origin is a point in the learning image, for example the upper-left corner) is Dν(X), Dν(X) is updated to the biased depth value as follows.
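The update equation itself is not reproduced in this text. Assuming the bias is simply added to the depth value, as the wording "a depth value to which a bias is added" suggests, and assuming the equation number (35) from the neighbouring references to equations (34) and (36), a plausible reconstruction is:

```latex
D_\nu(X) \leftarrow D_\nu(X) + f(X) \tag{35}
```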

バイアス関数ｆ（Ｘ）は、Ｘに対して線形な関数であり、以下のように定義される。 The bias function f (X) is a linear function with respect to X, and is defined as follows.
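The definition is missing here as well. Given that f is linear in X = (x, y), that it has three constants a, b and c, and that a and b represent the gradient, the most natural reading (an assumption, not a quotation of the original formula) is:

```latex
f(X) = a\,x + b\,y + c \tag{36}
```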

ただし、ａ、ｂ、ｃは定数である。勾配を表わす定数ａおよびｂは、作業領域における画像解像度と、測地ドームの最短視点間角度によって決定される。例えば、画像解像度が１ｍｍ／画素で、最短視点間角度が７ｄｅｇであったとすると、ａ＝０．１、ｂ＝０、ｃ＝０などと設定する。 Here, a, b, and c are constants. The constants a and b, which represent the gradient, are determined by the image resolution in the work area and the minimum angle between viewpoints of the geodesic dome. For example, if the image resolution is 1 mm/pixel and the minimum angle between viewpoints is 7 deg, they are set to a = 0.1, b = 0, c = 0, or the like.

ステップＳ１５００では、ステップＳ１６００で更新された学習画像を用いて、対象物体の姿勢を識別する識別器の学習処理を行う。学習して得られた識別器を使って入力画像Ｉに対して識別を行う際には、画像全体に対して上記と同様のバイアスを加える。入力画像座標系におけるｘ軸とｙ軸は、学習画像座標系におけるｘ軸とｙ軸に対して同じ方向であれば、式（３６）のバイアスを同様にして与えてやればよい。すなわち、入力画像における位置ＸＳ＝（ｘＳ、ｙＳ）のデプス値Ｄ（ＸＳ）を、以下のように更新してから識別器による探索を行う。 In step S1500, the discriminator that identifies the posture of the target object is trained using the learning images updated in step S1600. When the trained discriminator is used to classify an input image I, the same bias as above is applied to the entire image. If the x-axis and y-axis of the input image coordinate system point in the same directions as those of the learning image coordinate system, the bias of equation (36) can be applied in the same way. That is, the depth value D(XS) at position XS = (xS, yS) in the input image is updated as follows before the discriminator performs its search.
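The equation for the input image did not survive extraction. Under the assumption that the bias f of equation (36) is added to the input depth in the same additive manner, and that this is equation (37) in the original numbering, it would read:

```latex
D(X_S) \leftarrow D(X_S) + f(X_S) \tag{37}
```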

二点比較による分岐を考えると、デプスの大小関係のみが重要であるため、入力画像と学習画像の原点位置の違いは影響を与えない。これにより、図４のような大きな面Ａ４１０が視線方向Ａ４４０の軸と直交するような状態に対して、疑似的にデプスに勾配を与えたこととなり、式（２）の条件ばかりが満たされるような特殊な状況を回避することができる。 For branching by two-point comparison, only the relative magnitude of the depth values matters, so a difference in origin position between the input image and the learning image has no effect. For a state such as that in FIG. 4, where the large surface A410 is orthogonal to the axis of the line-of-sight direction A440, this artificially imposes a gradient on the depth, which avoids the special situation in which only the condition of Expression (2) is ever satisfied.
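The effect can be checked on a toy depth map. The sketch below assumes an additive bias of the form f(X) = a·x + b·y + c and a two-point test of the form "depth at p is greater than depth at q". Both are assumptions, since neither the bias equations nor Expression (2) appear in this section. The function names and the 4×3 flat patch are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def add_linear_bias(
    depth: Sequence[Sequence[float]],
    a: float = 0.1,
    b: float = 0.0,
    c: float = 0.0,
    origin: Point = (0, 0),
) -> List[List[float]]:
    """Return a copy of the depth map with a*x + b*y + c added at every
    pixel, where x and y are measured from the given origin."""
    ox, oy = origin
    return [
        [d + a * (x - ox) + b * (y - oy) + c for x, d in enumerate(row)]
        for y, row in enumerate(depth)
    ]


def two_point_test(depth: Sequence[Sequence[float]], p: Point, q: Point) -> bool:
    """Assumed form of a node's branching test: is depth at p greater than at q?"""
    (px, py), (qx, qy) = p, q
    return depth[py][px] > depth[qy][qx]


# A flat surface facing the camera: every pixel has the same depth (in mm).
flat = [[500.0] * 4 for _ in range(3)]
pairs = [((0, 0), (3, 0)), ((3, 1), (0, 1)), ((1, 2), (2, 0))]

# Without a bias every comparison is a tie, so all tests branch the same way.
before = [two_point_test(flat, p, q) for p, q in pairs]

# With the gradient a = 0.1 the outcome depends on where the two points lie.
after = [two_point_test(add_linear_bias(flat), p, q) for p, q in pairs]

# Moving the origin only adds a constant to every pixel, so results are unchanged.
after_shifted = [two_point_test(add_linear_bias(flat, origin=(2, 1)), p, q) for p, q in pairs]
```

On the flat patch all three tests return `False` before the bias is added. Afterwards the second test returns `True`, and shifting the origin leaves the results unchanged, in line with the remark that only the relative magnitude of the depth values matters.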

（その他の実施例） (Other Embodiments)
また、本発明は、以下の処理を実行することによっても実現される。即ち、上述した実施形態の機能を実現するソフトウェア（プログラム）を、ネットワーク又は各種記憶媒体を介してシステム或いは装置に供給し、そのシステム或いは装置のコンピュータ（またはＣＰＵやＭＰＵ等）がプログラムを読み出して実行する処理である。 The present invention can also be realized by the following processing: software (a program) that implements the functions of the embodiments described above is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like) of that system or apparatus reads and executes the program.

Claims

An information processing apparatus that generates a learning image to be given to a discriminator in order to train the discriminator, which estimates the posture of a target object, the apparatus comprising:
setting means for setting at least one viewpoint for a shape model of the target object;
deriving means for deriving a flatness of the shape model from the distribution of distance values from the viewpoint to each region of the shape model when the shape model is observed from the set viewpoint; and
generating means for generating, as the learning image, an image of the shape model viewed from the set viewpoint when the flatness derived for the set viewpoint is equal to or less than a threshold value.

The information processing apparatus according to claim 1, further comprising changing means for changing the viewpoint position when the flatness obtained by the deriving means is larger than the threshold value,
wherein the generating means generates, as the learning image, an image of the shape model viewed from the viewpoint at the viewpoint position changed by the changing means.

An information processing apparatus that generates a learning image to be given to a discriminator in order to train the discriminator, which estimates the posture of a target object, the apparatus comprising:
setting means for setting at least one viewpoint for a shape model of the target object;
deriving means for deriving an evaluation value representing the proportion of surfaces, among the surfaces constituting the shape model, that are orthogonal to the viewing direction when the shape model is observed from the set viewpoint; and
generating means for generating, as the learning image, an image of the shape model viewed from a viewpoint set at a position, among a plurality of positions, at which the evaluation value is greater than a threshold value.

The information processing apparatus according to claim 3, further comprising updating means for updating the position based on the evaluation value obtained by the deriving means,
wherein the generating means generates, as the learning image, an image of the shape model viewed from a viewpoint set at the position updated by the updating means.

The information processing apparatus according to any one of claims 1 to 4, further comprising means for training the discriminator using the learning image.

An information processing method performed by an information processing apparatus that generates a learning image to be given to a discriminator in order to train the discriminator, which estimates the posture of a target object, the method comprising:
a setting step in which setting means of the information processing apparatus sets at least one viewpoint for a shape model of the target object;
a deriving step in which deriving means of the information processing apparatus derives a flatness of the shape model from the distribution of distance values from the viewpoint to each region of the shape model when the shape model is observed from the set viewpoint; and
a generating step in which generating means of the information processing apparatus generates, as the learning image, an image of the shape model viewed from the set viewpoint when the flatness derived for the set viewpoint is equal to or less than a threshold value.

An information processing method performed by an information processing apparatus that generates a learning image to be given to a discriminator in order to train the discriminator, which estimates the posture of a target object, the method comprising:
a setting step in which setting means of the information processing apparatus sets at least one viewpoint for a shape model of the target object;
a deriving step in which deriving means of the information processing apparatus derives an evaluation value representing the proportion of surfaces, among the surfaces constituting the shape model, that are orthogonal to the line-of-sight direction when the shape model is observed from the set viewpoint; and
a generating step in which generating means of the information processing apparatus generates, as the learning image, an image of the shape model viewed from a viewpoint set at a position, among a plurality of positions, at which the evaluation value is greater than a threshold value.

A computer program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 5.