JP6255125B2

JP6255125B2 - Image processing apparatus, image processing system, and image processing method

Info

Publication number: JP6255125B2
Application number: JP2017076805A
Authority: JP
Inventors: 雅人青葉
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-04-07
Filing date: 2017-04-07
Publication date: 2017-12-27
Anticipated expiration: 2033-03-29
Also published as: JP2017120672A

Description

The present invention relates to an image processing apparatus, an image processing system, and an image processing method for generating learning images that are used to generate a dictionary for image recognition processing, which detects an image of a target object in an input image.

Various research and development efforts have been made on image recognition for detecting an image of a target object in an image obtained by imaging that object. Image recognition technology has been applied in many fields and used to solve many practical problems, such as face recognition and the recognition of parts in factories.

Such image recognition can be viewed as a pattern recognition problem. In pattern recognition, too, research has been conducted on classifiers, which determine the class to which input information belongs, and various methods such as neural networks, SVMs (Support Vector Machines), and Randomized Trees have been proposed.

These methods require a dictionary for image recognition to be generated in advance, and learning images are needed to generate this dictionary. In visual recognition for recent industrial robots, there is also a need to recognize target objects whose posture has a high degree of freedom in three dimensions, for example in part picking, where a desired part is detected among a pile of parts of multiple types. Detecting three-dimensional postures in this way requires learning images that correspond to the various postures of the target object.

As described above, in recognition tasks such as robotic part picking, the posture information of the target object is extremely important. The posture associated with a learning image is expressed by parameters such as Euler angles or quaternions, but it is difficult to prepare photographed images of the target object in which the posture is known in this way. Therefore, CG images of the object in arbitrary postures are commonly generated from CAD data and used as learning images.

Patent Document 1: JP 2010-243478 A

In the conventional approach to generating learning images from CAD data, the joints between the polygons of the CAD data are generally treated as edges, and a binarized edge image is generated. In the actual detection processing, edge extraction is performed on a photographed input image of the group of parts, and the position and orientation of the target object are identified by matching based on edge pixels. In such a method, the result of edge extraction on the photographed image largely determines the detection performance. Edge extraction generally varies greatly with the material of the target object, the influence of ambient light, and so on, and requires considerable effort from the operator to tune.

Alternatively, learning images close to photographed images are also generated by rendering. This method requires estimating the luminance value of each surface of the target object. If the BRDF (bidirectional reflectance distribution function) of the target object and the ambient lighting conditions are known, CG images can be generated by assigning the luminance values estimated from them to the surface of the target object. However, accurately determining the BRDF of the target object requires measurement with special equipment, and obtaining the ambient lighting conditions of the actual environment as accurate numerical values also takes considerable work.

There is also a method in which a sphere is placed in the environment to create an environment map relating luminance values to surface orientations, and learning images are generated according to that map. For example, to generate learning images of a specular object, an environment map image is generated by placing a specular sphere in the environment (see Patent Document 1). However, when the target object is made of a common material such as plastic, its reflection characteristics vary with the molding die and surface finish even when the material is the same, so it is very difficult to prepare a spherical object with the same reflection characteristics as the target object.

In view of the above problems, an object of the present invention is to provide an image processing apparatus, an image processing system, and an image processing method that easily generate learning images which approximate the surface luminance of a target object, reflecting the environmental conditions, on the basis of information obtained by photographing the target object in the actual environment.

As one means for achieving the above object, an image processing apparatus according to the present invention has the following configuration. That is, it is an image processing apparatus that generates a learning image of an object, the learning image being used to create a dictionary that is referred to in image recognition processing for detecting the object, the apparatus comprising:
first acquisition means for acquiring, from a luminance image containing a plurality of the objects in different postures, luminance values of the surface of the object in a plurality of regions of the luminance image;
second acquisition means for acquiring information on the orientation of the surface of the object in the plurality of regions for which the luminance values are acquired by the first acquisition means;
third acquisition means for acquiring a correspondence between the luminance values of the surface of the object in the plurality of regions acquired by the first acquisition means and the information on the orientation of the surface of the object in the plurality of regions acquired by the second acquisition means; and
generation means for generating a learning image of the object on the basis of the correspondence acquired by the third acquisition means and model information of the object.

According to the present invention, learning images that approximate the surface luminance of a target object, reflecting the environmental conditions, can easily be generated on the basis of information obtained by photographing the target object.

FIG. 1 is a block diagram showing the schematic configuration, particularly for generating learning images, of an image processing apparatus that performs image recognition processing according to the first embodiment.
FIG. 2 is a block diagram showing the schematic configuration for detecting a target object at runtime in the image processing apparatus of the first embodiment.
FIG. 3 is a flowchart showing runtime processing and dictionary generation processing in the first embodiment.
FIG. 4 is a diagram explaining the surface properties of a target object.
FIG. 5 is a diagram showing an example of an observed luminance distribution of a target object.
FIG. 6 is a diagram showing how learning images are generated as CG images.
FIG. 7 is a diagram showing an example of an observed luminance distribution of a target object consisting of multiple colors.
FIG. 8 is a block diagram showing the detailed configuration of a luminance estimation unit in the second embodiment.
FIG. 9 is a flowchart showing details of luminance estimation processing in the second embodiment.
FIG. 10 is a diagram explaining a method of estimating a luminance distribution in the second embodiment.
FIG. 11 is a diagram showing an example GUI for specifying color classification in the second embodiment.
FIG. 12 is a diagram showing an example of associating luminance distribution functions by color classification in the second embodiment.
FIG. 13 is a diagram showing an example of a predicted distribution of luminance values in the third embodiment.
FIG. 14 is a block diagram showing the schematic configuration for generating learning images in the fourth embodiment.
FIG. 15 is a flowchart showing learning processing in the fourth embodiment.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention as defined by the claims, and not all combinations of the features described in these embodiments are necessarily essential to the means for solving the problem according to the present invention.

<First Embodiment>
In the present invention, the learning images used to generate the dictionary referred to in image recognition processing, which detects an image of a target object in an input image, are generated on the basis of information obtained by photographing the target object in the actual environment, so that they approximate the surface luminance of the target object while reflecting the environmental conditions.

● Overview of Configuration
FIG. 1(a) shows an overview of the configuration, particularly for generating learning images, of the image processing apparatus that performs image recognition processing in the present embodiment. The model setting unit 1010 sets a model of the target object and stores it in the model storage unit 1020. The image acquisition unit 1110 acquires pre-acquired images by photographing the target object and stores them in the pre-acquired image storage unit 1120. The observation data distribution acquisition unit 1130 acquires an observation data distribution of luminance values on the basis of information obtained from the pre-acquired images stored in the pre-acquired image storage unit 1120. The luminance estimation unit 1210 estimates the luminance distribution of the target object surface on the basis of the data distribution obtained by the observation data distribution acquisition unit 1130. The image generation unit 1220 generates CG images of the target object in various postures on the basis of the model stored in the model storage unit 1020 and the luminance distribution estimated by the luminance estimation unit 1210. The generated CG images are stored in the learning image storage unit 2010 as learning images. A feature of the present embodiment is that learning images based on the luminance distribution obtained by photographing the target object can thus be generated easily.

A dictionary for image recognition is generated by learning processing using the learning images generated as described above. Specifically, the learning image setting unit 2020 reads a plurality of learning images stored in the learning image storage unit 2010, and the learning unit 2100 performs learning processing using these learning images to generate a dictionary for recognizing the target object. The generated dictionary is stored in the dictionary storage unit 2200.

Runtime processing is performed using the dictionary generated by the above configuration. Here, runtime refers to recognition (detection) processing of the target object in actual input images using the dictionary obtained from the learning images created according to the present embodiment. At runtime, the dictionary setting unit 3010 first reads the dictionary stored in the dictionary storage unit 2200 and inputs it to the recognition unit 3100. Meanwhile, the input image acquisition unit 3020 acquires an input image of the target object and inputs it to the recognition unit 3100. The recognition unit 3100 estimates the position and orientation of the target object in the input image according to the loaded dictionary. The estimated position and orientation are presented as the recognition result by the recognition result output unit 3200 in a predetermined manner.

● Overview of Robot Work
The following describes, as an example, a case in which the image processing apparatus of the present embodiment configured as above is applied to work performed by a robot. First, FIG. 2 shows an apparatus configuration for detecting a target object at runtime. In FIG. 2, a target object 400 to be detected is placed on a tray 500. Reference numeral 300 denotes an imaging device for obtaining image information and distance information from the shooting position; it corresponds to the input image acquisition unit 3020 in FIG. 1. The imaging device 300 may be a stereo camera, a TOF sensor, or a device that combines a camera and a projector for a light-section or spatial-coding method, as long as it can obtain distance information together with image information at the time of shooting; the present invention is not limited in this respect. Furthermore, when alignment with the model is performed using a tracking technique, as described later, distance information is not required, and the imaging device 300 may consist of a camera only. The imaging device 300 is connected to the computer 100 by a wired or wireless connection.

The computer 100 incorporates, as a program or a circuit, components corresponding to the recognition unit 3100 and the recognition result output unit 3200 shown in FIG. 1. A hard disk (not shown) provided inside or outside the computer 100 corresponds to the dictionary storage unit 2200 and stores the dictionary. Note that the recognition unit 3100, the recognition result output unit 3200, and the dictionary storage unit 2200 are not limited to the configuration described above; for example, they may be implemented by a combination of a computer and hard disk accessed over a network, or of a circuit and flash memory built into the camera.

The computer 100 is connected to the robot controller 210. The robot controller 210 is connected to the robot arm 220, which operates in response to command signals from the robot controller 210. The robot arm 220 is fitted with an end effector 230 for performing predetermined work, such as gripping a workpiece.

● Runtime Processing
Runtime processing in the configuration shown in FIG. 2 is described with reference to the flowchart of FIG. 3(a). As described above, runtime refers to recognition (detection) processing of the target object in actual input images using the dictionary based on the learning images created in the present embodiment. In the configuration example of FIG. 2, this means, for example, photographing target objects conveyed one after another in an actual factory and recognizing their positions and orientations. However, the present invention is not limited to such runtime processing; if the present invention is applied to face recognition, for example, pointing a camera at a person and performing face recognition can also be regarded as runtime processing.

In FIG. 3(a), first, in dictionary setting step S3010, the dictionary setting unit 3010 reads a dictionary that has been generated in advance and stored in the dictionary storage unit 2200, and outputs it to the recognition unit 3100. Details of dictionary generation are described later.

In input image acquisition step S3020, the target object 400 placed on the tray 500 is photographed by the camera 300, and the obtained image (luminance image) and distance information are sent to the computer 100. Next, in recognition step S3100, the recognition unit 3100 performs image recognition processing on the input image using the dictionary loaded by the dictionary setting unit 3010, and estimates the position and orientation of the target object 400. The estimated position and orientation are sent to the recognition result output unit 3200 as the recognition result.

The image recognition processing performed here classifies the position and orientation of the target object using a classifier, and the dictionary used defines this classifier. The classifier defined by the dictionary recognizes the position and orientation of a target object appearing in part of the image by determining which class it belongs to. For example, a classifier based on a well-known technique such as SVM (Support Vector Machine) or RT (Randomized Trees) may be used. The input to the classifier may also be data obtained by applying predetermined image processing to the input image. Here, image processing applied to the input image is a general term for processing that converts the input image into a form the classifier can easily handle, and its content is not limited. Examples include noise removal using a Gaussian or median filter, and edge extraction using a Sobel filter, LoG filter, Laplacian filter, or Canny edge detector. Preprocessing such as scaling and gamma correction, and feature extraction such as HOG and SIFT, can also be regarded as image processing in this sense. Moreover, the image processing is not limited to performing only one of the above; it may be a combined sequence, for example noise removal with a Gaussian filter followed by edge extraction with a Sobel filter.
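As an illustration of the combined sequence just mentioned, the following is a minimal NumPy sketch of Gaussian noise removal followed by Sobel edge extraction. The kernel size (5), σ (1.0), edge-replicating border handling, and all function names are illustrative assumptions, not parameters given in this document.

```python
import numpy as np

def correlate2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2-D correlation with edge-replicated borders (odd-sized kernels)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Normalized 2-D Gaussian kernel (assumed size and sigma)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def preprocess(image: np.ndarray) -> np.ndarray:
    """Gaussian noise removal, then Sobel gradient magnitude (edge strength)."""
    smoothed = correlate2d(image.astype(float), gaussian_kernel())
    gx = correlate2d(smoothed, SOBEL_X)
    gy = correlate2d(smoothed, SOBEL_Y)
    return np.hypot(gx, gy)

# A vertical step edge responds only near the step (columns 3-5 here).
step = np.zeros((8, 8))
step[:, 4:] = 1.0
edges = preprocess(step)
```

Because only the gradient magnitude is used, computing correlation rather than true convolution does not change the result for the Sobel kernels.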

Next, in recognition result output step S3200, the recognition result output unit 3200 encodes, from the estimated position and orientation of the target object 400 (the recognition result), a command for the robot to perform predetermined work, and outputs it to the robot controller 210. The robot controller 210 decodes the received command, operates the robot arm 220 and the end effector 230 according to the command, and performs the predetermined work on the recognized workpiece (target object 400).

When the recognition in S3100 is repeated at runtime, holding the dictionary set in S3010 in memory (not shown) makes it unnecessary to repeat dictionary setting step S3010. That is, in this case, only the processing from input image acquisition step S3020 onward needs to be executed repeatedly.
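The control flow of steps S3010 to S3200, with the dictionary loaded once and only S3020 onward repeated, can be sketched as follows. The callables `load_dictionary`, `recognize`, and `output_result` are hypothetical placeholders for the dictionary setting unit 3010, the recognition unit 3100, and the recognition result output unit 3200; they are not an interface defined in this document.

```python
from typing import Any, Callable, Iterable, List, Tuple

Pose = Tuple[float, ...]  # hypothetical: position and orientation parameters

def run_runtime(
    load_dictionary: Callable[[], Any],         # S3010 (hypothetical placeholder)
    frames: Iterable[Any],                       # S3020: captured image + distance data
    recognize: Callable[[Any, Any], Pose],       # S3100: classify with the dictionary
    output_result: Callable[[Pose], None],       # S3200: encode and send a robot command
) -> List[Pose]:
    dictionary = load_dictionary()  # executed once; kept in memory afterwards
    results: List[Pose] = []
    for frame in frames:            # only S3020 onward repeats per captured frame
        pose = recognize(frame, dictionary)
        output_result(pose)
        results.append(pose)
    return results
```

Passing the steps in as callables keeps the sketch independent of any particular camera, classifier, or robot controller.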

● Dictionary Generation Processing (Learning Processing)
To perform the runtime processing described above, a dictionary for detecting the target object 400 must be prepared in advance. The processing for setting this dictionary is described in detail below. As noted above, the dictionary is reused in repeated runtime work, so dictionary generation needs to be performed only once.

The dictionary generation processing in the present embodiment is performed by the configuration shown in FIG. 1. The image generation unit 1220, the learning unit 2100, and the recognition unit 3100 in FIG. 1 are all implemented as programs in the computer 100 shown in FIG. 2. The pre-acquired image storage unit 1120, the learning image storage unit 2010, and the dictionary storage unit 2200 are described as being implemented on a hard disk built into or externally connected to the computer 100. However, the present invention is not limited to this example; these units may be implemented on a computer or hard disk different from the one used at runtime, or on a computer or memory mounted in the camera. The image acquisition unit 1110 is implemented as a program that controls the imaging device 300, either in the imaging device 300 or in the computer 100 shown in FIG. 2.

Dictionary generation processing in the present embodiment is described below with reference to the flowchart of FIG. 3(b). First, in model setting step S1000, the model setting unit 1010 sets a model of the target object 400 and stores it in the model storage unit 1020. Here, the model is model information including the information needed to generate CG images of the target object, as described later; specifically, it refers to CAD data, a polygon model, or the like of the target object 400.

Next, in image acquisition step S1100, the target object 400 is placed on the tray 500, and the image acquisition unit 1110 captures images with the imaging device 300 to acquire a luminance image and distance information (a range image) for each pixel position in the luminance image. The acquired luminance image and range image are written together, as a pre-acquired image, to the pre-acquired image storage unit 1120. Here, a pre-acquired image is an image used when the image generation unit 1220 generates learning images, and it is desirably captured under the same environmental conditions as at runtime, that is, the same conditions as in input image acquisition step S3020. For example, the illumination conditions when capturing the pre-acquired images are desirably almost the same as in S3020. It is also desirable that multiple target objects be piled up randomly when the pre-acquired images are captured. At least one pre-acquired image is captured; in the present embodiment, about five are captured. When multiple pre-acquired images are captured in this way, the arrangement of the target objects 400 desirably differs between shots, that is, their positions and orientations should vary widely. Ideally, the pre-acquired images are captured with the same imaging device as at runtime, but another imaging device may be used if its positional relationship to the tray and the illumination conditions are similar. Alternatively, a single target object may be photographed in various postures as the pre-acquired images; in that case, it is desirable to capture more images (for example, about 20) than when photographing a pile.

Next, in observation data distribution acquisition step S1130, an observation data distribution representing the distribution of luminance values is acquired on the basis of the pre-acquired images acquired in S1100. Suppose that, in a pre-acquired image stored in the pre-acquired image storage unit 1120, an arbitrary pixel j of the luminance image is observed in the range image at the camera coordinate position (Xj, Yj, Zj). Here, the camera coordinate system is a three-dimensional coordinate system associated with each pixel of the luminance image; it represents the imaging space spanned by the X, Y, and Z axes with the camera 300 at the origin. The camera coordinate positions of pixel j and several neighboring points (for example, nine pixels in total: pixel j and its eight neighbors) are collected and fitted with a plane, which yields the surface normal vector Nj at the camera coordinate position (Xj, Yj, Zj) corresponding to pixel j. By computing the corresponding surface normal vector for every pixel within the region of the pre-acquired image where the target object 400 is present (for example, the region inside the tray 500), an observation data distribution representing the correspondence between luminance values and surface normal directions is obtained. Although the observed luminance value has been described here as the value of each pixel, it is in general the luminance value at a given position in the image. The observed luminance value therefore need not be the value of a single pixel; it may be, for example, the average over a local region of several neighboring pixels, or a value obtained after applying a noise removal filter.
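A minimal sketch of S1130 follows, assuming the range image has already been converted into an (H, W, 3) array of camera coordinate positions (Xj, Yj, Zj) and that a Boolean mask marks the object region (for example, the inside of the tray 500). The plane is fitted by total least squares, taking the singular vector with the smallest singular value; this is one common choice, as the document does not specify the fitting method. Normals are oriented toward the camera under the assumption that the camera at the origin looks along +Z. The function names are hypothetical.

```python
import numpy as np

def estimate_normals(points: np.ndarray) -> np.ndarray:
    """Unit surface normals from a plane fit over each 3x3 window.

    points: (H, W, 3) camera-coordinate positions. Border pixels are NaN.
    """
    h, w, _ = points.shape
    normals = np.full((h, w, 3), np.nan)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            nb = points[y - 1:y + 2, x - 1:x + 2].reshape(-1, 3)  # pixel j + 8 neighbours
            centred = nb - nb.mean(axis=0)
            _, _, vt = np.linalg.svd(centred)  # smallest singular value -> plane normal
            n = vt[-1]
            if n[2] > 0:  # assumed: flip so the normal faces the camera (negative Z)
                n = -n
            normals[y, x] = n
    return normals

def observation_pairs(luminance: np.ndarray, points: np.ndarray, mask: np.ndarray):
    """Return (luminance values, normals) for valid pixels inside the object region."""
    normals = estimate_normals(points)
    valid = mask & ~np.isnan(normals[..., 0])
    return luminance[valid], normals[valid]
```

To use a local average instead of single-pixel luminance, as the text allows, the `luminance` array can simply be smoothed before it is passed in.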

Next, in luminance estimation step S1210, the surface luminance distribution of the target object 400 is estimated on the basis of the observation data distribution obtained from the pre-acquired images. To generate learning images by CG from the CAD data of the target object 400, the surface luminance distribution of the target object 400 must be modeled, and the parameters of this surface luminance distribution model (luminance distribution parameters) must be estimated.

Specific examples of luminance distribution parameters are as follows. If the light source is assumed to be a single parallel light and the surface of the part serving as the target object 400 is assumed to exhibit Lambertian (diffuse) reflection, the luminance distribution can be approximated by a relatively simple model. This approximation is described with reference to FIG. 4, which shows the target object 400 illuminated by the light source 600 and the camera 300 receiving the reflected light. Let L = (Lx, Ly, Lz) be the light source direction, in the camera coordinate system, of the light source axis 20 pointing from the surface of the target object 400 toward the light source 600, and let V = (Vx, Vy, Vz) be the camera optical axis direction of the camera optical axis 10. The axis midway between them is the reflection center axis 30, whose direction vector H = (Hx, Hy, Hz) is given by equation (1).

$$H = \frac{L + V}{|L + V|} \qquad (1)$$

Let θ be the angle 50 formed by the normal vector N = (Nx, Ny, Nz) of the surface normal 40 at an arbitrary position on the surface of the target object 400 and the direction vector H of the reflection center axis 30. Then θ is given by equation (2).

$$\theta = \cos^{-1}\!\left(\frac{H \cdot N}{|H|\,|N|}\right) \qquad (2)$$

The luminance value J at an arbitrary position on the surface of the target object can then be approximated, as a function of θ, by the Gaussian function of equation (3).

$$J(\theta) = C \exp\!\left(-\frac{\theta^2}{m}\right) \qquad (3)$$

Here C and m are luminance distribution parameters representing, respectively, the overall intensity and the spread of the luminance distribution. The luminance distribution model is approximated by estimating them.

Because a single light source is assumed, the light source direction L = (Lx, Ly, Lz) is estimated as the surface normal vector Nj of the pixel with the maximum luminance value among the observations. Luminance values may of course be averaged over neighboring pixels to allow for observation error, luminance saturation, and similar effects. If the light source direction L is already known, this estimation is unnecessary.
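The light-direction rule above (take the normal at the brightest pixel, optionally averaging a few of the brightest) can be sketched as follows. This follows the text's stated assumption directly; the option of averaging the k brightest pixels is one simple way to realize the "averaging" the text mentions, and the function name and parameters are hypothetical.

```python
import numpy as np

def estimate_light_direction(luminance, normals, mask=None, k=1):
    """Estimate L as the (unit) normal of the brightest pixel.

    luminance: (H, W) observed luminance; normals: (H, W, 3) unit normals.
    mask: optional boolean (H, W) region where the object can appear.
    k > 1 averages the normals of the k brightest pixels to soften noise
    and saturation (an illustrative choice, not fixed by the text).
    """
    lum = np.asarray(luminance, dtype=float)
    if mask is not None:
        lum = np.where(mask, lum, -np.inf)   # ignore pixels outside the region
    idx = np.argsort(lum.ravel())[-k:]       # indices of the k brightest pixels
    n = np.asarray(normals, dtype=float).reshape(-1, 3)[idx].mean(axis=0)
    return n / np.linalg.norm(n)
```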

To approximate the luminance distribution by the Gaussian function of equation (3), the direction vector H of the reflection center axis is first computed from the light source direction L obtained above using equation (1). The angle θj from the reflection center axis at each pixel j then follows from equation (2). Hereafter, the pair of the angle θj and the luminance value Jj at pixel j is written as the observation point pj = (θj, Jj). Computing the observation points pj for all pixels j yields an observed distribution such as that shown in FIG. 5, in which the data points B100 are the observation points pj of angle θj and luminance value Jj.
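Equations (1) and (2) turn each pixel into an observation point pj = (θj, Jj). The sketch below assumes the camera sits at the origin looking along +Z and that V points from the surface toward the camera, so V = (0, 0, -1) by default; this sign convention and the function name are assumptions, not taken from the text.

```python
import numpy as np

def observation_points(luminance, normals, L, V=(0.0, 0.0, -1.0)):
    """Return (theta_j, J_j) for every pixel, following Eqs. (1) and (2).

    V is assumed to point from the surface towards the camera.
    """
    L = np.asarray(L, dtype=float) / np.linalg.norm(L)
    V = np.asarray(V, dtype=float) / np.linalg.norm(V)
    H = (L + V) / np.linalg.norm(L + V)                 # Eq. (1), |H| = 1
    N = np.asarray(normals, dtype=float).reshape(-1, 3)
    cos_t = (N @ H) / np.linalg.norm(N, axis=1)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))        # Eq. (2)
    return theta, np.asarray(luminance, dtype=float).ravel()
```

With L = V the reflection center axis coincides with the viewing direction, so a normal tilted 30 degrees away from it gets θ = π/6.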

An estimated model of the surface luminance distribution of the target object 400 is obtained by fitting the model of equation (3) to the observed distribution of FIG. 5 by maximum likelihood estimation, as shown by curve B200. Specifically, the error function E is first defined as the sum of squared differences between the estimated and observed values, as in equation (4), where Σ denotes summation over j.

$$E = \sum_j \left\{ J(\theta_j) - J_j \right\}^2 \qquad (4)$$

Maximum likelihood fitting is treated as the problem of minimizing this error function E. Because E is a downward-convex quadratic function of the parameter C, solving equation (5) gives the update formula for C as equation (6).

$$\frac{\partial E}{\partial C} = 0 \qquad (5)$$
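Equation (6) itself does not appear in this text. Substituting model (3) into the error function (4) and solving (5) with m held fixed gives the following closed form for C. This is a derivation from (3) to (5) offered as a reading aid, not a reproduction of the original equation (6).

$$\frac{\partial E}{\partial C} = 2\sum_j \left\{ C e^{-\theta_j^2/m} - J_j \right\} e^{-\theta_j^2/m} = 0
\;\Longrightarrow\;
C = \frac{\sum_j J_j\, e^{-\theta_j^2/m}}{\sum_j e^{-2\theta_j^2/m}}$$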

For the parameter m, γ = 1/m is introduced to simplify the calculation, and the problem is solved as an optimization over γ. Because E is not a convex function of γ, E is decomposed into per-observation terms as in equation (7), and each term is handled in turn.

$$E_j = \left\{ J(\theta_j) - J_j \right\}^2 \qquad (7)$$

Solving equation (7) by steepest descent gives the sequential update formula of equation (8); this is known as the Robbins-Monro procedure. The coefficient η in equation (8) is a positive constant, generally set to the reciprocal of the number of observations.
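Equation (8) is not reproduced in this text. The sketch below assumes the standard stochastic-gradient form γ ← γ − η ∂Ej/∂γ, where differentiating (7) under model (3) gives ∂Ej/∂γ = −2{J(θj) − Jj} C θj² exp(−γθj²), and alternates it with the closed-form C from (5). Dividing the luminance by its maximum before fitting is an added assumption so that the default η = 1/n is well scaled; the function name, the initial m0, and the epoch count are hypothetical.

```python
import math
import numpy as np

def fit_gaussian_lobe(theta, J, m0=0.5, epochs=2000, eta=None, seed=0):
    """Fit J(theta) = C * exp(-theta**2 / m) (Eq. (3)) by minimising Eq. (4).

    C: closed-form solution of dE/dC = 0 (Eq. (5)) for the current gamma.
    gamma = 1/m: per-sample Robbins-Monro steps on E_j (Eq. (7)).
    Luminance is divided by its maximum first (an added assumption).
    """
    theta = np.asarray(theta, dtype=float)
    scale = float(np.max(J))
    y = np.asarray(J, dtype=float) / scale
    n = len(theta)
    eta = 1.0 / n if eta is None else eta        # text: eta ~ 1 / (number of samples)
    t2s, ys = (theta**2).tolist(), y.tolist()
    rng = np.random.default_rng(seed)
    gamma = 1.0 / m0

    def closed_form_C(g):
        g_vec = np.exp(-g * theta**2)
        return float(np.dot(y, g_vec) / np.dot(g_vec, g_vec))

    for _ in range(epochs):
        C = closed_form_C(gamma)
        for j in rng.permutation(n).tolist():
            e = math.exp(-gamma * t2s[j])
            grad = -2.0 * (C * e - ys[j]) * C * t2s[j] * e   # dE_j / dgamma
            gamma = max(gamma - eta * grad, 1e-6)            # keep m = 1/gamma > 0
    return closed_form_C(gamma) * scale, 1.0 / gamma
```

On noise-free synthetic data the per-sample gradients all vanish at the true parameters, so the constant step converges to them; on real observations a decreasing η is the usual Robbins-Monro choice.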

The above shows how, for a diffusely reflecting target object surface, estimating the luminance distribution parameters C and m allows the luminance distribution to be approximated by a Gaussian model. When the specular reflection component of the target object surface is to be taken into account, the Torrance-Sparrow luminance distribution model of equation (9) may be applied instead.

$$J(\theta, \alpha, \beta) = K_d \cos\alpha + K_s \frac{1}{\cos\beta} \exp\!\left(-\frac{\theta^2}{m}\right) \qquad (9)$$

In equation (9), K_d, K_s, and m are the luminance distribution parameters of this model. Applied to FIG. 4, θ is the same as in equation (2), namely the angle 50 between the direction vector N of the surface normal 40 and the direction vector H of the reflection center axis 30. α is the angle 70 between N and the direction vector L of the light source axis 20, and β is the angle 60 between N and the direction vector V of the camera optical axis 10. They are given by equations (10) and (11), respectively.

$$\alpha = \cos^{-1}\!\left(\frac{L \cdot N}{|L|\,|N|}\right) \qquad (10)$$

$$\beta = \cos^{-1}\!\left(\frac{V \cdot N}{|V|\,|N|}\right) \qquad (11)$$

The angles αj and βj of equation (9) for each observed pixel j are obtained from equations (10) and (11), giving the observed distribution of luminance values Jj as a function of θj, αj, and βj. Fitting the model of equation (9) to this observed distribution by maximum likelihood estimation yields an estimated model of the surface luminance distribution of the target object 400.
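Because K_d and K_s enter equation (9) linearly once m is fixed, one simple least-squares fit (equivalent to maximum likelihood under Gaussian observation noise) is to solve for K_d and K_s exactly and grid-search m. This is a swapped-in technique shown for illustration, not necessarily the fitting procedure of the original; the function names and the default m grid are hypothetical.

```python
import numpy as np

def angle(u, v):
    """Angle between each row of u and vector v, as in Eqs. (2), (10), (11)."""
    u = np.atleast_2d(np.asarray(u, dtype=float))
    v = np.asarray(v, dtype=float)
    c = (u @ v) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def torrance_sparrow(theta, alpha, beta, Kd, Ks, m):
    """Eq. (9): diffuse term plus a specular lobe around the reflection centre axis."""
    return Kd * np.cos(alpha) + Ks / np.cos(beta) * np.exp(-theta**2 / m)

def fit_torrance_sparrow(theta, alpha, beta, J, m_grid=None):
    """Least-squares fit of (Kd, Ks, m): Kd and Ks are linear for fixed m,
    so they are solved exactly while m is searched over a grid."""
    if m_grid is None:
        m_grid = np.linspace(0.01, 1.0, 100)
    J = np.asarray(J, dtype=float)
    best = None
    for m in m_grid:
        A = np.column_stack([np.cos(alpha), np.exp(-theta**2 / m) / np.cos(beta)])
        coef, *_ = np.linalg.lstsq(A, J, rcond=None)
        err = float(np.sum((A @ coef - J) ** 2))
        if best is None or err < best[0]:
            best = (err, coef[0], coef[1], m)
    _, Kd, Ks, m = best
    return float(Kd), float(Ks), float(m)
```

The grid resolution bounds the accuracy of m; a local refinement around the best grid value could follow.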

When there are multiple light sources, or disturbance light such as ambient light, the luminance distribution may instead be approximated by a nonparametric regression model J(N) that takes the surface normal direction vector N as input and outputs the luminance value J. A luminance distribution estimation function is obtained by training a chosen nonparametric model on the surface normal direction vectors Nj of the observed pixels j, with the luminance values Jj as teacher values. Various methods can serve as the nonparametric regression model, such as SVM, SVR (Support Vector Regression), and neural networks. With these nonparametric models, the light source direction need not be estimated before fitting.
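A nonparametric J(N) can be sketched with kernel ridge regression using a Gaussian kernel, implemented directly in NumPy. This is a stand-in for the SVR or neural-network options named above, chosen because it needs no external learning library; the class name, bandwidth, and ridge values are hypothetical.

```python
import numpy as np

class KernelRidgeLuminance:
    """Nonparametric regression J(N) from unit normals to luminance,
    using kernel ridge regression with a Gaussian (RBF) kernel."""

    def __init__(self, bandwidth=0.3, ridge=1e-4):
        self.bandwidth = bandwidth
        self.ridge = ridge

    def _kernel(self, A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * self.bandwidth**2))

    def fit(self, N, J):
        """N: (n, 3) normals Nj; J: (n,) teacher luminance values Jj."""
        self.N_train = np.asarray(N, dtype=float)
        K = self._kernel(self.N_train, self.N_train)
        rhs = np.asarray(J, dtype=float)
        self.weights = np.linalg.solve(K + self.ridge * np.eye(len(K)), rhs)
        return self

    def predict(self, N):
        return self._kernel(np.asarray(N, dtype=float), self.N_train) @ self.weights
```

Appending the camera coordinates (X, Y, Z) to each input vector would give the position-dependent J(N, X, Y, Z) variant described next, although the bandwidth would then need rescaling per input dimension.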

Alternatively, supplying the camera coordinate position (X, Y, Z) as an additional argument of the regression model and approximating J(N, X, Y, Z) gives a luminance distribution estimation function that accounts for position-dependent differences in illumination. When luminance values are obtained on multiple channels, the luminance distribution of each channel may be estimated separately. Multiple channels arise, for example, when an RGB color image, or an invisible-light image captured with infrared or ultraviolet light, is included as additional information.

Once the surface luminance of the target object 400 has been estimated in the luminance estimation step S1210, the image generation step S1220 sets up the plurality of learning images needed to generate the dictionary. The learning images are generated as CG images based on the model of the target object 400 (for example, CAD data) set in S1000. For example, if the surface properties of the target object 400, expressed as a BRDF, and the light source information of the working environment are known, the appearance of the target object 400 in various orientations can be reproduced from the model as CG images using well-known rendering techniques. FIG. 6 illustrates how learning images are generated as CG images. As shown in FIG. 6, learning images are generated for each viewpoint 403 on a geodesic dome 401 centered on the object center 404 of the target object 400, with variations given by in-plane image rotation 402 at each viewpoint 403, and each image is assigned an orientation class index. For example, generating learning images for 72 viewpoints with in-plane rotations in 30-degree steps means training a dictionary for a classifier with 72 × (360/30) = 864 classes.
Specifically, the model of the target object 400 stored in the model storage unit 1020 is projectively transformed for each of the above orientations, and for each pixel after the transformation, the normal direction (surface normal direction) of the corresponding point on the model is computed. A learning image for each orientation is then generated by assigning to each pixel the luminance value corresponding to its normal direction, according to the result obtained in the luminance estimation step S1210.
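The orientation classes and the final shading step can be sketched as follows. The sketch uses a Fibonacci sphere to place 72 roughly uniform viewpoints, a stand-in for the geodesic dome 401, and assumes the projective transformation of the CAD model into a per-pixel normal map is done by an external renderer. All function names are hypothetical.

```python
import numpy as np

def fibonacci_viewpoints(n):
    """n roughly uniform unit view directions (stand-in for a geodesic dome)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z**2)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i          # golden-angle increments
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), z])

def pose_classes(n_views=72, step_deg=30):
    """Enumerate (view direction, in-plane rotation) pairs.

    The orientation class index is the position in the returned list.
    """
    views = fibonacci_viewpoints(n_views)
    rolls = np.arange(0, 360, step_deg)
    return [(v, int(roll)) for v in views for roll in rolls]

def shade(normal_map, mask, lum_fn, background=0.0):
    """Turn a rendered (H, W, 3) normal map into a learning image.

    mask marks pixels covered by the object; lum_fn maps (k, 3) normals
    to (k,) luminance, e.g. a fitted model from step S1210.
    """
    img = np.full(mask.shape, background, dtype=float)
    img[mask] = lum_fn(normal_map[mask])
    return img
```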

Next, in the learning step S2000, a dictionary is generated from the multi-orientation learning images produced in the image generation step S1220, in the format required by the class classifier used in the recognition unit 3100. The generated dictionary is stored in the dictionary storage unit 2200.

As described above, according to the present embodiment, luminance images that approximate the surface luminance of the target object, reflecting environmental conditions such as illumination, are easily generated from luminance and distance information obtained by imaging the target object in the actual environment, either piled up or as single items in multiple orientations. These approximated luminance images are used as learning images for dictionary generation.

<Modification>
In the first embodiment, an example was described in which a luminance image and a distance image containing the target object are acquired by imaging with the imaging device 300. The present embodiment is also applicable when the imaging device 300 has no ranging function. A modification that applies the first embodiment when the imaging device 300 cannot acquire a distance image is described below. In this modification, the distance image is generated by estimating the position and orientation of the target object with a known tracking technique. FIG. 1(b) shows the configuration for learning image generation in this modification; components identical to those in FIG. 1(a) are given the same reference numerals and are not described again. In FIG. 1(b), the distance image is generated by the distance image generation unit 1140. Specifically, the user superimposes the model of the target object on the luminance image captured by the imaging device 300 to set an approximate position and orientation, and an existing tracking technique then refines them to the detailed position and orientation. From the correspondence with the fitted model, the estimated camera coordinate position of each pixel in the region where the target object appears is computed, and the result can be treated as an estimated distance image.

<Second Embodiment>
A second embodiment of the present invention is described below. The first embodiment assumed that the target object is a single-color object. The target object may, however, consist of several colors. For example, when the target object is a part supplied as an assembly of several components, its luminance characteristics may differ from region to region, with one portion made of black plastic and another of white plastic. The second embodiment describes an example of generating learning images for a target object that has multiple luminance characteristics in this way.

The basic configuration for the image recognition processing in the second embodiment is the same as that of FIG. 1(a) described in the first embodiment. In particular, the model setting unit 1010, model storage unit 1020, image acquisition unit 1110, pre-acquired image storage unit 1120, and observation data distribution acquisition unit 1130 operate as in the first embodiment, so their description is omitted. The second embodiment differs from the first in the details of the processing in the luminance estimation unit 1210, which estimates the surface luminance of the target object from the observation data distribution. FIG. 8 shows the detailed configuration of the luminance estimation unit 1210 in the second embodiment, described below. The processing from the image generation unit 1220 onward and the processing at runtime are the same as in the first embodiment and are not described.

The luminance estimation unit 1210 of the second embodiment is divided into an initialization unit 1211, a data allocation unit 1212, an approximation unit 1213, and a convergence determination unit 1214. For the input observation data distribution, the initialization unit 1211 first initializes a plurality of functions that approximate the luminance distribution, and the data allocation unit 1212 then assigns each observation to one of these functions. The approximation unit 1213 fits each luminance distribution function to the observations assigned to it, and the convergence determination unit 1214 determines whether the luminance distribution estimation has converged.

● Dictionary generation process (learning process)
In the second embodiment, as in the first, a dictionary for detecting the target object is generated from learning images produced as described below. The outline of the dictionary generation process is the same as the flowchart of FIG. 3(b) described in the first embodiment, but the details differ. The model setting step S1000, image acquisition step S1100, and observation data distribution acquisition step S1130 in FIG. 3(b) are the same as in the first embodiment and are not described.

In the luminance estimation step S1210 of the second embodiment, the surface luminance of the target object is estimated from the observation data distribution obtained from the pre-acquired images. First, an observation data distribution describing the correspondence between luminance value and surface normal direction is obtained from the images stored in the pre-acquired image storage unit 1120. An example of approximating the luminance distribution model is described below with reference to FIG. 4, assuming that the light source is a single parallel light and that the part surface exhibits Lambertian reflection.

In FIG. 4, the axis midway between the light source direction L (the camera-coordinate direction of the light source axis 20, pointing from the surface of the target object 400 toward the light source 600) and the camera optical axis direction V of the camera optical axis 10 is taken as the reflection center axis 30. The direction vector H of the reflection center axis 30 is then given by equation (1), as in the first embodiment, and the angle θ between the normal vector N of the surface normal 40 and H is given by equation (2). Now assume that the surface luminance characteristic of the target object 400 is represented by a combination of T kinds of characteristics. The t-th (t = 1, ..., T) luminance distribution function Jt(θ) can be approximated by the Gaussian function of equation (12).

$$J_t(\theta) = C_t \exp\!\left(-\frac{\theta^2}{m_t}\right) \qquad (12)$$

Here Ct and mt are parameters representing, respectively, the overall intensity and the spread of the luminance distribution.

The luminance distribution characteristic of the target object 400 is approximated by the T luminance distribution functions Jt(θ) (t = 1, ..., T). FIG. 7 shows an example for T = 2, in which curves B210 and B220 are the luminance distribution functions estimated for t = 1 and t = 2, respectively. If the number of parts made of different materials is known in advance, for example from CAD data, T can be fixed to that number beforehand. The following description assumes that T is set in advance; the case where T is unknown is described later.

To estimate the parameters Ct and mt of each luminance distribution function Jt(θ), the luminance estimation step S1210 of the second embodiment is subdivided as shown in FIG. 9(a). FIG. 10 shows a conceptual diagram of this parameter estimation. The parameter estimation procedure is described below with reference to these figures.

First, in the initialization step S1211, the parameters Ct and mt are initialized. For example, the initial values may be chosen at random, or maximum likelihood estimation may first be performed with T = 1 and values slightly offset from that estimate used as the initial values of the multiple Ct and mt.

Next, in the data allocation step S1212, each observation point pj = (θj, Jj) is assigned to the nearest luminance distribution function Jt(θ). Specifically, each observation point may be assigned to the function, among the Jt(θ), whose estimated luminance value for the observation point's surface normal direction is closest to the observed luminance value. The data set St used to estimate the luminance distribution function Jt(θ) is thus defined by equation (13).
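Equation (13) does not survive in this text. A set definition consistent with the nearest-function rule just described, offered as a hedged reconstruction rather than the original formula, is:

$$S_t = \left\{\, p_j \;\middle|\; t = \arg\min_{t' \in \{1, \dots, T\}} \left| J_{t'}(\theta_j) - J_j \right| \right\}$$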

Equation (13) amounts to labeling each observation point with the index of a luminance distribution function. FIG. 10(a) shows an example, in which curves B210-a and B220-a are luminance distribution functions initialized with different parameters, and each observation point is assigned to the nearest of them. The white circles, such as point B110-a, are the observation points assigned to curve B210-a, and the black circles, such as point B120-a, are those assigned to curve B220-a.

Next, in the approximation step S1213, each luminance distribution function Jt(θ) is updated by maximum likelihood fitting using the observation point group St assigned to it. FIG. 10(b) shows an example, in which curves B210-b and B220-b are the curves B210-a and B220-a of FIG. 10(a) updated with their assigned observation point groups.

After all (here, two) luminance distribution functions Jt(θ) have been updated, the nearest luminance distribution function is again identified for each observation point pj. If, for every observation point pj, the function identified is the same as the one already assigned, the luminance estimation step S1210 is regarded as converged and processing proceeds to the image generation step S1220. If any observation point is now nearest to a function other than the one already assigned, processing returns to the data allocation step S1212 and the above steps are repeated.
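The allocate, refit, and check-convergence loop of steps S1212 to S1214 can be sketched as follows. For brevity, each function is refitted by linear least squares on ln J (ln J = ln C − θ²/m is linear in θ²), a swapped-in fit that requires positive luminance and is not the maximum likelihood fitting of the text. The function names and initial values are hypothetical.

```python
import numpy as np

def fit_lobe_loglinear(theta, J):
    """Fit J = C * exp(-theta**2 / m) by least squares on ln J (J > 0)."""
    slope, intercept = np.polyfit(theta**2, np.log(J), 1)
    if slope >= 0:
        return None                      # no decaying lobe fits these points
    return float(np.exp(intercept)), float(-1.0 / slope)

def fit_multi_lobe(theta, J, init, max_iter=100):
    """Alternate nearest-function assignment and per-function refitting
    until no observation changes its assigned function.

    init: list of (C_t, m_t) starting parameters, one per function.
    """
    params = list(init)
    labels = None
    for _ in range(max_iter):
        pred = np.stack([C * np.exp(-theta**2 / m) for C, m in params])  # (T, n)
        new_labels = np.argmin(np.abs(pred - J), axis=0)                  # S1212
        if labels is not None and np.array_equal(new_labels, labels):
            break                                                         # converged
        labels = new_labels
        for t in range(len(params)):                                      # S1213
            sel = labels == t
            if sel.sum() >= 2:
                fitted = fit_lobe_loglinear(theta[sel], J[sel])
                if fitted is not None:
                    params[t] = fitted
    return params, labels
```

Like k-means, the result depends on the initial values, which is why the text suggests starting from a perturbed T = 1 estimate.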

When a specular reflection model is to be incorporated into the luminance distribution of the target object, the Torrance-Sparrow luminance distribution model is applied as in the first embodiment. In that case, the t-th luminance distribution function Jt(θ, α, β) may be approximated by equation (14).

$$J_t(\theta, \alpha, \beta) = K_{dt} \cos\alpha + K_{st} \frac{1}{\cos\beta} \exp\!\left(-\frac{\theta^2}{m_t}\right) \qquad (14)$$

In equation (14), K_dt, K_st, and mt are the parameters of this model, and α and β are given by equations (10) and (11). These parameters are also estimated by function fitting, as in the first embodiment. When luminance values are obtained on multiple channels, the luminance distribution of each channel may likewise be estimated separately, as in the first embodiment.

The case where the number T of luminance distribution functions making up the luminance distribution characteristic of the target object is known (T = 2 in the above example) has been described. An estimation example for the case where T is unknown is described below. In that case, estimation is first performed for several values of T, and the value whose distributions are best separated is selected. The processing of the luminance estimation step S1210 in this case is described with reference to FIG. 9(b). In FIG. 9(b), the initialization step S1211, data allocation step S1212, and approximation step S1213 are the same as in FIG. 9(a). That is, FIG. 9(b) is characterized by performing the same processing S1211 to S1213 as in FIG. 9(a) for each of the variations T = 1, ..., Tmax and then performing a separation degree evaluation step S1214. Tmax is the upper limit on the number of colors making up the target object; for example, Tmax = 5.

In the separation degree evaluation step S1214, the separation evaluation value λ_T for each T is defined by equations (15), (16), and (17).

In equation (17), εt is the squared error with respect to the estimated values, and ζt in equation (16) is the skewness about the estimated values. The closer the data set St assigned to each luminance distribution function Jt is to a normal distribution about Jt, the closer ζt is to 0. Accordingly, the value of T that minimizes the separation evaluation value λ_T is taken as the estimate of the number T of luminance distribution functions.
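Equations (15) to (17) are not reproduced in this text, so the scoring below is a hypothetical stand-in built only from the description: εt is taken as the mean squared residual of St about Jt, ζt as the third moment of those residuals normalized by εt^(3/2) (a skewness about the estimated values, not about the residual mean), and λ_T as the mean of |ζt|. The actual definitions may differ; the function names are hypothetical.

```python
import numpy as np

def separation_score(theta, J, params, labels):
    """Hypothetical lambda_T: mean |skewness| of residuals about each J_t.

    params: list of (C_t, m_t) for Gaussian lobes (Eq. (12));
    labels: index of the function each observation is assigned to.
    """
    zetas = []
    for t, (C, m) in enumerate(params):
        sel = labels == t
        if not sel.any():
            continue                                     # empty set S_t
        r = J[sel] - C * np.exp(-theta[sel] ** 2 / m)    # residual about J_t
        eps = np.mean(r**2)                              # stand-in for epsilon_t
        zetas.append(np.mean(r**3) / eps**1.5 if eps > 0 else 0.0)
    return float(np.mean(np.abs(zetas)))

def choose_T(scores):
    """scores: {T: lambda_T}; the T with the smallest value is selected."""
    return min(scores, key=scores.get)
```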

Whether the number T of luminance distribution functions of the target object is known or unknown, once the luminance distribution functions have been estimated in the luminance estimation step S1210, learning images based on them are generated in the image generation step S1220. The learning image generation process of the second embodiment is described below. Generating the learning images first requires associating a luminance distribution function with each part of the target object; this association can be made by the following methods.

For example, the association can be made automatically by comparing the magnitudes of the diffuse reflection components of the luminance distributions. With the approximation of equation (12), the functions corresponding to the bright-material and dark-material parts can be identified and associated by comparing their luminance values where θ is large (for example, θ = 1 rad). With the approximation of equation (14), the parameter K_dt represents the strength of the diffuse reflection component, so the luminance distribution functions may be associated according to its magnitude.
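The comparison at large θ can be sketched in a few lines for Gaussian lobes of the form of equation (12). The reference angle of 1 rad follows the text's example; the function name is hypothetical.

```python
import numpy as np

def order_by_diffuse_level(params, theta_ref=1.0):
    """Rank lobes (C_t, m_t) from darkest to brightest by J_t(theta_ref)."""
    levels = [C * np.exp(-theta_ref**2 / m) for C, m in params]
    return [int(i) for i in np.argsort(levels)]
```

A narrow, high-peaked lobe can still be the dark material: with parameters (200, 0.1) and (50, 1.0), the first function is about 0.009 at 1 rad against about 18.4 for the second, so it is ranked darker despite its larger peak.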

また、対象物体のカラー画像等に対し多チャンネルの輝度分布関数が推定されている場合には、特徴のあるチャンネルの拡散反射成分を比較してもよい。例えば、赤い部位と緑色の部位を対応付ける場合、RチャンネルとGチャンネルにおける拡散反射成分の強さを比較することで、容易に対応づけることができる。 Further, when multi-channel luminance distribution functions have been estimated, for example from a color image of the target object, the diffuse reflection components of characteristic channels may be compared. For example, a red part and a green part can easily be associated by comparing the strengths of their diffuse reflection components in the R channel and the G channel.

また、事前取得画像において複数点をユーザに指定させ、該指定点に対応する画素が寄与する輝度分布関数を検出することで、対応付けを行ってもよい。例えば図11に示すように、対象物体の山積み状態を示す事前取得画像T500をGUIに表示し、明輝度部位の一部である部位T100と、暗輝度部位の一部である部位T200を、カーソルT300で指定させる。この指定により、図12に示すように輝度分布関数の対応付けがなされる。すなわち、指定した部位T100における観測データが、図12におけるC100であり、同じく部位T200における観測データがC200であったとする。図12において、観測データC100とC200のそれぞれが曲線B210,B220で示される輝度分布関数に属することが、上記(13)式によって判定できる。 Alternatively, the user may designate a plurality of points in the pre-acquired image, and the association may be made by detecting the luminance distribution function to which the pixel at each designated point contributes. For example, as shown in FIG. 11, a pre-acquired image T500 showing the target objects piled up is displayed on a GUI, and the user designates, with a cursor T300, a part T100 belonging to a bright region and a part T200 belonging to a dark region. This designation associates the luminance distribution functions as shown in FIG. 12. That is, suppose that the observation data at the designated part T100 is C100 in FIG. 12 and the observation data at the part T200 is C200. Using equation (13), it can then be determined that the observation data C100 and C200 belong to the luminance distribution functions indicated by the curves B210 and B220, respectively.
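Equation (13) is outside this section. Assuming it assigns an observation to the function whose predicted luminance at the observed θ is closest (the rule stated in claim 11), the lookup for a user-designated point such as T100 or T200 reduces to the sketch below; the functions and observation values are hypothetical.

```python
import math
from typing import Callable, Sequence

def assign_observation(theta: float, lum: float,
                       funcs: Sequence[Callable[[float], float]]) -> int:
    """Index t of the luminance function whose value J_t(theta) is closest to lum."""
    return min(range(len(funcs)), key=lambda t: abs(funcs[t](theta) - lum))

# Hypothetical functions for a dark material (0) and a bright material (1).
funcs = [lambda th: 0.2 + 0.8 * math.cos(th) ** 20,
         lambda th: 0.6 + 0.3 * math.cos(th) ** 20]
print(assign_observation(0.8, 0.62, funcs))  # 1 (bright part, like C100)
print(assign_observation(0.8, 0.15, funcs))  # 0 (dark part, like C200)
```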

画像生成工程S1220では、以上のように対象物体の各部分と輝度分布関数の対応づけがなされれば、以降は第1実施形態と同様に学習画像を生成することができる。なお、その後の学習工程S2000で学習画像を用いて辞書を生成する処理については、第1実施形態と同様であるため説明を省略する。 In the image generation step S1220, once each part of the target object has been associated with a luminance distribution function as described above, learning images can be generated in the same way as in the first embodiment. The subsequent process of generating a dictionary from the learning images in the learning step S2000 is the same as in the first embodiment, and its description is omitted.

以上説明したように第2実施形態によれば、対象物体表面が複数色によって構成される場合であっても、その表面輝度を近似した学習画像を生成することができる。 As described above, according to the second embodiment, even when the target object surface is composed of a plurality of colors, a learning image approximating the surface luminance can be generated.

＜第3実施形態＞ <Third embodiment>
以下、本発明に係る第3実施形態について説明する。上述した第1および第2実施形態では、対象物体の表面性状が表面法線方向に対して安定しているものとして説明を行ったが、この表面性状は必ずしも安定しているとは限らない。例えば、対象物体表面に梨地のような加工がなされていれば、同一方向を向いた対象物体の面であっても、部分によって輝度値が変化する。また意図的な表面加工がなくても、成型時の型の粗さ等によって同様に輝度値の変化が発生する場合がある。このような表面性状が不安定な対象物体に対し、その輝度値の不安定さを再現するために学習画像生成時にノイズを加えることが考えられる。第3実施形態ではすなわち、学習画像生成時に加えるノイズを考慮して、対象物体の輝度分布を推定することを特徴とする。 The third embodiment of the present invention is described below. In the first and second embodiments, the surface properties of the target object were assumed to be stable with respect to the surface normal direction, but this is not always the case. For example, if the surface of the target object has a satin finish, the luminance varies from point to point even across surfaces facing the same direction. Even without intentional surface treatment, similar luminance variation can arise from, for example, the roughness of the mold used in molding. To reproduce this luminance instability for a target object with unstable surface properties, noise can be added when the learning images are generated. The third embodiment is accordingly characterized in that the luminance distribution of the target object is estimated with the noise to be added during learning image generation taken into account.

●辞書生成処理(学習処理) Dictionary generation process (learning process)
第3実施形態においても第1実施形態と同様に、後述するように生成された学習画像から、対象物体を検出するための辞書を生成するが、この辞書生成処理も図3(b)のフローチャートに従う。図3(b)においてモデル設定工程S1000、画像取得工程S1100、観測データ分布取得工程S1130については第1実施形態と同様であるため、説明を省略する。 In the third embodiment as well, as in the first embodiment, a dictionary for detecting the target object is generated from learning images generated as described later, and this dictionary generation process also follows the flowchart of FIG. 3(b). In FIG. 3(b), the model setting step S1000, the image acquisition step S1100, and the observation data distribution acquisition step S1130 are the same as in the first embodiment, so their description is omitted.

第3実施形態の輝度推定工程S1210においては、事前取得画像から得られた観測データ分布に基づき、対象物体の表面輝度を以下のように推定する。 In the luminance estimation step S1210 of the third embodiment, the surface luminance of the target object is estimated as follows, based on the observation data distribution obtained from the pre-acquired images.

輝度分布を以下の(18)(19)(20)式に示すような線形ガウスカーネルモデルy(θ,w)で表わすことを考える。 Consider expressing the luminance distribution by a linear Gaussian kernel model y(θ, w), as shown in equations (18), (19), and (20) below.

$$y(\theta,\mathbf{w}) = \mathbf{w}^{T}\boldsymbol{\phi}(\theta) \quad (18)$$
$$\mathbf{w} = (w_1,\ldots,w_h,\ldots,w_M)^{T} \quad (19)$$
$$\boldsymbol{\phi}(\theta) = \left(\phi_1(\theta),\ldots,\phi_h(\theta),\ldots,\phi_M(\theta)\right)^{T} \quad (20)$$
ここで、θは図4で説明した表面法線40の方向ベクトルNと反射中心軸30の方向ベクトルHのなす角50である。またMはカーネルの数であって、ユーザによって定義される。角度θは0〜90degの範囲で収まるため、例えばM=9と設定すれば、10deg程度の間隔でカーネルを配置することができる。また、パラメータwはM次元ベクトルであり、その要素w_h(h=1,…,M)は正の実数値である。ベクトルφはM次元ベクトルであり、その要素φ_h(h=1,…,M)は(21)式のように定義されるガウスカーネルである。 Here, θ is the angle 50 between the direction vector N of the surface normal 40 and the direction vector H of the reflection central axis 30 described with reference to FIG. 4. M is the number of kernels and is defined by the user. Since θ lies within 0 to 90 deg, setting M = 9, for example, places the kernels at intervals of about 10 deg. The parameter w is an M-dimensional vector whose elements w_h (h = 1, …, M) are positive real values. The vector φ is an M-dimensional vector whose elements φ_h (h = 1, …, M) are Gaussian kernels defined by equation (21).

$$\phi_h(\theta) = \exp\left\{-\frac{(\theta-\mu_h)^{2}}{2S^{2}}\right\} \quad (21)$$
ここで、μ_hはガウスカーネルφ_hの中心位置を表す。カーネル中心μ_hは角度θの定義域内に配置すればよく、例えばM=9と定義した場合には、9degごとにμ_hを設定してもよい。 Here, μ_h is the center position of the Gaussian kernel φ_h, and S determines the kernel width. The kernel centers μ_h need only be placed within the domain of θ; for example, when M = 9, μ_h may be set every 9 deg.
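Equations (18) to (21) can be evaluated with a short NumPy sketch. The centers at 5, 15, …, 85 deg (M = 9, spaced 10 deg apart) and a width S equal to that spacing are assumptions for illustration, and the weight vector is hypothetical.

```python
import numpy as np

def gaussian_features(theta, centers, s):
    """Rows are phi(theta_j)^T, with phi_h(theta) = exp(-(theta - mu_h)^2 / (2 s^2)), eq. (21)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    centers = np.asarray(centers, dtype=float)
    return np.exp(-(theta[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))

def model_luminance(theta, w, centers, s):
    """y(theta, w) = w^T phi(theta), eq. (18), for one or more angles."""
    return gaussian_features(theta, centers, s) @ np.asarray(w, dtype=float)

centers = np.deg2rad(np.arange(5, 90, 10))   # M = 9 kernel centres mu_h (assumed)
s = np.deg2rad(10.0)                         # kernel width S (assumed)
w = np.linspace(1.0, 0.2, 9)                 # hypothetical positive weights w_h
print(gaussian_features(np.deg2rad(30.0), centers, s).shape)  # (1, 9)
print(model_luminance(np.deg2rad([0.0, 45.0]), w, centers, s))
```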

このような線形ガウスカーネルモデルで輝度分布を近似した場合、以下の(22)式のように定義される予測輝度分布を考える。 When the luminance distribution is approximated by such a linear Gaussian kernel model, a predicted luminance distribution defined as the following equation (22) is considered.

$$p(J \mid \mathbf{R},\theta) = \int p(J \mid \mathbf{w},\theta)\, p(\mathbf{w} \mid \mathbf{R},\theta)\, d\mathbf{w} \quad (22)$$
ここで、Rは観測輝度値の集合ベクトルであり、観測画素の総数がNである場合には以下の(23)式に示すような列ベクトルで表わされる。 Here, R is the vector of observed luminance values; when the total number of observed pixels is N, it is the column vector given by equation (23).

$$\mathbf{R} = (J_1,\ldots,J_j,\ldots,J_N)^{T} \quad (23)$$
ここで、J_jは観測データの画素jにおける輝度値の観測値である。 Here, J_j is the observed luminance value at pixel j of the observation data.

上記(22)式の右辺第一項は輝度値の条件付き分布であって、以下の(24)式に示すような正規分布で表現される。 The first term on the right side of the above equation (22) is a conditional distribution of luminance values, and is represented by a normal distribution as shown in the following equation (24).

$$p(J \mid \mathbf{w},\theta) = \mathcal{N}\left(J \mid y(\theta,\mathbf{w}),\, \varepsilon^{2}\right) \quad (24)$$
ここで、εは精度パラメータであり、以下の(25)式に示すような、第1および第2実施形態における推定輝度関数J(θj)と観測輝度値Jjとの二乗誤差平均を用いる。なお、(25)式においてΣはjについての総和を示す。 Here, ε is the precision parameter: ε² is the variance of the normal distribution in equation (24) and is set to the mean squared error between the estimated luminance function J(θj) of the first and second embodiments and the observed luminance values Jj, as in equation (25). In equation (25), Σ denotes the sum over j.

$$\varepsilon^{2} = \frac{1}{N}\sum_{j}\left\{J_j - J(\theta_j)\right\}^{2} \quad (25)$$
上記(24)式を重みwの尤度関数と考え、この共役事前分布を、以下の(26)式に示すような期待値m₀、共分散S₀を持つガウス分布であるとする。 Regarding equation (24) as the likelihood function of the weight w, its conjugate prior is taken to be a Gaussian distribution with expected value m₀ and covariance S₀, as shown in equation (26).

$$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0) \quad (26)$$
このとき、事後分布である(22)式の右辺第二項は、以下の(27)(28)(29)式に示すような正規分布で表わすことができる。 The posterior distribution, that is, the second term on the right side of equation (22), can then be expressed as the normal distribution given by equations (27), (28), and (29).

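$$p(\mathbf{w} \mid \mathbf{R},\theta) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N) \quad (27)$$
$$\mathbf{m}_N = \mathbf{S}_N\left(\mathbf{S}_0^{-1}\mathbf{m}_0 + \frac{1}{\varepsilon^{2}}\boldsymbol{\Phi}^{T}\mathbf{R}\right) \quad (28)$$
$$\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \frac{1}{\varepsilon^{2}}\boldsymbol{\Phi}^{T}\boldsymbol{\Phi} \quad (29)$$
ここで、Φは計画行列と呼ばれ、カーネルと観測データから以下の(30)式のように決定される。 Here, Φ is called the design matrix and is determined from the kernels and the observation data as in equation (30).
$$\boldsymbol{\Phi} = \begin{pmatrix} \phi_1(\theta_1) & \cdots & \phi_M(\theta_1) \\ \vdots & \ddots & \vdots \\ \phi_1(\theta_N) & \cdots & \phi_M(\theta_N) \end{pmatrix} \quad (30)$$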
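The posterior update of equations (27) to (30) has the form of standard Bayesian linear regression, so it can be sketched in a few lines of NumPy. The design matrix is built as Φ[j, h] = φ_h(θj), the kernel helper repeats equation (21), and all names and example values are hypothetical.

```python
import numpy as np

def gaussian_features(theta, centers, s):
    """Design matrix Phi with Phi[j, h] = exp(-(theta_j - mu_h)^2 / (2 s^2))."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    centers = np.asarray(centers, dtype=float)
    return np.exp(-(theta[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))

def posterior(theta_obs, R, centers, s, eps2, m0, S0):
    """Return (m_N, S_N) of p(w | R) = N(w | m_N, S_N), eqs. (27)-(29)."""
    Phi = gaussian_features(theta_obs, centers, s)                    # N x M, eq. (30)
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + Phi.T @ Phi / eps2)                   # eq. (29)
    mN = SN @ (S0_inv @ m0 + Phi.T @ np.asarray(R, dtype=float) / eps2)  # eq. (28)
    return mN, SN

# Noise-free synthetic data from hypothetical weights, with a broad prior.
centers, s = np.array([0.0, 0.75, 1.5]), 0.5
theta_obs = np.linspace(0.0, np.pi / 2, 50)
w_true = np.array([1.0, 2.0, 3.0])
R = gaussian_features(theta_obs, centers, s) @ w_true
mN, SN = posterior(theta_obs, R, centers, s, eps2=1e-2,
                   m0=np.zeros(3), S0=1e6 * np.eye(3))
print(np.round(mN, 3))  # close to w_true
```

With a very broad prior the posterior mean approaches the least-squares fit; with a very narrow prior it stays near m₀.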

このとき、最小二乗法に沿って上記(18)式の線形ガウスカーネルモデルを近似すると、最終的に(22)式の予測輝度分布は以下の(31)(32)式のように表わされることが知られている。 It is known that, when the linear Gaussian kernel model of equation (18) is fitted in accordance with the least-squares method, the predictive luminance distribution of equation (22) finally takes the form of equations (31) and (32).

$$p(J \mid \mathbf{R},\theta) = \mathcal{N}\left(J \mid \mathbf{m}_N^{T}\boldsymbol{\phi}(\theta),\, \sigma_N^{2}(\theta)\right) \quad (31)$$
$$\sigma_N^{2}(\theta) = \varepsilon^{2} + \boldsymbol{\phi}(\theta)^{T}\mathbf{S}_N\boldsymbol{\phi}(\theta) \quad (32)$$
なお、(32)式は予測輝度分布の分散であって、その平方根であるσ_N(θ)は予測輝度分布の標準偏差である。 Equation (32) gives the variance of the predictive luminance distribution, and its square root σ_N(θ) is the standard deviation of the predictive luminance distribution.
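Given m_N and S_N, equations (31) and (32) give the predictive mean and variance at any angle. The sketch below assumes the Gaussian kernel features of equation (21); the posterior values in the usage lines are hypothetical placeholders rather than fitted results.

```python
import numpy as np

def gaussian_features(theta, centers, s):
    """Rows are phi(theta_k)^T for the Gaussian kernels of eq. (21)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    centers = np.asarray(centers, dtype=float)
    return np.exp(-(theta[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))

def predictive(theta, mN, SN, centers, s, eps2):
    """Mean m_N^T phi(theta) (eq. 31) and variance sigma_N^2(theta) (eq. 32)."""
    phi = gaussian_features(theta, centers, s)                 # K x M
    mean = phi @ mN
    var = eps2 + np.einsum("km,mn,kn->k", phi, SN, phi)        # phi^T S_N phi per angle
    return mean, var

centers, s = np.array([0.0, 0.75, 1.5]), 0.5
mN = np.array([1.0, 2.0, 3.0])          # hypothetical posterior mean
SN = 0.01 * np.eye(3)                   # hypothetical posterior covariance
mean, var = predictive(np.array([0.2, 1.0]), mN, SN, centers, s, eps2=0.04)
print(mean, np.sqrt(var))               # predictive mean and sigma_N(theta)
```

The variance never falls below ε², and grows where the posterior is uncertain about the weights.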

以上のように、輝度推定工程S1210で対象物体の予測輝度分布が推定されると、次に画像生成工程S1220で、該推定された予測輝度分布に基づく学習画像が生成される。第3実施形態における学習画像の生成は第1実施形態と同様に、各姿勢におけるCADモデルを射影変換したときの各画素に対応するモデル上の法線方向に対して輝度値を算出し、画素に輝度値を与えることで、学習画像を生成する。具体的には、生成する学習画像における画素kに射影される面の法線方向と反射中心軸とのなす角θkに対して、上記(31)(32)式より、予測分布p(J｜R,θk)を得ることができる。 Once the predictive luminance distribution of the target object has been estimated in the luminance estimation step S1210 as described above, learning images based on it are generated in the image generation step S1220. As in the first embodiment, a learning image in the third embodiment is generated by projectively transforming the CAD model in each posture, calculating a luminance value for the normal direction on the model that corresponds to each pixel, and assigning that luminance value to the pixel. Specifically, for the angle θk between the reflection central axis and the normal direction of the surface projected onto pixel k of the learning image being generated, the predictive distribution p(J | R, θk) is obtained from equations (31) and (32).
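The angle θk used above can be computed per pixel from the rendered surface normal and the reflection central axis H. A minimal sketch, assuming both are given as 3-D vectors in the same coordinate frame; how H is derived from the light and camera directions is described with FIG. 4 and is not repeated here.

```python
import numpy as np

def angle_to_reflection_axis(normals, H):
    """theta_k = angle between each surface normal N_k and the reflection axis H (radians)."""
    n = np.asarray(normals, dtype=float)
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)   # unit normals, shape (..., 3)
    h = np.asarray(H, dtype=float)
    h = h / np.linalg.norm(h)                            # unit reflection axis
    cos = np.clip(n @ h, -1.0, 1.0)                      # guard against rounding outside [-1, 1]
    return np.arccos(cos)

normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
print(np.degrees(angle_to_reflection_axis(normals, [0.0, 0.0, 1.0])))  # about 0, 90 and 45 deg
```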

ここで図13に、第3実施形態において得られる輝度値の予測分布例を示す。図13において、実線B300は変数θにおける予測分布の中心であり、破線B310および破線B320は、上記(32)式にθを引数として与えて得られる標準偏差σ_N(θ)で表わされる予測分布の幅を示す。また、曲線B330はθkにおける輝度方向の予測分布を示し、その幅B340は標準偏差σ_N(θ)である。第3実施形態では、この予測輝度分布に従った乱数を発生させ、得られた値を画素kに与えてやることで、学習画像を生成することができる。 FIG. 13 shows an example of the predictive distribution of luminance values obtained in the third embodiment. In FIG. 13, the solid line B300 is the center of the predictive distribution as a function of θ, and the broken lines B310 and B320 indicate the width of the predictive distribution given by the standard deviation σ_N(θ), obtained by substituting θ into equation (32). The curve B330 shows the predictive distribution in the luminance direction at θk, and its width B340 is the standard deviation σ_N(θk). In the third embodiment, a learning image can be generated by drawing a random number from this predictive luminance distribution and assigning the obtained value to pixel k.
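Drawing each pixel value from the predictive distribution can be sketched as follows. The per-pixel means and standard deviations would come from equations (31) and (32); clipping to an 8-bit range [0, 255] and the fixed random seed are assumptions, not part of the described method.

```python
import numpy as np

def sample_pixel_luminance(mean, std, rng, lo=0.0, hi=255.0):
    """Draw J_k ~ N(mean_k, std_k^2) for every pixel k and clip to the valid range."""
    samples = rng.normal(loc=mean, scale=std)   # one independent draw per pixel
    return np.clip(samples, lo, hi)

rng = np.random.default_rng(0)                   # fixed seed for reproducible images
mean = np.array([[120.0, 200.0], [40.0, 250.0]]) # hypothetical m_N^T phi(theta_k)
std = np.array([[5.0, 8.0], [3.0, 10.0]])        # hypothetical sigma_N(theta_k)
print(sample_pixel_luminance(mean, std, rng))
```

Re-running with different seeds yields different learning images for the same posture, which is how the surface variation is reproduced.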

以上説明したように第3実施形態によれば、対象物体表面における画素ごとの輝度値を、該対象物体について推定された輝度分布の分散から決定することで、対象物体の表面輝度のばらつきまで考慮した学習画像を生成することができる。 As described above, according to the third embodiment, the luminance value for each pixel on the target object surface is determined from the variance of the luminance distribution estimated for the target object, thereby taking into account variations in the surface luminance of the target object. Learning images can be generated.

＜第4実施形態＞ <Fourth embodiment>
以下、本発明に係る第4実施形態について説明する。上述した第1実施形態では、対象物体表面の輝度を上記(3)式や(9)式による輝度分布モデルで近似する例を示した。第4実施形態ではさらに、輝度分布モデルについてのパラメータ(輝度分布パラメータ)として複数のパラメータ候補を用意する。そして、各パラメータ候補に基づいて対象物体の認識用の辞書を作成し、実写による入力画像に対して該辞書を適用した認識結果を評価値として、最適なパラメータ候補を選択する例を示す。ここで輝度分布パラメータとは、(3)式におけるCとm、あるいは(9)式におけるK_d,K_s,mである。 The fourth embodiment of the present invention is described below. The first embodiment showed an example in which the luminance of the target object surface is approximated by the luminance distribution model of equation (3) or (9). In the fourth embodiment, a plurality of candidates are further prepared for the parameters of the luminance distribution model (luminance distribution parameters). A dictionary for recognizing the target object is created from each parameter candidate, and the optimal candidate is selected by using, as an evaluation value, the result of applying each dictionary to actually captured input images. Here, the luminance distribution parameters are C and m in equation (3), or K_d, K_s, and m in equation (9).

第4実施形態における画像認識処理を行うための基本構成を図14に示す。同図において、第1実施形態で示した図1(a)と同様の構成には同一番号を付し、説明を省略する。すなわち第4実施形態では、図1(a)に示す構成から輝度推定用の構成を除き、パラメータ設定部1230と選択部2110を加えたことを特徴とする。 FIG. 14 shows the basic configuration for performing image recognition processing in the fourth embodiment. In FIG. 14, components identical to those in FIG. 1(a) of the first embodiment are given the same reference numerals, and their description is omitted. That is, the fourth embodiment differs from the configuration of FIG. 1(a) in that the components for luminance estimation are removed and a parameter setting unit 1230 and a selection unit 2110 are added.

●辞書生成処理(学習処理) Dictionary generation process (learning process)
第4実施形態における辞書生成処理は、図15(a)のフローチャートに従う。なお、図15(a)においてモデル設定工程S1000、画像取得工程S1100では第1実施形態と同様に、対象物体のCADモデルと複数の事前取得画像が設定される。 The dictionary generation process in the fourth embodiment follows the flowchart of FIG. 15(a). In FIG. 15(a), the CAD model of the target object and a plurality of pre-acquired images are set in the model setting step S1000 and the image acquisition step S1100, as in the first embodiment.

画像生成パラメータ候補設定工程S1215では、学習画像を生成するための画像生成パラメータの候補をKパターン用意する。ここで画像生成パラメータとはすなわち、上述した第1実施形態で推定される輝度分布パラメータである。 In the image generation parameter candidate setting step S1215, K candidate patterns of the image generation parameters used to generate learning images are prepared. Here, the image generation parameters are the luminance distribution parameters that are estimated in the first embodiment.

すると画像生成工程S1220では、用意されたKパターンの画像生成パラメータ候補のそれぞれに対応した学習画像を、第1実施形態と同様の方法によって生成する。以下、全Kパターンのうち、k番目の画像生成パラメータ候補によって生成された様々な姿勢の学習画像の集合を、学習画像集合Skとする。そして学習工程S2000では、K個の学習画像集合Sk(k=1,…,K)のそれぞれを用いて、K個の辞書を生成する。 Then, in the image generation step S1220, learning images corresponding to each of the K prepared image generation parameter candidates are generated by the same method as in the first embodiment. Hereinafter, the set of learning images in various postures generated with the k-th of the K image generation parameter candidates is called the learning image set Sk. In the learning step S2000, K dictionaries are generated, one from each of the K learning image sets Sk (k = 1, …, K).

そして選択工程S2110において、生成されたK個の辞書を用いて、S1100で設定された全ての事前取得画像に対する評価を行い、該評価結果に基づいて最適な辞書および画像生成パラメータ候補を選択する。 In the selection step S2110, all the pre-acquired images set in S1100 are evaluated using the K generated dictionaries, and the optimal dictionary and image generation parameter candidate are selected based on the evaluation results.

ここで、選択工程S2110の詳細を図15(b)のフローチャートに示し、説明する。 The selection step S2110 is described in detail below with reference to the flowchart of FIG. 15(b).

まず認識工程S2111において、第1実施形態で説明したランタイム時の認識処理(S3100)と同様に、事前取得画像に対して辞書を用いた認識処理を行い、事前取得画像中の対象物体の位置および姿勢の推定値を算出する。 First, in the recognition step S2111, recognition processing using a dictionary is performed on each pre-acquired image in the same way as the runtime recognition process (S3100) described in the first embodiment, and estimates of the position and orientation of the target object in the pre-acquired image are calculated.

次に評価工程S2112では、認識工程S2111にて得られた認識結果を以下のように評価する。まず、認識結果として得られた推定位置および姿勢に基づき、S1000で設定されたモデルから対象物体のCG画像を生成する。このとき、推定位置および姿勢に対して直接CG画像を生成してもよいが、周知のトラッキング技術を用いてより詳細にマッチングした結果に対してCG画像を生成してもよい。すなわち、認識結果として得られた対象物体の推定位置および姿勢を初期値として、既知のトラッキング技術を用いて事前取得画像に対してより詳細にマッチングさせた後の推定位置および姿勢を用いて、CG画像を生成することも可能である。そして、生成されたCG画像と事前取得画像に対してエッジ抽出処理をかけて2値化し、双方で得られたエッジ点位置を比較して距離を算出し、その和、もしくは2乗和による検出誤差を、評価値として算出する。すなわち評価工程S2112では、認識結果およびモデル情報から生成されるモデル画像(CG画像)と、事前取得画像における対象物体の画像との対応部位における差分に基づいて、該認識結果の評価値を算出する。 Next, in the evaluation step S2112, the recognition result obtained in the recognition step S2111 is evaluated as follows. First, a CG image of the target object is generated from the model set in S1000, based on the estimated position and orientation obtained as the recognition result. The CG image may be generated directly for the estimated position and orientation, or for the result of a finer matching performed with a well-known tracking technique. That is, the estimated position and orientation obtained as the recognition result can be used as initial values, refined by matching against the pre-acquired image with a known tracking technique, and the refined position and orientation then used to generate the CG image. Edge extraction is then applied to both the generated CG image and the pre-acquired image to binarize them, the edge point positions obtained from the two are compared to calculate distances, and the detection error given by the sum or the sum of squares of those distances is calculated as the evaluation value. In other words, in the evaluation step S2112, the evaluation value of a recognition result is calculated from the differences at corresponding parts between the model image (CG image), generated from the recognition result and the model information, and the image of the target object in the pre-acquired image.
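A minimal sketch of the edge-based evaluation value, assuming both images have already been binarized into boolean edge maps of the same size (the edge extractor is not specified in the text) and that each CG edge point is matched to its nearest edge point in the pre-acquired image. The brute-force nearest-neighbour search is chosen for brevity; a distance transform would scale better to full-size images.

```python
import numpy as np

def edge_distance_error(cg_edges, img_edges, squared=False):
    """Sum (or sum of squares) of distances from CG edge points to the nearest image edge point."""
    a = np.argwhere(cg_edges).astype(float)    # (P, 2) row/col of CG edge pixels
    b = np.argwhere(img_edges).astype(float)   # (Q, 2) row/col of image edge pixels
    if len(a) == 0 or len(b) == 0:
        return float("inf")                    # nothing to compare: treat as worst case
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    return float((d ** 2).sum() if squared else d.sum())

cg = np.zeros((5, 5), dtype=bool); cg[:, 2] = True    # vertical edge in the CG image
img = np.zeros((5, 5), dtype=bool); img[:, 3] = True  # same edge, shifted one pixel
print(edge_distance_error(cg, img))  # 5.0
```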

あるいは、距離画像を利用して、距離残差による評価を行ってもよい。すなわち、認識結果として得られた推定位置および姿勢に基づき、モデルからその位置および姿勢における部品表面の距離を算出する。そして、事前取得画像に対応する距離情報と比較し、その物体表面における距離の残差の和もしくは2乗和による誤差を、評価値として算出する。 Alternatively, the evaluation may use distance residuals computed with a distance image. That is, based on the estimated position and orientation obtained as the recognition result, the distance to the part surface at that position and orientation is calculated from the model. This is compared with the distance information corresponding to the pre-acquired image, and the error given by the sum or the sum of squares of the distance residuals over the object surface is calculated as the evaluation value.
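The distance-residual variant can be sketched similarly, assuming the model-rendered depth and the measured distance image are aligned arrays in the same units, with NaN marking pixels that lie outside the object or lack a measurement (an assumed convention).

```python
import numpy as np

def depth_residual_error(model_depth, measured_depth, squared=True):
    """Sum of squared (or absolute) distance residuals over pixels valid in both maps."""
    model_depth = np.asarray(model_depth, dtype=float)
    measured_depth = np.asarray(measured_depth, dtype=float)
    valid = np.isfinite(model_depth) & np.isfinite(measured_depth)
    r = model_depth[valid] - measured_depth[valid]
    return float((r ** 2).sum() if squared else np.abs(r).sum())

model = np.array([[1.0, 2.0], [np.nan, 4.0]])
measured = np.array([[1.0, 3.0], [5.0, 2.0]])
print(depth_residual_error(model, measured))                 # 5.0
print(depth_residual_error(model, measured, squared=False))  # 3.0
```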

あるいは、学習画像と入力画像の類似度を評価してもよい。その場合、認識結果における推定位置および姿勢に基づいて生成されたCG画像と、事前取得画像における対象物体の存在領域の類似度を正規化相関等で比較し、その類似度を評価値として算出する。 Alternatively, the similarity between the learning image and the input image may be evaluated. In that case, the CG image generated from the estimated position and orientation in the recognition result is compared, for example by normalized correlation, with the region of the pre-acquired image where the target object is present, and the resulting similarity is calculated as the evaluation value.
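Normalized correlation, named in the text as one possible similarity measure, can be computed as below. Extracting the region where the target object is present is assumed to have been done already, so the two inputs are same-sized patches; the zero-mean variant shown here is one common choice.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-size patches, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

patch = np.array([[10.0, 20.0], [30.0, 60.0]])
print(normalized_correlation(patch, 2.0 * patch + 5.0))  # 1.0 (invariant to gain and offset)
```

Because it is a similarity, larger values are better when this measure is used for dictionary selection.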

さらに、生成されたCG画像と、入力された事前取得画像をユーザが目視で確認し、その差異を評価してもよい。例えば、位置ずれや姿勢ずれの誤差を数段階(例えば5段階程度)に定義し、主観的にユーザが入力して評価値とする。また、上記様々な評価値の組み合わせ、例えば線形和等によって一つの評価値としてもよい。 Furthermore, the user may visually compare the generated CG image with the input pre-acquired image and evaluate the difference. For example, the position and orientation errors may be graded on a scale of several levels (for example, about five), and the user subjectively enters a grade as the evaluation value. The various evaluation values above may also be combined into a single evaluation value, for example by a linear sum.

評価工程S2112では、以上のように各事前取得画像についてK個の辞書を用いた評価値が得られたら、辞書ごとに、全ての事前取得画像で得られた評価値を累積加算した値を評価値とする。 In the evaluation step S2112, once evaluation values have been obtained for each pre-acquired image with each of the K dictionaries as described above, the evaluation values over all the pre-acquired images are accumulated for each dictionary, and the accumulated sum is used as that dictionary's evaluation value.

すると辞書選択工程S2113において、S2112で算出された評価値が最も良い辞書を、最適な辞書として選択する。ここで良い評価値とは、評価値がエッジ誤差や距離残差等の検出誤差であればより小さい値、評価値が類似度であればより大きい値であり、評価値の定義に依存する。 Then, in the dictionary selection step S2113, the dictionary with the best evaluation value calculated in S2112 is selected as the optimal dictionary. Which value is best depends on how the evaluation value is defined: for a detection error such as an edge error or a distance residual, smaller is better, whereas for a similarity, larger is better.

画像生成パラメータ選択工程S2114では、辞書選択工程S2113にて選択された辞書を生成する際に用いられた画像生成パラメータ候補を、学習画像を生成するために最適な画像生成パラメータとして選択する。すなわち、画像生成工程S1220において上記選択された辞書に対応する学習画像集合Skを生成する際に用いられた画像生成パラメータ候補が選択される。 In the image generation parameter selection step S2114, the image generation parameter candidate used when generating the dictionary selected in the dictionary selection step S2113 is selected as an optimal image generation parameter for generating a learning image. That is, the image generation parameter candidate used when generating the learning image set Sk corresponding to the selected dictionary in the image generation step S1220 is selected.
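Steps S2112 to S2114 amount to accumulating the per-image evaluation values of each of the K dictionaries and taking the best total. A minimal sketch; the score layout (K rows, one column per pre-acquired image) and the example parameter candidates are hypothetical, and the direction of "best" is passed in because it depends on how the evaluation value is defined.

```python
import numpy as np

def select_best(scores, candidates, lower_is_better=True):
    """Pick the dictionary k with the best accumulated score and its parameter candidate."""
    totals = np.asarray(scores, dtype=float).sum(axis=1)   # accumulate over all pre-acquired images
    k = int(np.argmin(totals) if lower_is_better else np.argmax(totals))
    return k, candidates[k], totals

# Hypothetical edge errors: 3 dictionaries (rows) x 2 pre-acquired images (columns).
scores = [[3.0, 3.0], [1.0, 2.0], [5.0, 0.0]]
candidates = [{"C": 0.5, "m": 10}, {"C": 0.7, "m": 20}, {"C": 0.9, "m": 40}]
k, params, totals = select_best(scores, candidates)
print(k, params)  # 1 {'C': 0.7, 'm': 20}
```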

以上説明したように第4実施形態によれば、実際の認識結果を評価することによって、最適な辞書を生成するための最適な学習画像を生成するパラメータ(輝度分布パラメータ)を決定することができる。したがって、該決定されたパラメータに基づいて生成された学習画像を用いて最適な辞書を作成することで、該辞書を用いた最適な認識処理を行うことができる。 As described above, according to the fourth embodiment, the parameters (luminance distribution parameters) that generate the learning images yielding the optimal dictionary can be determined by evaluating actual recognition results. By creating a dictionary from learning images generated with the determined parameters, optimal recognition processing can then be performed with that dictionary.

＜他の実施形態＞ <Other embodiments>
なお、上述した第1〜第4実施形態においては、ランタイム時の入力画像を事前取得画像として用いてもよい。そうすることで、ランタイム時の環境変化に対して、動的に適切な学習画像を生成することが可能となる。 In the first to fourth embodiments described above, an input image obtained at runtime may be used as a pre-acquired image. This makes it possible to dynamically generate learning images suited to environmental changes at runtime.

また、本発明は、上述した実施形態の機能(例えば、上記の各部の処理を各工程に対応させたフローチャートにより示される処理)を実現するソフトウェアのプログラムコードを記録した記憶媒体を、システム或いは装置に供給することによっても実現できる。この場合、そのシステム或いは装置のコンピュータ(又はCPUやMPU)が、コンピュータが読み取り可能に記憶媒体に格納されたプログラムコードを読み出し実行することにより、上述した実施形態の機能を実現する。 The present invention can also be realized by supplying a system or apparatus with a storage medium on which software program code implementing the functions of the above-described embodiments is recorded (for example, the processing shown in the flowcharts in which the processing of each unit described above corresponds to a step). In this case, the computer (or CPU or MPU) of the system or apparatus implements the functions of the above-described embodiments by reading and executing the program code stored in the computer-readable storage medium.

1120: 事前取得画像記憶部 Pre-acquired image storage unit
1110: 画像取得部 Image acquisition unit
1020: モデル記憶部 Model storage unit
1010: モデル設定部 Model setting unit
1130: 観測データ分布取得部 Observation data distribution acquisition unit
1210: 輝度推定部 Luminance estimation unit
1220: 画像生成部 Image generation unit
2010: 学習画像記憶部 Learning image storage unit
3010: 辞書設定部 Dictionary setting unit
2200: 辞書記憶部 Dictionary storage unit
2100: 学習部 Learning unit
2020: 学習画像設定部 Learning image setting unit
3020: 入力画像取得部 Input image acquisition unit
3100: 認識部 Recognition unit
3200: 認識結果出力部 Recognition result output unit

Claims

An image processing apparatus for generating a learning image of an object, the learning image being used to create a dictionary referred to in image recognition processing for detecting the object, the image processing apparatus comprising:
first acquisition means for acquiring, from a luminance image including a plurality of the objects having different postures, luminance values of the surface of the object in a plurality of regions of the luminance image;
second acquisition means for acquiring information relating to the orientation of the surface of the object in the plurality of regions from which the luminance values are acquired by the first acquisition means;
third acquisition means for acquiring a correspondence relationship between the luminance values of the surface of the object in the plurality of regions acquired by the first acquisition means and the information relating to the orientation of the surface of the object in the plurality of regions acquired by the second acquisition means; and
generation means for generating a learning image of the object based on the correspondence relationship acquired by the third acquisition means and model information of the object.

The image processing apparatus according to claim 1, wherein
the first acquisition means acquires a plurality of luminance images each including a plurality of the objects having different postures, and acquires, from each luminance image, luminance values of the surface of the object in a plurality of regions,
the second acquisition means acquires, for each of the luminance images, information relating to the orientation of the surface of the object in the plurality of regions, and
the third acquisition means acquires, for each of the luminance images, a correspondence relationship between the information relating to the orientation of the surface of the object in the plurality of regions and the luminance value corresponding to each region.

The image processing apparatus according to claim 1 or 2, wherein the second acquisition means acquires distance information with respect to the object and acquires, based on the acquired distance information, the information relating to the orientation of the surface of the object in the plurality of regions.

The image processing apparatus according to claim 3, wherein the second acquisition means acquires distance information for each pixel of the luminance image and acquires, based on the acquired distance information, the information relating to the orientation of the surface of the object in the plurality of regions.

The image processing apparatus according to any one of claims 1 to 4, wherein the luminance image including the plurality of objects having different postures is an image captured with the plurality of objects arranged randomly.

The image processing apparatus according to any one of claims 1 to 5, wherein the luminance image including the plurality of objects having different postures is an image captured with the plurality of objects piled up.

The image processing apparatus according to any one of claims 1 to 6, further comprising estimation means for estimating a luminance distribution on the surface of the object based on the correspondence relationship acquired by the third acquisition means,
wherein the generation means generates the learning image based on the model information of the object and the luminance distribution estimated by the estimation means.

The image processing apparatus according to claim 7, wherein the generation means determines, from a variance of the luminance distribution, a luminance value corresponding to the information relating to the orientation of the surface of the object.

The image processing apparatus according to claim 7, wherein the estimation means estimates the luminance distribution based on the correspondence relationship acquired by the third acquisition means and a predetermined luminance distribution model.

The image processing apparatus according to claim 9, wherein the estimation means includes: means for acquiring a plurality of luminance distribution models; assigning means for assigning the correspondence relationships acquired by the third acquisition means to the plurality of luminance distribution models; and update means for updating each of the plurality of luminance distribution models based on the correspondence relationships assigned to that model.

The image processing apparatus according to claim 10, wherein the assigning means assigns each correspondence relationship acquired by the third acquisition means to the luminance distribution model whose output, when given the information relating to the orientation of the surface of the object indicated by that correspondence relationship, is closest to the luminance value of that correspondence relationship.

The image processing apparatus according to claim 10 or 11, further comprising means for causing the assigning means and the update means to repeat their processing until the assignment destinations of the correspondence relationships acquired by the third acquisition means match before and after the update.

The image processing apparatus according to any one of claims 1 to 12, wherein the generation means applies, to the model information of the object, projective transformations corresponding to a plurality of postures of the object, and gives, to each orientation of the surface of the object indicated by the projectively transformed model information, a luminance value corresponding to that orientation, thereby generating the learning image for each posture.

The image processing apparatus according to any one of claims 1 to 13, wherein the model information is CAD data and the learning image is a CG image.

The image processing apparatus according to any one of claims 1 to 14, wherein the information relating to the orientation of the surface of the object acquired by the second acquisition means is information on a normal to the surface of the object.

The image processing apparatus according to any one of claims 1 to 15, further comprising distance image generation means for generating a distance image from the luminance image and the model information of the object.

The image processing apparatus according to claim 16 , wherein the distance image represents a position of a surface of the object.

The image processing apparatus according to any one of claims 1 to 17, further comprising means for generating, based on the learning images generated by the generation means, a classifier for recognizing the object.

The image processing apparatus according to claim 18, further comprising:
means for acquiring an image including the object; and
recognition means for recognizing the object from the image including the object based on the classifier.

The image processing apparatus according to any one of claims 1 to 19, wherein the luminance image including the plurality of objects having different postures is an image captured under the same illumination conditions as an image in which the object is to be recognized.

An image processing method for generating a learning image of an object, the learning image being used to create a dictionary referred to in image recognition processing for detecting the object, the method comprising:
a first acquisition step of acquiring, from a luminance image including a plurality of the objects having different postures, luminance values of the surface of the object in a plurality of regions of the luminance image;
a second acquisition step of acquiring information relating to the orientation of the surface of the object in the plurality of regions from which the luminance values are acquired in the first acquisition step;
a third acquisition step of acquiring a correspondence relationship between the luminance values of the surface of the object in the plurality of regions acquired in the first acquisition step and the information relating to the orientation of the surface of the object in the plurality of regions acquired in the second acquisition step; and
a generation step of generating a learning image of the object based on the correspondence relationship acquired in the third acquisition step and model information of the object.
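One plausible way to realize the third acquisition step is to quantize the surface orientations and average the luminance observed in each orientation bin. The Python/NumPy sketch below is illustrative only and is not taken from the source. It assumes that normals are expressed in camera coordinates, with visible surfaces having n_z < 0 so that (n_x, n_y) alone identifies the orientation. It also assumes that, under fixed lighting, luminance depends on orientation alone, and it answers queries for unobserved orientations from the nearest occupied bin. `build_normal_luminance_table` and `lookup_luminance` are hypothetical names.

```python
import numpy as np

def build_normal_luminance_table(normals, luminances, bins=16):
    """Average the observed luminance per quantized surface orientation.

    normals: (M, 3) unit normals in camera coordinates (visible surfaces have n_z < 0).
    luminances: (M,) luminance measured in the same regions of the luminance image.
    Returns (centers, means): (K, 2) centres of the occupied (n_x, n_y) bins and
    (K,) mean luminance in each.
    """
    idx = np.clip(((normals[:, :2] + 1.0) / 2.0 * bins).astype(int), 0, bins - 1)
    flat = idx[:, 0] * bins + idx[:, 1]
    sums = np.bincount(flat, weights=luminances, minlength=bins * bins)
    counts = np.bincount(flat, minlength=bins * bins)
    k = np.nonzero(counts)[0]                        # occupied bins only
    centers = np.stack([k // bins + 0.5, k % bins + 0.5], axis=1) * (2.0 / bins) - 1.0
    return centers, sums[k] / counts[k]

def lookup_luminance(table, normals):
    """Luminance for query normals, taken from the nearest occupied bin."""
    centers, means = table
    d2 = ((normals[:, None, :2] - centers[None, :, :]) ** 2).sum(axis=2)
    return means[np.argmin(d2, axis=1)]

# Toy usage with synthetic "measurements" in which luminance is proportional to -n_z.
g = np.linspace(-0.95, 0.95, 60)
gx, gy = np.meshgrid(g, g)
keep = gx ** 2 + gy ** 2 < 0.9
obs_normals = np.stack([gx[keep], gy[keep], -np.sqrt(1.0 - gx[keep] ** 2 - gy[keep] ** 2)], axis=1)
obs_luminance = 200.0 * -obs_normals[:, 2]
table = build_normal_luminance_table(obs_normals, obs_luminance)
front = lookup_luminance(table, np.array([[0.0, 0.0, -1.0]]))
side = lookup_luminance(table, np.array([[0.6, 0.0, -0.8]]))
```

The resulting lookup could then supply the luminance assigned to each model surface in the generation step. Finer bins give more angular resolution but leave more bins unobserved.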

The image processing method according to claim 21, further comprising a step of inputting the luminance image, wherein
in the first acquisition step, the luminance value of the surface of the object in each of the plurality of regions of the input luminance image is acquired, and
in the second acquisition step, information relating to the orientation of the surface of the object in each of the plurality of regions is acquired.

The image processing method according to claim 21 or 22, wherein, in the second acquisition step, distance information on the object is acquired, and the information relating to the orientation of the surface of the object in the plurality of regions is acquired based on the acquired distance information.

The image processing method according to claim 23, wherein, in the second acquisition step, distance information for each pixel of the luminance image is acquired, and the information relating to the orientation of the surface of the object in the plurality of regions is acquired based on the acquired distance information.
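To make concrete the step of deriving surface orientation from per-pixel distance information, the Python/NumPy sketch below back-projects a depth map through a pinhole camera and takes the cross product of finite differences along image columns and rows. The claims do not specify how the orientation is computed. The pinhole model, the intrinsics `f`, `cx`, `cy`, the assumption that each pixel stores its z-distance, and the function name `normals_from_depth` are all hypothetical.

```python
import numpy as np

def normals_from_depth(depth, f, cx, cy):
    """Estimate per-pixel unit surface normals from a depth map.

    depth: (H, W) z-distance stored at each pixel; f, cx, cy: pinhole intrinsics.
    Returns (H, W, 3) unit normals oriented towards the camera at the origin.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    # Back-project every pixel to a 3-D point in camera coordinates.
    pts = np.stack([(u - cx) * depth / f, (v - cy) * depth / f, depth], axis=-1)
    du = np.gradient(pts, axis=1)          # tangent along image columns
    dv = np.gradient(pts, axis=0)          # tangent along image rows
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    flip = np.einsum("ijk,ijk->ij", n, pts) > 0   # normals pointing away from the camera
    n[flip] *= -1.0
    return n

# Toy usage: the plane z = 2 + 0.5 * x seen by a 32 x 32 camera.
f, cx, cy = 50.0, 16.0, 16.0
cols = np.arange(32, dtype=float)[None, :].repeat(32, axis=0)
plane_depth = 2.0 / (1.0 - 0.5 * (cols - cx) / f)   # solves z = 2 + 0.5 * (u - cx) * z / f
plane_normals = normals_from_depth(plane_depth, f, cx, cy)
```

Real range data is noisy, so an implementation would likely smooth the depth map or fit local planes before differencing. Normals at depth discontinuities, such as object boundaries, are unreliable in either case.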

An image processing system for generating a learning image used to generate a dictionary referred to in recognition of an object, the system comprising:
photographing means for photographing a luminance image including a plurality of the objects having different postures;
first acquisition means for acquiring, from the luminance image including the plurality of objects having different postures, luminance values of the surface of the object in a plurality of regions of the luminance image;
second acquisition means for acquiring information relating to the orientation of the surface of the object in the plurality of regions from which the luminance values are acquired by the first acquisition means;
third acquisition means for acquiring a correspondence relationship between the luminance values of the surface of the object in the plurality of regions acquired by the first acquisition means and the information relating to the orientation of the surface of the object in the plurality of regions acquired by the second acquisition means; and
generating means for generating a learning image of the object based on the correspondence relationship acquired by the third acquisition means and model information of the object.

A program that, when executed by a computer device, causes the computer device to function as each means of the image processing apparatus according to any one of claims 1 to 20.