JP2008015895A

JP2008015895A - Image processor and image processing program

Info

Publication number: JP2008015895A
Application number: JP2006188074A
Authority: JP
Inventors: Akihiro Tsukada; 明宏塚田
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2006-07-07
Filing date: 2006-07-07
Publication date: 2008-01-24
Anticipated expiration: 2026-07-07
Also published as: JP4876742B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor and an image processing program that accurately determine the validity of a posture estimation result for an object to be recognized.
SOLUTION: The posture of the object is first estimated on the basis of a contour of the object extracted from a captured image of the object. The three-dimensional shape of the object is then estimated on the basis of the posture estimation result. Separately, the three-dimensional shape of the object is detected and the object is restored in three dimensions. The validity of the posture estimation result is determined by collating the three-dimensional shape estimated from the posture estimation result with the three-dimensional restoration result obtained from the detection.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an image processing apparatus and an image processing program that estimate the posture of an object from a captured image of the object.

As an image processing apparatus of this type, one that estimates posture using the contour of an object is known, as described for example in Non-Patent Document 1 below.
Hendrik P. A. Lensch, et al., "A Silhouette-Based Algorithm for Texture Registration and Stitching", Graphical Models 63, 2001, pp. 245-262.

However, when the posture of an object is estimated using only the contour of the object, which is merely two-dimensional information, as in Non-Patent Document 1, the object may be misrecognized or its posture may be estimated incorrectly.

An object of the present invention is therefore to provide an image processing apparatus and an image processing program that can accurately determine the validity of an object posture estimation result.

An image processing apparatus according to the present invention comprises: posture estimation means for estimating the posture of an object based on a contour of the object extracted from a captured image of the object; 3D estimation means for estimating the three-dimensional shape of the object based on the posture estimation result obtained by the posture estimation means; detection means for detecting the three-dimensional shape of the object; and determination means for determining the validity of the posture estimation result obtained by the posture estimation means by collating the three-dimensional shape estimated by the 3D estimation means with the detection result obtained by the detection means.

If the posture of an object is estimated from its contour alone, the estimation error in the region the object occupies in three dimensions may be large even when the estimation error in two dimensions is small. In the present invention, the three-dimensional shape of the object is estimated using the posture estimation result obtained from the contour, and this estimated shape is collated with a separately obtained detection result of the object's three-dimensional shape, so that the validity of the posture estimation result is determined three-dimensionally. The validity of the posture estimation result can therefore be determined accurately.

Preferably, the determination means determines the validity of the posture estimation result by collating the three-dimensional shape estimated by the 3D estimation means with the detection result for a preset collation target region of the object. In this case, presetting a distinctive region suited to collation and collating that region allows the validity of the posture estimation result to be determined more accurately.

Preferably, the apparatus further comprises selection means for selecting, as the collation target region, a region whose degree of overlap is lower than a predetermined value in an image obtained by superimposing a plurality of images of the object captured from a plurality of viewpoints. A region whose degree of overlap is lower than the predetermined value in such a superimposed image is a characteristic region of the object. Selecting a characteristic region as the collation target region therefore allows the validity of the posture estimation result to be determined with sufficient accuracy.

An image processing program according to the present invention causes a computer to execute: a posture estimation step of estimating the posture of an object based on a contour of the object extracted from a captured image of the object; a 3D estimation step of estimating the three-dimensional shape of the object based on the posture estimation result of the posture estimation step; a detection step of detecting the three-dimensional shape of the object; and a determination step of determining the validity of the posture estimation result of the posture estimation step by collating the three-dimensional shape estimated in the 3D estimation step with the detection result of the detection step.

If the posture of an object is estimated from its contour alone, the estimation error in the region the object occupies in three dimensions may be large even when the estimation error in two dimensions is small. In the present invention, the three-dimensional shape of the object is estimated using the posture estimation result obtained from the contour, and this estimated shape is collated with the detection result of the object's three-dimensional shape, so that the validity of the posture estimation result is determined three-dimensionally. The validity of the posture estimation result can therefore be determined accurately.

Preferably, in the determination step, the validity of the posture estimation result is determined by collating the three-dimensional shape estimated in the 3D estimation step with the detection result for a preset collation target region of the object. In this case, presetting a distinctive region suited to collation and collating that region allows the validity of the posture estimation result to be determined more accurately.

Preferably, the program further causes the computer to execute a selection step of selecting, as the collation target region, a region whose degree of overlap is lower than a predetermined value in an image obtained by superimposing a plurality of images of the object captured from a plurality of viewpoints. A region whose degree of overlap is lower than the predetermined value in such a superimposed image is a characteristic region of the object. Selecting a characteristic region as the collation target region therefore allows the validity of the posture estimation result to be determined with sufficient accuracy.

According to the present invention, the validity of an object posture estimation result can be determined accurately. This improves the accuracy of posture estimation.

Preferred embodiments of an image processing apparatus and an image processing program according to the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of an image processing apparatus according to the present invention. The image processing apparatus 1 of this embodiment is mounted on, for example, a robot (not shown) that grips a container with a handle, such as a teapot or a mug, as a recognition target object.

In FIG. 1, the image processing apparatus 1 comprises: cameras 2A and 2B that image the recognition target object; an image processing unit 3 that receives the images captured by the cameras 2A and 2B, performs predetermined image processing, and estimates the posture of the recognition target object; a monitor unit 4 that displays the processing results of the image processing unit 3; and a data storage unit 5 that stores the databases used for image processing by the image processing unit 3.

The cameras 2A and 2B are, for example, CCD cameras provided in the two eye parts (not shown) of the robot. In other words, the cameras 2A and 2B are arranged to image an object from two different viewpoints.

The image processing unit 3 may be configured as dedicated hardware specialized for object recognition processing, or a general-purpose computer such as a personal computer may be used to execute an image processing program as software. In the latter case, the image processing program is provided on, for example, a storage medium such as a CD-ROM, DVD, or ROM, or a semiconductor memory. The image processing program may also be provided via a network as a computer data signal superimposed on a carrier wave.

The processing result of the image processing unit 3 is sent to a grip control processing unit 6. The grip control processing unit 6 controls a robot hand (not shown) so that it grips the recognition target object, based on the posture of the recognition target object estimated by the image processing unit 3.

FIG. 2 is a flowchart outlining the processing procedure of the image processing unit 3. First, images of the recognition target object captured by the cameras 2A and 2B are acquired (step 11). FIG. 3 shows an example of images captured by the cameras 2A and 2B. FIG. 3(a) schematically shows the image captured by the camera 2A in the left eye part of the robot (left image), and FIG. 3(b) schematically shows the image captured by the camera 2B in the right eye part of the robot (right image). In this embodiment, the recognition target object is, as an example, a teapot.

The left image (FIG. 3(a)) contains a teapot image 31L, showing the teapot in the posture seen from the left eye part of the robot, and a milk carton image 32L, showing a milk carton seen from the same viewpoint. The right image (FIG. 3(b)) contains a teapot image 31R, showing the teapot in the posture seen from the right eye part of the robot, and a milk carton image 32R, showing the milk carton seen from the same viewpoint.

Next, the left and right images acquired in step 11 are each divided into regions, one per object, within which the density (gray-level) values can be regarded as uniform (step 12). In parallel with the region division of step 12, a three-dimensional image is restored from the two-dimensional left and right images (step 13). This restoration uses, for example, the principle of binocular parallax: the depth of a point is calculated from its position coordinates in the left and right images and the distance between the cameras 2A and 2B.
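
The depth calculation in step 13 can be sketched for a rectified, parallel stereo pair. The pinhole camera model, the focal length in pixels, and the name `depth_from_disparity` are assumptions made for illustration; the description states only that depth is computed from the point's positions in the left and right images and the distance between the cameras.

```python
def depth_from_disparity(x_left, x_right, baseline_m, focal_px):
    """Depth of a point seen at column x_left / x_right in a rectified stereo pair.

    Assumes (not stated in the source) parallel pinhole cameras that share the
    focal length `focal_px` in pixels and are `baseline_m` metres apart.
    """
    disparity = x_left - x_right  # pixels; positive for points in front of the cameras
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or bad match")
    return focal_px * baseline_m / disparity


# Hypothetical numbers: 0.1 m baseline, 500 px focal length, 25 px disparity.
z = depth_from_disparity(x_left=320.0, x_right=295.0, baseline_m=0.1, focal_px=500.0)
```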

Next, the contour of the object is extracted, for example by edge detection, based on the two-dimensional images divided into regions in step 12 and the three-dimensional image restored in step 13 (step 14). In this embodiment, a contour enclosing the region that has the same color as the teapot, the recognition target object, is extracted from each of the left and right images.

Next, a feature amount is extracted from the contour of each image obtained in step 14 (step 15). Invariants that do not change with the position, rotation, or size of the contour are used as the feature amount. For example, in a captured image it may be possible to draw a plurality of mutually parallel tangent lines to the contour. In that case, the distance between the tangents is invariant to the position, rotation, and size of the contour. Here, therefore, the distance between tangents at each gradient angle is used as the invariant.
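
One way to picture the tangent-distance feature is the following simplified sketch. It considers only the two outermost parallel tangents for each direction and divides the widths by their mean to remove the dependence on size; both choices, and the name `tangent_width_profile`, are assumptions, since the description does not say how the invariant is computed or normalized.

```python
import math


def tangent_width_profile(contour, n_angles=36):
    """Distance between the two outermost parallel tangents, per direction.

    contour: list of (x, y) points. Returns n_angles widths over [0, pi),
    divided by their mean so the profile does not depend on contour size.
    """
    widths = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        nx, ny = math.cos(theta), math.sin(theta)  # normal of the tangent lines
        proj = [x * nx + y * ny for x, y in contour]
        widths.append(max(proj) - min(proj))       # unchanged by translation
    mean = sum(widths) / len(widths)
    return [w / mean for w in widths]
```

In this simplified form a rotation of the contour shifts the profile cyclically rather than leaving it unchanged, so a matcher would have to compare profiles up to a cyclic shift.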

Next, the contour feature amount of each image extracted in step 15 is matched against the feature matching data stored as a database in the data storage unit 5, and the similarity to the feature matching data of each recognition target object is calculated (step 16). Data on the postures of recognition target objects as seen from a plurality of viewpoints are registered as the feature matching data. More specifically, data representing the contour of each recognition target object as seen from a plurality of viewpoints and feature amount data extracted from those contours are registered.

In step 16, the object corresponding to the data with the highest calculated similarity is recognized as the object indicated by the contour of each image. In this embodiment, the contour of each image is recognized as representing a teapot. DP (dynamic programming) matching, for example, is used as the matching method.

Next, the reliability of the left-image contour and the right-image contour is evaluated, and whichever of the left and right images contains the more reliable contour is selected (step 17). The reliability is used to predict how trustworthy posture estimation from a contour will be. Specifically, the contour complexity and the feature similarity are used as the reliability.

The complexity quantifies how complex a contour is; specifically, it is the total number of dimensions of the invariants over all gradient angles of the contour. In general, a contour with higher complexity carries more information about the posture of the recognition target object and is therefore more reliable for posture estimation. Evaluating reliability by complexity thus makes it possible to select the image containing the contour that is more reliable for posture estimation. When the contour complexity is equal in the left and right images, however, the image with the higher feature similarity is selected.
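
The selection rule of step 17 can be sketched as follows. The dictionary keys, the tolerance used to decide that two complexities are "equal", and the function name are hypothetical; the complexity and similarity values themselves are taken as given.

```python
def select_reference(left, right, tol=1e-6):
    """Pick 'left' or 'right' by contour complexity, then by feature similarity.

    left / right: dicts with hypothetical keys 'complexity' and 'similarity'.
    """
    if abs(left["complexity"] - right["complexity"]) > tol:
        # The more complex contour carries more posture information.
        return "left" if left["complexity"] > right["complexity"] else "right"
    # Equal complexity: fall back to the higher feature similarity.
    return "left" if left["similarity"] >= right["similarity"] else "right"
```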

Next, the maximum similarity between the contour feature amount of the selected image and the feature amounts of the contours in the feature matching data for the type of recognition target object identified in step 16 (a teapot in this embodiment) is calculated (step 18). The contour shape corresponding to the maximum similarity in the feature matching data is then set as the initial posture of the recognition target object (step 19).

Next, the posture of the recognition target object is first roughly estimated using the contour of the reference image (step 20). That is, the posture is estimated using the image obtained by one of the cameras (monocular posture estimation). Rough estimation is posture estimation performed more coarsely than the detailed estimation carried out later. In step 20, a DT (distance transform) image is created for the contour in the reference image and matched against the contour shape data stored in the data storage unit 5; rotation and translation are calculated in this matching to roughly estimate the posture of the recognition target object.
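
Matching a stored contour against a DT image is commonly done by averaging the DT values under the transformed model points (often called chamfer matching). The sketch below illustrates that idea with a brute-force distance transform and an exhaustive search over in-plane rotation and translation; the search strategy, the out-of-image penalty, and all function names are assumptions, not the procedure of the embodiment.

```python
import math


def distance_transform(edge_pixels, width, height):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel."""
    return [[min(math.hypot(x - ex, y - ey) for ex, ey in edge_pixels)
             for x in range(width)] for y in range(height)]


def chamfer_score(dt, model_points, angle, tx, ty):
    """Mean DT value under the model points after rotation and translation."""
    h, w = len(dt), len(dt[0])
    c, s = math.cos(angle), math.sin(angle)
    total = 0.0
    for mx, my in model_points:
        x = int(round(c * mx - s * my + tx))
        y = int(round(s * mx + c * my + ty))
        # Points that fall outside the image receive a large penalty.
        total += dt[y][x] if 0 <= x < w and 0 <= y < h else float(w + h)
    return total / len(model_points)


def coarse_pose(dt, model_points, angles, shifts):
    """Exhaustive coarse search; returns (score, angle, tx, ty) with the lowest score."""
    return min((chamfer_score(dt, model_points, a, tx, ty), a, tx, ty)
               for a in angles for tx in shifts for ty in shifts)
```

A coarser grid of angles and shifts would correspond to the rough estimation, and a finer grid around the rough result to the detailed estimation.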

Next, the validity of the rough posture estimation is determined (step 21). This validity is judged, for example, from the degree of overlap when the contour obtained by posture estimation is superimposed on the contours extracted from the left and right images.

If the posture estimation is judged incorrect in step 21, a new initial posture is set and the rough posture estimation of the recognition target object is performed again. For example, the contour shape corresponding to the next-highest similarity after the maximum similarity calculated in step 18 is set as the new initial posture of the recognition target object (step 25). Step 20 is then executed again.
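
The retry procedure can be sketched as a loop over candidate initial postures in descending order of similarity. The callables `estimate` and `is_valid` stand in for the rough estimation and its validity check; they, the candidate format, and the `None` return when every candidate fails are hypothetical.

```python
def estimate_with_retries(candidates, estimate, is_valid):
    """Try initial postures from highest to lowest similarity until one is valid.

    candidates: list of (similarity, initial_pose) pairs.
    estimate:   callable mapping an initial pose to an estimated pose.
    is_valid:   callable returning True when an estimated pose is accepted.
    """
    for _, initial_pose in sorted(candidates, key=lambda c: c[0], reverse=True):
        pose = estimate(initial_pose)
        if is_valid(pose):
            return pose
    return None  # no candidate produced an acceptable estimate
```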

Next, the posture of the recognition target object is estimated in detail using both the left and right images (step 22). In the detailed posture estimation of step 22, DT images are created for the contours in the left and right images and matched against the contour shape data stored in the data storage unit 5; rotation and translation are calculated in this matching to estimate the posture of the recognition target object in more detail than the rough posture estimation above (compound-eye posture estimation).

For example, in the left image the posture of the recognition target object is estimated as a contour 33L, as shown in FIG. 4(a), and in the right image it is estimated as a contour 33R, as shown in FIG. 4(b). Through steps 11 to 22 above, the contour of the recognition target object is extracted from the captured images of the recognition target object, and the posture of the recognition target object is estimated based on the extracted contour.

Next, the validity of the posture estimation result is determined (step 23). If the posture estimation result is judged incorrect in step 23, the processing of step 25 described above is executed. If the posture estimation result is judged correct in step 23, it is sent to the grip control processing unit 6 and displayed on the monitor unit 4 (step 24). In this way, the image processing unit 3 estimates the posture of the recognition target object.

The determination of the validity of the posture estimation result in step 23 is now described in more detail. The validity of the detailed posture estimation is determined using overlapping region data registered in the data storage unit 5. The overlapping region data, and the method of registering them as a database in the data storage unit 5, are described first. FIG. 5 is a flowchart showing the procedure for registering data in the data storage unit 5.

Projected images of the recognition target object are acquired from a plurality of viewpoints, and the luminance values of the projected images from the respective viewpoints are summed (step 41). For example, as shown in FIG. 6, projected images of a teapot 34 are obtained from a plurality of viewpoints W on the spherical surface U of a virtual sphere assumed to be centered on the teapot 34. A projected image is an image that distinguishes the region occupied by the teapot 34 from the remaining region by a difference in luminance value. In the projected image shown in FIG. 7, for example, the luminance value is low in the region outside the contour 35, drawn as a closed curve, and high in the region inside the contour 35 (the region representing the teapot 34).

Obtaining projected images of the teapot 34 from the plurality of viewpoints W on the spherical surface U yields two-dimensional images showing the teapot 34 in a plurality of postures. All of the acquired projected images are then superimposed and their luminance values are added. FIG. 8 shows an example of the superimposed projected images. Contours 36 to 38 are the contours of the teapot in three of the projected images.

Next, as shown in FIG. 8, a region S in which the sum of luminance values is equal to or greater than a threshold is extracted from the two-dimensional image formed by superimposing the projected images (step 42). This region S is the region occupied by the teapot 34 in the projected images obtained from all of the viewpoints W. The region S is then approximated by a circle C (step 43). The circle C is set so as to contain the region S. The approximating shape is not limited to a perfect circle such as the circle C; it may be an ellipse, a rectangle, or a shape obtained by dilating the region S.
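
Steps 41 to 43 can be sketched on binary silhouette masks. The mask representation (1 inside the object, 0 outside), the centroid-centered circle, and the function names are assumptions; the description requires only that the circle C contain the region S.

```python
import math


def common_region(masks, threshold):
    """Pixels whose luminance summed over all projected images reaches threshold.

    masks: equally sized 2D lists holding 1 inside the object and 0 outside.
    """
    h, w = len(masks[0]), len(masks[0][0])
    return [(x, y) for y in range(h) for x in range(w)
            if sum(m[y][x] for m in masks) >= threshold]


def enclosing_circle(pixels):
    """Circle centred on the centroid that contains every pixel centre (not minimal)."""
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    r = max(math.hypot(x - cx, y - cy) for x, y in pixels)
    return cx, cy, r
```

With the threshold set to the number of masks, `common_region` returns the pixels covered by the object from every viewpoint, which corresponds to the region S.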

Next, as shown in FIG. 9, a sphere V that projects to the circle C from every viewpoint W is calculated (step 44). That is, the sphere V has the same shape in the two-dimensional image obtained from any viewpoint W. The parts of the recognition target object, which is located at the center of the spherical surface U, that protrude from the sphere V are regions whose degree of overlap is lower than a predetermined value in the image obtained by superimposing the images captured from the plurality of viewpoints. These are regions where the shape of the recognition target object is characteristic. For example, the region 35a protruding from the sphere V is the handle of the teapot 34, and the region 35b protruding from the sphere V is the spout of the teapot.

Next, information on the calculated sphere V is registered in the data storage unit 5 as the overlapping region data (step 45). The overlapping region data of the recognition target object are calculated in this way and registered in the data storage unit 5 in advance, before the posture of the recognition target object is estimated.

The procedure for determining the validity of the posture estimation result in step 23 is described next. FIG. 10 is a flowchart showing the procedure for determining the validity of the posture estimation result.

First, the sphere V registered in the data storage unit 5 is projected onto the projected image showing the posture estimation result obtained in step 22 of FIG. 2 (step 51). Specifically, as shown in FIG. 11, the sphere V is projected onto a contour 61 representing the posture estimation result.

Next, a contour T enclosing the region of the projected image of the posture estimation result that lies outside the region of the sphere V is obtained (step 52). Specifically, the hatched portion enclosed by the contour T in FIG. 11 is the region outside the sphere V in the projected image of the posture estimation result. The region indicated by the contour T is thus selected as the collation target region. Such a collation target region (contour T) is obtained for the posture estimation result in each of the left and right images.
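
Selecting the collation target region amounts to keeping the part of the estimated silhouette that lies outside the projected sphere. The sketch below works on pixel lists and treats the projection of the sphere V as a circle `(cx, cy, r)` in image coordinates; that representation and the function name are assumptions.

```python
def collation_region(silhouette_pixels, circle):
    """Pixels of the estimated silhouette that lie outside the projected sphere."""
    cx, cy, r = circle
    return [(x, y) for x, y in silhouette_pixels
            if (x - cx) ** 2 + (y - cy) ** 2 > r ** 2]
```

For a teapot, the pixels returned would belong to the handle and the spout, the parts that protrude from the sphere.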

Next, the three-dimensional shape of the region indicated by the contour T is estimated using the left and right images (step 53). For example, the three-dimensional shape is estimated using the principle of binocular parallax, by calculating the depth of a point from its position coordinates in the left and right images and the distance between the cameras 2A and 2B. This three-dimensional shape (hereinafter, the three-dimensional estimated shape) is set larger than the result calculated from the contour T, for example to allow for the posture estimation error and the error in conversion to a three-dimensional shape.

Meanwhile, the three-dimensional shape of the recognition target object is detected separately, and the recognition target object is restored in three dimensions (step 54). For example, the three-dimensional shape of the recognition target object is detected by three-dimensional measurement with a stereo range finder. Detection of the three-dimensional shape is not limited to a stereo range finder; for example, the three-dimensional restored image acquired in step 13 of FIG. 2 may be used.

Next, the three-dimensional estimated shape obtained in step 53 is superimposed on the three-dimensional restoration result obtained in step 54 (step 55). For example, as shown in FIG. 12, a three-dimensional restoration result 62 of the teapot is placed in three-dimensional coordinates, and a three-dimensional estimated shape 63 of the region indicated by the contour T is placed over it in the same coordinates.

Next, the region where the three-dimensional estimated shape and the three-dimensional restoration result overlap in the three-dimensional coordinates is calculated (step 56). For example, the number of voxels in which the two overlap is calculated. Alternatively, the region where the three-dimensional estimated shape and the three-dimensional restoration result overlap may be calculated from a two-dimensional view showing both as placed in the three-dimensional coordinates.

If the value indicating the size of the calculated overlapping region is equal to or greater than a threshold, the posture estimation result is judged to be valid (step 57). If the value is smaller than the threshold, the posture estimation result is judged to be invalid (step 57), and the process proceeds to step 25 shown in FIG. 2. As described above, the validity of the posture estimation result is determined by collating the three-dimensional estimated shape obtained in step 53 with the detection result obtained in step 54.
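The decision of step 57 reduces to a single comparison. The sketch below only illustrates the boundary behavior described above (a value equal to the threshold counts as valid); the function name and the threshold value are hypothetical, and how the threshold is chosen is not specified here.

```python
def is_posture_estimate_valid(overlap_size: float, threshold: float) -> bool:
    """Return True when the overlap is equal to or greater than the threshold."""
    return overlap_size >= threshold


# Hypothetical threshold of 100 overlapping voxels.
THRESHOLD = 100
valid_at_boundary = is_posture_estimate_valid(100, THRESHOLD)  # equal to threshold: valid
invalid_below = is_posture_estimate_valid(99, THRESHOLD)       # below threshold: not valid
```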

In the above, steps 15 to 22 shown in FIG. 2 constitute the posture estimation means (posture estimation step), which estimates the posture of the recognition target object based on the contour of the recognition target object extracted from a captured image of the recognition target object. Steps 51 and 52 shown in FIG. 10 constitute the selection means (selection step), which selects, as the collation target region, a region whose degree of overlap is lower than a predetermined value in an image obtained by superimposing a plurality of images obtained by imaging the recognition target object from a plurality of viewpoints. Step 53 shown in FIG. 10 constitutes the 3D estimation means (3D estimation step), which estimates the three-dimensional shape of the recognition target object based on the posture estimation result obtained by the posture estimation means. Step 54 shown in FIG. 10 constitutes the detection means (detection step), which detects the three-dimensional shape of the recognition target object. Steps 55 to 57 shown in FIG. 10 constitute the determination means (determination step), which determines the validity of the posture estimation result obtained by the posture estimation means by collating the three-dimensional shape estimated by the 3D estimation means with the detection result obtained by the detection means.
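The selection step can be sketched as a per-pixel count over the superimposed images. This is an illustration under assumptions, not the procedure of steps 51 and 52: each view is assumed to be a binary mask of the object, the degree of overlap is assumed to be the fraction of views covering a pixel, and pixels covered by no view are excluded. The function name and the sample masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 where the object appears in a view, 0 elsewhere


def select_low_overlap_region(masks: List[Mask], max_degree: float) -> List[List[bool]]:
    """Mark pixels whose degree of overlap across views is below max_degree.

    Assumption: degree of overlap = (views covering the pixel) / (number of views).
    Pixels covered by no view are not part of the object and are not selected.
    """
    n_views = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    selected = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            degree = sum(mask[r][c] for mask in masks) / n_views
            selected[r][c] = 0 < degree < max_degree
    return selected


# Three hypothetical 1x4 views: the middle pixels are covered in every view,
# while each end pixel is covered in only one view.
views = [[[1, 1, 1, 0]], [[0, 1, 1, 0]], [[0, 1, 1, 1]]]
region = select_low_overlap_region(views, max_degree=0.5)
# region == [[True, False, False, True]]
```

Pixels that appear in only a few views, such as a protruding part, end up in the selected region, while the part visible from every viewpoint is left out.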

One conceivable way to determine the validity of a posture estimation result is to use the deviation between the contour or pattern of the recognition target object extracted from the camera image and the contour or pattern given by the posture estimation result. However, the contour or the arrangement of the pattern can be similar even when the postures differ. In that case, the difference between the contours or between the pattern arrangements is small even though the posture estimation result deviates greatly from the actual posture of the recognition target object. It is therefore difficult to judge the validity of the posture estimation result accurately from two-dimensional information alone.

In the present embodiment, by contrast, the three-dimensional estimated shape 63 of the recognition target object is placed in three-dimensional coordinates using the posture estimation result estimated from the contour of the recognition target object, and the three-dimensional restoration result 62 of the recognition target object is placed in the same coordinates using the result of three-dimensional detection. In other words, the three-dimensional restoration result 62 and the three-dimensional estimated shape 63 are superimposed in three-dimensional coordinates. The region where the three-dimensional restoration result 62 and the three-dimensional estimated shape 63 overlap is then calculated, and the validity of the posture estimation result is determined from that region. Because the validity is judged from three-dimensional information, it can be determined accurately. In doing so, it is preferable to superimpose the three-dimensional restoration result and the three-dimensional estimated shape for a characteristic region of the recognition target object (here, the handle). Because the validity of the posture estimation result is determined accurately in this way, posture estimation can be performed again when the validity is low, and erroneous posture estimation can be reliably suppressed.

Although the above embodiment is applied to a robot that grips an object, the image processing apparatus and the image processing program of the present invention are also applicable to other apparatuses and systems that recognize an object and estimate its posture.

FIG. 1 is a block diagram showing the configuration of an embodiment of an image processing apparatus according to the present invention.
FIG. 2 is a flowchart showing an outline of the processing procedure performed by the image processing unit shown in FIG. 1.
FIG. 3 is a diagram showing an example of the left image and the right image captured by the two cameras shown in FIG. 1.
FIG. 4 is a diagram showing an example of a binocular posture estimation result.
FIG. 5 is a flowchart showing the procedure for registering data in the data storage unit shown in FIG. 1.
FIG. 6 is a conceptual diagram showing camera viewpoint positions.
FIG. 7 is a diagram showing a projection image obtained from one viewpoint.
FIG. 8 is a diagram showing an image obtained by adding the projection images from a plurality of viewpoint positions.
FIG. 9 is a conceptual diagram showing the sphere V calculated in step 44 of FIG. 5.
FIG. 10 is a flowchart showing the procedure for determining the validity of a posture estimation result.
FIG. 11 is a diagram showing the collation target region.
FIG. 12 is a conceptual diagram showing the three-dimensional estimated shape superimposed on the three-dimensional restoration result.

Explanation of symbols

1: image processing apparatus; 2A, 2B: cameras; 3: image processing unit (posture estimation means, selection means, 3D estimation means, detection means, determination means).

Claims

1. An image processing apparatus comprising:
posture estimation means for estimating the posture of an object based on a contour of the object extracted from a captured image of the object;
3D estimation means for estimating a three-dimensional shape of the object based on a posture estimation result obtained by the posture estimation means;
detection means for detecting a three-dimensional shape of the object; and
determination means for determining the validity of the posture estimation result obtained by the posture estimation means by collating the three-dimensional shape estimated by the 3D estimation means with a detection result obtained by the detection means.

2. The image processing apparatus according to claim 1, wherein the determination means determines the validity of the posture estimation result by collating the three-dimensional shape estimated by the 3D estimation means with the detection result for a preset collation target region of the object.

3. The image processing apparatus according to claim 2, further comprising selection means for selecting, as the collation target region, a region whose degree of overlap is lower than a predetermined value in an image obtained by superimposing a plurality of images obtained by imaging the object from a plurality of viewpoints.

4. An image processing program for causing a computer to execute:
a posture estimation step of estimating the posture of an object based on a contour of the object extracted from a captured image of the object;
a 3D estimation step of estimating a three-dimensional shape of the object based on a posture estimation result obtained in the posture estimation step;
a detection step of detecting a three-dimensional shape of the object; and
a determination step of determining the validity of the posture estimation result obtained in the posture estimation step by collating the three-dimensional shape estimated in the 3D estimation step with a detection result obtained in the detection step.

5. The image processing program according to claim 4, wherein, in the determination step, the validity of the posture estimation result is determined by collating the three-dimensional shape estimated in the 3D estimation step with the detection result for a preset collation target region of the object.

6. The image processing program according to claim 5, further causing the computer to execute a selection step of selecting, as the collation target region, a region whose degree of overlap is lower than a predetermined value in an image obtained by superimposing a plurality of images obtained by imaging the object from a plurality of viewpoints.