JP7298687B2

JP7298687B2 - Object recognition device and object recognition method

Info

Publication number: JP7298687B2
Application number: JP2021525479A
Authority: JP
Inventors: 嘉典小西
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2019-06-12
Filing date: 2019-06-12
Publication date: 2023-06-27
Anticipated expiration: 2039-06-12
Also published as: CN113939852A; EP3961556A4; EP3961556A1; JPWO2020250348A1; US20220230459A1; WO2020250348A1

Description

The present invention relates to technology for recognizing a three-dimensional object by template matching.

Template matching is one method for recognizing (detecting) an object in an image. A model (template) of the object to be recognized is prepared in advance, and the object contained in an input image is detected by evaluating the degree of matching of image features between the input image and the model. Object recognition by template matching is used in a wide variety of fields, such as inspection and picking in FA (Factory Automation), robot vision, and surveillance cameras.

In recent years, attention has focused on techniques that apply template matching to recognition of the three-dimensional position and orientation of an object. The basic principle is to prepare a large number of templates with different views (appearances) by changing the viewpoint position relative to the target object, and to select from those templates the one that best matches the view of the target object in the input image, thereby identifying the three-dimensional position and orientation of the target object with respect to the camera. However, because the recognition resolution of this method is proportional to the number of template variations, raising the resolution makes problems such as a heavier template creation load, a larger amount of template data, and a longer template matching time more pronounced.

As a countermeasure to such problems, Patent Document 1 discloses the idea of measuring the depth distance of a target object with a depth sensor and scaling (enlarging/reducing) a template (a two-dimensional grid for sampling feature values) according to that depth distance.

Patent Document 1: U.S. Pat. No. 9,659,217

According to the method of Patent Document 1, templates for a plurality of views that differ only in depth distance can be shared, so effects such as a lighter template creation load and a smaller number of templates can be expected. However, during the template matching search, the template must be enlarged or reduced according to the depth distance of each pixel, which slows down processing. To cut the time spent enlarging or reducing the template, it is technically possible to generate templates at multiple scales before the template matching process, according to the distance range in which the target object can exist and the required resolution, and to hold them in work memory. This is impractical, however, because it requires a very large memory capacity.

The present invention has been made in view of the above circumstances, and its object is to provide a practical technique that enables high-speed detection, by template matching, of objects that can exist at various depth distances.

One aspect of the present invention provides an object recognition device comprising: a three-dimensional data acquisition unit that acquires three-dimensional data composed of a plurality of points each having three-dimensional information; a parallel projection conversion unit that generates a two-dimensional image by parallel-projecting each point of the three-dimensional data onto a projection plane; and a recognition processing unit that detects a target object from the two-dimensional image by template matching.

The three-dimensional data is preferably data obtained by three-dimensional measurement. Any three-dimensional measurement method may be used, whether active or passive. Template matching is a method of determining whether a partial image within a region of interest in the two-dimensional image is an image of the target object, by evaluating the degree of matching (similarity) of image features between a template (model) of the target object and the region of interest. If a plurality of templates with different views (appearances) of the target object are used for template matching, the orientation of the target object can also be recognized.

In the present invention, a two-dimensional image generated by parallel projection of three-dimensional data is used for template matching. In parallel projection, the target object is projected at the same size regardless of the distance from the projection plane to the target object. Therefore, in a two-dimensional image generated by parallel projection, the image of the target object always has the same size, regardless of its depth distance. Matching can thus be performed using templates of a single size only, which allows faster processing than the conventional method (in which templates are scaled according to depth distance). In addition, the number of templates and the amount of template data can be reduced, and less work memory is required, so the technique is also highly practical.
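The size argument above can be checked with a small numerical sketch. It is an illustration only: the function names, the pinhole model with a focal length expressed in pixels, and the sample values are assumptions and do not come from the patent text.

```python
def perspective_width_px(width_mm, depth_mm, focal_px):
    """Apparent width under a pinhole (perspective) model; it shrinks as depth grows."""
    return focal_px * width_mm / depth_mm


def parallel_width_px(width_mm, res_mm_per_px):
    """Apparent width under parallel projection; it does not depend on depth."""
    return width_mm / res_mm_per_px


# Hypothetical 50 mm wide object seen at three depths (assumed values).
sizes = [
    (depth, perspective_width_px(50.0, depth, 1000.0), parallel_width_px(50.0, 0.5))
    for depth in (400.0, 600.0, 800.0)
]
```

With these assumed numbers the perspective width falls from 125 px at 400 mm to 62.5 px at 800 mm, while the parallel-projection width stays at 100 px, which is why a single template size suffices.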

The recognition processing unit may use, as the template of the target object, a template generated from an image obtained by parallel projection of the target object. Generating the template from a parallel projection image as well improves the matching accuracy between the template and the target object image in the two-dimensional image, and thus increases the reliability of the object recognition processing.

The projection plane may be set arbitrarily, but it is preferable to set it so that the projection points of the points constituting the three-dimensional data are distributed over as wide a range as possible on the projection plane. For example, when the three-dimensional data is generated using an image captured by a camera, the parallel projection conversion unit may set the projection plane so as to be orthogonal to the optical axis of the camera.

When a first point in the three-dimensional data is projected onto a first pixel in the two-dimensional image, the parallel projection conversion unit may associate depth information obtained from the three-dimensional information of the first point with the first pixel. When each point of the three-dimensional data has luminance information, the parallel projection conversion unit may associate the luminance information of the first point with the first pixel when the first point is projected onto the first pixel. Likewise, when each point of the three-dimensional data has color information, the parallel projection conversion unit may associate the color information of the first point with the first pixel when the first point is projected onto the first pixel.

When no point is projected onto a second pixel in the two-dimensional image, the parallel projection conversion unit may generate the information to be associated with the second pixel based on information associated with pixels around the second pixel. For example, the parallel projection conversion unit may obtain the information to be associated with the second pixel by interpolating the information associated with the pixels around it. Increasing the amount of information in the two-dimensional image in this way can be expected to improve the accuracy of template matching.
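A minimal sketch of this kind of hole filling is shown below. It assumes the projected image is a list of rows in which `None` marks a pixel that received no projected point, and it uses the mean of the valid 8-neighbors as the interpolation. The function name `fill_holes` and the choice of a plain mean are illustrative assumptions; the text above only states that surrounding pixels may be interpolated.

```python
def fill_holes(depth):
    """Return a copy of `depth` in which each empty pixel (None) is replaced
    by the mean of its non-empty 8-neighbors, if it has any."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if depth[y][x] is not None:
                continue  # a point was projected here; keep its value
            neighbors = [
                depth[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if depth[ny][nx] is not None
            ]
            if neighbors:
                out[y][x] = sum(neighbors) / len(neighbors)
    return out
```

Reading only from the input image and writing to a copy keeps filled values from propagating into other holes within the same pass.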

The three-dimensional data may be data generated using an image captured by a camera, and when a plurality of points in the three-dimensional data are projected onto the same position on the projection plane, the parallel projection conversion unit may use the point closest to the camera among them to generate the two-dimensional image. This processing yields a parallel projection image that takes into account the overlap (occlusion) between objects as seen from the projection plane side; that is, only points visible from the camera are mapped to the two-dimensional image. Object recognition by template matching can therefore be performed accurately.
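The nearest-point rule can be sketched as a simple depth-buffer loop that also stores the depth value at each pixel. This is an illustration under stated assumptions: points are given in camera coordinates with Z along the optical axis, the projection plane is orthogonal to that axis, Z is used as the measure of closeness to the camera, and the parameter names (`res_x`, `res_y`, `c_x`, `c_y`, `width`, `height`) and the function name are chosen here for the example.

```python
def project_nearest(points, res_x, res_y, c_x, c_y, width, height):
    """Parallel-project (X, Y, Z) points onto a width x height depth image.

    res_x, res_y: size of one pixel (same length unit as X and Y).
    c_x, c_y: image coordinates of the projection image center.
    Pixels that receive no point stay None. When several points fall on
    the same pixel, the one with the smallest Z (nearest) is kept.
    """
    depth = [[None] * width for _ in range(height)]
    for X, Y, Z in points:
        # Note: Python's round() rounds exact halves to the nearest even integer.
        x = round(X / res_x + c_x)
        y = round(Y / res_y + c_y)
        if 0 <= x < width and 0 <= y < height:
            if depth[y][x] is None or Z < depth[y][x]:
                depth[y][x] = Z
    return depth
```

Luminance or color could be carried along in the same way by storing a tuple per pixel instead of the depth alone.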

The present invention may be regarded as an object recognition device having at least part of the above-described means or configurations, or as an image processing device that performs the above-described parallel projection conversion. The present invention may also be regarded as an object recognition method, an image processing method, a template matching method, a control method for an object recognition device, or the like that includes at least part of the above processing, or as a program for realizing such a method or a non-transitory recording medium on which the program is recorded. The above means and processes can be combined with one another wherever possible to constitute the present invention.

According to the present invention, it is possible to provide a practical technique that enables high-speed detection, by template matching, of objects that can exist at various depth distances.

FIG. 1 is a diagram schematically showing processing by an object recognition device.
FIG. 2 is a diagram schematically showing the overall configuration of the object recognition device.
FIG. 3 is a block diagram showing the configuration of the image processing device.
FIG. 4 is a flowchart of template creation processing.
FIG. 5 is a diagram showing an example of viewpoint position settings.
FIG. 6 is a diagram showing an example of a parallel projection image in the template creation processing.
FIG. 7 is a flowchart of object recognition processing.
FIG. 8 is a flowchart of parallel projection conversion in the object recognition processing.
FIG. 9 is a diagram showing an example of how the camera coordinate system and the projection image coordinate system are set.
FIG. 10 is a flowchart of projection point complementing processing.

<Application example>
FIG. 1 schematically shows processing by an object recognition device, which is one application example of the present invention. Reference numeral 10 in FIG. 1 shows three objects 102a, 102b, and 102c on a stage 101 being measured (photographed) by a camera 103 from obliquely above. The objects 102a, 102b, and 102c have the same shape (cylindrical) and the same size, but their depth distance from the camera 103 increases in the order of the object 102a, the object 102b, and the object 102c.

Reference numeral 11 is an example of three-dimensional data generated based on the image captured by the camera 103. The three-dimensional data 11 is composed of a plurality of points, each having three-dimensional information. The three-dimensional data 11 may have any format: for example, each point may have three-dimensional coordinate values, or a depth value (depth distance information) may be associated with each point (pixel) of a two-dimensional image. The three-dimensional coordinate values may be expressed in the camera coordinate system, the global coordinate system, or any other coordinate system. The three-dimensional data 11 in FIG. 1 is an example of a depth image, in which depth values are represented by shading for convenience (points farther from the camera 103 are darker). In a typical optical system, an object farther from the camera 103 forms a smaller image, so the sizes on the image decrease in the order of the object 102a, the object 102b, and the object 102c.

To handle objects of various sizes, conventional template matching either uses several kinds of templates with different sizes or scales the template according to depth values as in Patent Document 1. As described above, however, these conventional methods suffer from problems such as reduced processing speed and increased memory requirements.

In the embodiment of the present invention, therefore, the three-dimensional data 11 is subjected to parallel projection conversion to generate a two-dimensional image 12, and this two-dimensional image 12 is used for template matching. With parallel projection conversion, objects of the same actual size also have the same size on the two-dimensional image 12. Consequently, all of the objects 102a, 102b, and 102c contained in the two-dimensional image 12 can be detected simply by applying a template 13 of a single size. Reference numeral 14 shows an example of the recognition result.

The method of this embodiment allows faster processing than the conventional methods. In addition, the number of templates and the amount of template data can be reduced, and less work memory is required, so the method is also highly practical. For convenience of explanation, FIG. 1 shows an example in which the objects 102a, 102b, and 102c have the same orientation. When the apparent shape of an object changes with its orientation (that is, with the angle from which it is viewed), a template 13 may be prepared for each orientation to be recognized.

<Embodiment>
(Overall configuration of object recognition device)
An object recognition device according to an embodiment of the present invention will be described with reference to FIG. 2.

The object recognition device 2 is a system installed on a production line that assembles or processes articles. Using data captured from a sensor unit 20, it recognizes the position and orientation of objects 27 loaded on a tray 26 by template matching (three-dimensional object recognition). Objects to be recognized (hereinafter also referred to as "target objects") 27 are piled in bulk on the tray 26.

The object recognition device 2 is roughly composed of the sensor unit 20 and an image processing device 21. The sensor unit 20 and the image processing device 21 are connected by wire or wirelessly, and the output of the sensor unit 20 is taken into the image processing device 21. The image processing device 21 is a device that performs various kinds of processing using the data captured from the sensor unit 20. Its processing may include, for example, distance measurement (ranging), three-dimensional shape recognition, object recognition, and scene recognition. The recognition result of the object recognition device 2 is output to, for example, a PLC (programmable logic controller) 25 or a display 22. The recognition result is used, for example, for controlling a picking robot 28, controlling processing or printing devices, and inspecting or measuring the target objects 27.

(Sensor unit)
The sensor unit 20 has at least a camera for capturing an optical image of the target object 27. The sensor unit 20 may further include the components (sensors, illumination devices, light projection devices, and the like) needed for three-dimensional measurement of the target object 27. For example, when the depth distance is measured by stereo matching (also called stereo vision or the stereo camera method), the sensor unit 20 is provided with a plurality of cameras. In the case of active stereo, the sensor unit 20 is further provided with a light projection device that projects pattern light onto the target object 27. When three-dimensional measurement is performed by the space-coded pattern projection method, the sensor unit 20 is provided with a camera and a light projection device that projects pattern light. Any other method capable of acquiring three-dimensional information of the target object 27 may also be used, such as the photometric stereo method, the TOF (time of flight) method, or the phase shift method.

(Image processing device)
The image processing device 21 is configured by, for example, a computer including a CPU (processor), RAM (memory), a nonvolatile storage device (hard disk, SSD, or the like), an input device, and an output device. In this case, the CPU loads a program stored in the nonvolatile storage device into the RAM and executes it, thereby realizing the various configurations described later. The configuration of the image processing device 21 is not limited to this, however: all or part of the configurations described later may be realized by a dedicated circuit such as an FPGA or ASIC, or by cloud computing or distributed computing.

FIG. 3 is a block diagram showing the configuration of the image processing device 21. The image processing device 21 has the configuration of a template creation device 30 and the configuration of an object recognition processing device 31. The template creation device 30 is the configuration for creating the templates used in object recognition processing, and includes a three-dimensional CAD data acquisition unit 300, a parallel projection parameter setting unit 301, a viewpoint position setting unit 302, a two-dimensional projection image creation unit 303, a feature extraction unit 304, and a template creation unit 305. The object recognition processing device 31 is the configuration for executing object recognition processing by template matching, and includes a three-dimensional data acquisition unit 310, a parallel projection parameter setting unit 311, a parallel projection conversion unit 312, a feature extraction unit 313, a template storage unit 314, a template matching unit 315, and a recognition result output unit 316. In this embodiment, the feature extraction unit 313, the template storage unit 314, and the template matching unit 315 constitute the "recognition processing unit" of the present invention.

(Template creation processing)
An example of template creation processing by the template creation device 30 will be described with reference to the flowchart of FIG. 4.

In step S400, the three-dimensional CAD data acquisition unit 300 acquires three-dimensional CAD data of the target object 27. The CAD data may be read from the internal storage device of the image processing device 21, or may be obtained from an external CAD system, storage, or the like via a network. Instead of CAD data, three-dimensional shape data measured by a three-dimensional sensor or the like may be acquired.

In step S401, the viewpoint position setting unit 302 sets the viewpoint positions for which templates are to be created. FIG. 5 shows an example of viewpoint position settings. In this example, viewpoints (shown as black circles) are set at the 42 vertices of an 80-faced polyhedron enclosing the target object 27. The number and arrangement of viewpoints may be set as appropriate according to the required resolution, the shape of the target object 27, the orientations it can take, and so on. The number and arrangement of viewpoints may be specified by the user or set automatically by the viewpoint position setting unit 302.
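One common way to obtain 42 evenly spread viewpoints on an 80-faced polyhedron is to subdivide a regular icosahedron once: its 12 vertices plus the 30 edge midpoints give 42 vertices, and each of its 20 faces splits into 4 triangles, giving 80 faces. The sketch below builds those 42 points on a sphere. Whether FIG. 5 uses exactly this construction is an assumption, and the function name `viewpoints_42` and the `radius` parameter are illustrative.

```python
import itertools
import math


def viewpoints_42(radius=1.0):
    """Return 42 viewpoints: the 12 icosahedron vertices and the 30 edge
    midpoints, all pushed onto a sphere of the given radius."""
    phi = (1 + math.sqrt(5)) / 2  # golden ratio
    base = []
    for a, b in itertools.product((-1.0, 1.0), (-phi, phi)):
        # Cyclic permutations of (0, +-1, +-phi) give the 12 vertices.
        base += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    mids = []
    for p, q in itertools.combinations(base, 2):
        # With these coordinates, every edge has length exactly 2.
        if abs(math.dist(p, q) - 2.0) < 1e-9:
            mids.append(tuple((pc + qc) / 2 for pc, qc in zip(p, q)))

    def on_sphere(v):
        n = math.sqrt(sum(c * c for c in v))
        return tuple(radius * c / n for c in v)

    return [on_sphere(v) for v in base + mids]
```

Each returned point can serve as a viewpoint VP looking toward the object center; finer sampling would come from subdividing again.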

In step S402, the parallel projection parameter setting unit 301 sets the parallel projection parameters used for template creation. Here, two parameters, res_x and res_y, are used as the parallel projection parameters. (res_x, res_y) is the size of one pixel of the projection image (in mm). Parallel projection parameters are also used in the parallel projection conversion of the object recognition processing described later, and the same parameter values should preferably be used for template creation and for object recognition processing. When the parameter values are the same, the size of the target object 27 in the template matches its size in the parallel projection image generated during object recognition processing, so neither the template nor the image needs to be rescaled during template matching.

In step S403, the two-dimensional projection image creation unit 303 creates a two-dimensional projection image by parallel projection of the three-dimensional CAD data. FIG. 6 shows an example of a two-dimensional projection image. A two-dimensional projection image 60 corresponding to a viewpoint VP is created by parallel-projecting each point on the surface of the target object 27 onto a projection plane 62 passing through the viewpoint VP.

In step S404, the feature extraction unit 304 extracts image features of the target object 27 from the two-dimensional projection image 60 created in step S403. Usable image features include, for example, luminance, color, luminance gradient direction, quantized gradient direction, HoG (Histogram of Oriented Gradients), surface normal direction, Haar-like features, and SIFT (Scale-Invariant Feature Transform). The luminance gradient direction expresses the direction (angle) of the luminance gradient in a local region centered on a feature point as a continuous value, whereas the quantized gradient direction expresses it as a discrete value (for example, eight directions held as one byte of information with values 0 to 7). The feature extraction unit 304 may obtain image features for all points (pixels) of the two-dimensional projection image 60, or for only some points sampled according to a predetermined rule. A point for which an image feature has been obtained is called a feature point.
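As an illustration of the quantized gradient direction, the sketch below computes a central-difference gradient at an interior pixel and maps its angle to one of eight codes, 0 to 7. The use of central differences, the full 360-degree range, and the function name are assumptions made for this example; the text does not prescribe how the gradient is computed.

```python
import math


def quantized_gradient_direction(img, x, y, bins=8):
    """Quantize the gradient direction at interior pixel (x, y) of a
    row-major image into an integer code in 0..bins-1.

    Returns None where the gradient is zero (no defined direction).
    """
    gx = img[y][x + 1] - img[y][x - 1]  # horizontal central difference
    gy = img[y + 1][x] - img[y - 1][x]  # vertical central difference
    if gx == 0 and gy == 0:
        return None
    angle = math.atan2(gy, gx) % (2 * math.pi)  # continuous direction in [0, 2*pi)
    return int(round(angle / (2 * math.pi / bins))) % bins
```

Code 0 corresponds to luminance increasing along +x and code 2 to luminance increasing along +y; each code fits comfortably in one byte.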

In step S405, the template creation unit 305 creates a template corresponding to the viewpoint VP based on the image features extracted in step S404. A template is, for example, a data set containing the coordinate values of each feature point and the extracted image features.
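A template of this kind, together with a very simple matching score, might be sketched as follows. The `Template` class, its field names, and the score (the fraction of feature points whose feature value agrees with the image at a given offset) are assumptions for illustration; the patent does not specify the similarity measure at this point.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Template:
    viewpoint_id: int
    # Each entry: (x, y, feature), e.g. a quantized gradient direction 0..7.
    features: List[Tuple[int, int, int]]


def match_score(template, feature_map, ox, oy):
    """Fraction of template feature points whose feature equals the value
    in `feature_map` (row-major, None where no feature) at offset (ox, oy)."""
    if not template.features:
        return 0.0
    h, w = len(feature_map), len(feature_map[0])
    hits = 0
    for x, y, f in template.features:
        u, v = x + ox, y + oy
        if 0 <= u < w and 0 <= v < h and feature_map[v][u] == f:
            hits += 1
    return hits / len(template.features)
```

Because the parallel projection image keeps the object at a fixed size, the same `Template` can be slid over the whole image without rescaling its feature coordinates.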

The processing of steps S403 to S405 is performed for all viewpoints set in step S401 (step S406). When templates have been created for all viewpoints, the template creation unit 305 stores the template data in the template storage unit 314 of the object recognition processing device 31 (step S407). This completes the template creation processing.

(Object recognition processing)
An example of object recognition processing by the object recognition processing device 31 will be described with reference to the flowchart of FIG. 7.

In step S700, the three-dimensional data acquisition unit 310 generates three-dimensional data of the field of view based on images captured by the sensor unit 20. In this embodiment, three-dimensional information on each point in the field of view is obtained by the active stereo method: stereo images are captured by two cameras while pattern light is projected from the light projection device, and the depth distance is calculated from the parallax between the images.

In step S701, the parallel projection parameter setting unit 311 sets the parallel projection parameters used for parallel projection conversion. Here, four parameters, res_x, res_y, c_x, and c_y, are used as the parallel projection parameters. (res_x, res_y) is the size of one pixel of the projection image (in mm) and may be set to any value. For example, using the focal length (f_x, f_y) of the camera of the sensor unit 20, the values may be set as
res_x = d / f_x
res_y = d / f_y
where d is a constant set according to the depth distance at which the target object 27 can exist. For example, the average, minimum, or maximum of the depth distance from the sensor unit 20 to the target object 27 may be used as the constant d. As described above, it is preferable to use the same values of (res_x, res_y) as those used when creating the templates. (c_x, c_y) is the center coordinate of the projection image.
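The parameter choice res = d / f can be sketched as below. The sketch assumes that the focal length is expressed in pixels and d in millimeters, so that res comes out in mm per pixel, and it places (c_x, c_y) at the geometric center of an image of the given size. The function name, the argument names, and the center convention are illustrative assumptions.

```python
def parallel_projection_params(d_mm, f_x_px, f_y_px, width_px, height_px):
    """Return (res_x, res_y, c_x, c_y) for the parallel projection image.

    res_x = d / f_x and res_y = d / f_y: under a pinhole model, this is the
    footprint of one camera pixel at depth d, so the projected image has
    roughly the same sampling density as the camera image at that depth.
    """
    res_x = d_mm / f_x_px
    res_y = d_mm / f_y_px
    c_x = (width_px - 1) / 2  # assumed convention: pixel centers at integers
    c_y = (height_px - 1) / 2
    return res_x, res_y, c_x, c_y


# Hypothetical values: objects around 600 mm away, 1200 px focal length.
params = parallel_projection_params(600.0, 1200.0, 1200.0, 640, 480)
```

With these assumed values, one pixel of the projection image covers 0.5 mm; the same (res_x, res_y) would then be reused when creating the templates.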

In step S702, the parallel projection conversion unit 312 generates a two-dimensional projection image by parallel-projecting each point in the three-dimensional data (hereinafter referred to as a "three-dimensional point") onto a predetermined projection plane.

Details of the parallel projection conversion will be described with reference to FIGS. 8 and 9. In step S800, the parallel projection conversion unit 312 calculates the image coordinate values obtained when a three-dimensional point is parallel-projected. Let the camera coordinate system be (X, Y, Z) and the image coordinate system of the projection image be (x, y). In the example of FIG. 9, the camera coordinate system is set so that the origin O coincides with the center (principal point) of the lens of the camera of the sensor unit 20, the Z axis coincides with the optical axis, and the X and Y axes are parallel to the horizontal and vertical directions of the camera's image sensor, respectively. The image coordinate system is set so that the image center (c_x, c_y) lies on the Z axis of the camera coordinate system and the x and y axes are parallel to the X and Y axes of the camera coordinate system, respectively. The xy plane of the image coordinate system is the projection plane. That is, in this embodiment, the projection plane for the parallel projection conversion is set orthogonal to the optical axis of the camera. With the coordinate systems set as in FIG. 9, the image coordinate values (x_i, y_i) after parallel projection conversion corresponding to a three-dimensional point (X_i, Y_i, Z_i) are given by
x_i = ROUND(X_i / res_x + c_x)
y_i = ROUND(Y_i / res_y + c_y)
where ROUND is an operator that rounds off the fractional part.
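The two projection equations can be written out as a short function. This is a sketch under assumptions: the function names are hypothetical, and because the description does not specify the rounding rule, ROUND is taken here to be round-half-up (Python's built-in `round` rounds halves to the nearest even integer, so it is not used).

```python
import math


def round_half_up(value):
    """Assumed behaviour of ROUND: nearest integer, halves rounded up."""
    return math.floor(value + 0.5)


def project_point(X, Y, res_x, res_y, c_x, c_y):
    """Parallel-project a camera-frame point (X, Y, Z) to pixel (x, y); Z is not needed."""
    x = round_half_up(X / res_x + c_x)
    y = round_half_up(Y / res_y + c_y)
    return x, y


# Hypothetical point 10.2 mm right of and 4.9 mm above the optical axis.
pixel = project_point(10.2, -4.9, res_x=0.5, res_y=0.5, c_x=320, c_y=240)  # (340, 230)
```

Z does not appear in either equation, so the same (X, Y) maps to the same pixel at any depth.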

In step S801, the parallel projection conversion unit 312 checks whether a three-dimensional point has already been projected onto the image coordinate values (x_i, y_i). Specifically, it checks whether information of a three-dimensional point is already associated with the pixel (x_i, y_i) of the projection image. If no three-dimensional point is associated yet (NO in step S801), the parallel projection conversion unit 312 associates the information of the three-dimensional point (X_i, Y_i, Z_i) with the pixel (x_i, y_i) (step S803). In this embodiment, the coordinate values (X_i, Y_i, Z_i) of the three-dimensional point are associated with the pixel (x_i, y_i), but this is not a limitation: depth information of the three-dimensional point (for example, the value of Z_i), color information (for example, RGB values), luminance information, and the like may be associated instead. If an associated three-dimensional point already exists (YES in step S801), the parallel projection conversion unit 312 compares the value of Z_i with the Z value already associated with the pixel, and if Z_i is smaller (YES in step S802), overwrites the information associated with the pixel (x_i, y_i) with the information of the three-dimensional point (X_i, Y_i, Z_i) (step S803).
As a result of this processing, when a plurality of three-dimensional points are projected onto the same position on the projection plane, the information of the three-dimensional point closest to the camera among them is used to generate the projection image. After steps S800 to S803 have been performed for all three-dimensional points, the process proceeds to step S703 in FIG. 7 (step S804).
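Steps S800 to S804 amount to a depth-buffered projection loop, which can be sketched as below. The function name, the dictionary used to hold the projection image, and the decision to discard points that fall outside the image are assumptions for illustration. Round-half-up is again assumed for ROUND.

```python
import math


def parallel_project(points, res_x, res_y, c_x, c_y, width, height):
    """Return {(x, y): (X, Y, Z)}, keeping the point with the smallest Z for each pixel."""
    image = {}
    for X, Y, Z in points:
        # S800: image coordinates of the parallel-projected point.
        x = math.floor(X / res_x + c_x + 0.5)
        y = math.floor(Y / res_y + c_y + 0.5)
        if not (0 <= x < width and 0 <= y < height):
            continue  # assumption: points outside the projection image are dropped
        existing = image.get((x, y))              # S801: is a point already associated?
        if existing is None or Z < existing[2]:   # S802: is the new point nearer the camera?
            image[(x, y)] = (X, Y, Z)             # S803: associate or overwrite
    return image                                  # S804: all points processed


# Two points land on pixel (2, 2); the nearer one (Z = 450) must win.
points = [(0.0, 0.0, 600.0), (0.1, 0.1, 450.0), (5.0, 0.0, 500.0)]
image = parallel_project(points, res_x=1.0, res_y=1.0, c_x=2, c_y=2, width=8, height=8)
```

Storing the full (X, Y, Z) tuple follows the embodiment. Storing only Z, a color, or a luminance value would change just the value written in the S803 line.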

In step S703, the feature extraction unit 313 extracts image features from the projection image. The image features extracted here are the same as those used to create the template. In step S704, the template matching unit 315 reads a template from the template storage unit 314 and detects the target object in the projection image by template matching processing using that template. At this time, the orientation of the target object can also be recognized by using templates of different viewpoints. In step S705, the recognition result output unit 316 outputs the recognition result. This completes the object recognition processing.
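The matching step S704 can be illustrated with a deliberately simple stand-in. The description compares image features and does not name a similarity measure, so the sum of absolute differences (SAD) over raw pixel values used below is a swapped-in technique chosen for brevity, and the function name is hypothetical.

```python
def match_template(image, template):
    """Slide the template over the image; return (best_score, (row, col)) with minimal SAD."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best = None
    for r in range(image_h - tmpl_h + 1):
        for c in range(image_w - tmpl_w + 1):
            score = sum(
                abs(image[r + i][c + j] - template[i][j])
                for i in range(tmpl_h)
                for j in range(tmpl_w)
            )
            if best is None or score < best[0]:
                best = (score, (r, c))
    return best


# Toy projection image containing the template pattern at row 1, column 1.
image = [
    [0, 0, 0, 0],
    [0, 5, 9, 0],
    [0, 7, 3, 0],
    [0, 0, 0, 0],
]
template = [[5, 9], [7, 3]]
best_score, best_pos = match_template(image, template)  # 0, (1, 1)
```

Only positions are searched, not scales, which reflects the single-size template property of the parallel-projected image.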

(Advantages of this embodiment)
In the configuration and processing described above, a two-dimensional image generated by parallel projection of three-dimensional data is used for template matching. In parallel projection, the target object is projected at the same size regardless of the distance from the projection plane to the target object. Therefore, in a two-dimensional image generated by parallel projection, the image of the target object always has the same size, regardless of its depth distance. Matching therefore needs only templates of a single size, which allows faster processing than the conventional method. In addition, the number of templates and the amount of template data can be reduced, and less working memory is required, which makes the approach highly practical.
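The size-invariance argument can be checked numerically. The sketch below compares the projected width of an object under a pinhole (perspective) model with its width under parallel projection. The object width, depths, and focal length are hypothetical values, and the pinhole relation width_px = f * W / Z is a standard model rather than something stated in the description.

```python
def perspective_width_px(width_mm, depth_mm, focal_px):
    """Projected width under a pinhole camera model: shrinks as depth grows."""
    return focal_px * width_mm / depth_mm


def parallel_width_px(width_mm, res_x):
    """Projected width under parallel projection: independent of depth."""
    return width_mm / res_x


# Hypothetical 50 mm wide object, focal length 1000 px, res_x = d / f_x with d = 500 mm.
res_x = 500.0 / 1000.0
perspective = [perspective_width_px(50.0, z, 1000.0) for z in (400.0, 500.0, 800.0)]
parallel = [parallel_width_px(50.0, res_x) for _ in (400.0, 500.0, 800.0)]
# perspective -> [125.0, 100.0, 62.5]; parallel -> [100.0, 100.0, 100.0]
```

Under perspective projection the matcher would need templates at roughly 62 to 125 pixels to cover this depth range, whereas one 100-pixel template covers it after parallel projection.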

Further, in this embodiment, the template is also generated from a parallel projection image, which improves the accuracy of matching between the template and the target object image in the image obtained by the parallel projection conversion. This increases the reliability of the object recognition processing.

In addition, in this embodiment, the projection plane is set orthogonal to the optical axis of the camera. This simplifies the calculation of the conversion from the camera coordinate system to the image coordinate system, which speeds up the parallel projection conversion processing and, in turn, the object recognition processing by template matching. Setting the projection plane orthogonal to the optical axis of the camera also suppresses distortion of the target object image after the parallel projection conversion.

Also, when a plurality of three-dimensional points are projected onto the same pixel, only the information of the three-dimensional point closest to the camera is used. A parallel projection image is therefore generated that takes into account the overlap (occlusion) between objects as seen from the camera, and object recognition processing by template matching can be performed with high accuracy.

<Others>
The above-described embodiment merely illustrates a configuration example of the present invention. The present invention is not limited to the specific forms described above, and various modifications are possible within the scope of its technical idea.

For example, the projection point complementing process shown in FIG. 10 may be performed after the parallel projection conversion process (step S702 in FIG. 7). Specifically, for each pixel (x_i, y_i) of the projection image generated in step S702, the parallel projection conversion unit 312 checks whether information of a three-dimensional point is associated with it (step S100). If no such information is associated (that is, if no projection point exists), it generates information for the pixel (x_i, y_i) based on the information associated with the surrounding pixels (for example, the 4-neighboring or 8-neighboring pixels) (step S101). The information for the pixel (x_i, y_i) may be generated by interpolation such as nearest-neighbor, bilinear, or bicubic interpolation. The parallel projection conversion unit 312 then associates the information generated in step S101 with the pixel (x_i, y_i) (step S102). Steps S100 to S102 are performed for all pixels of the projection image. This processing increases the amount of information (the number of projection points) in the projection image, so an improvement in template matching accuracy can be expected.
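The complementing loop of steps S100 to S102 can be sketched as follows. This is a simplified illustration: each empty pixel receives the mean of its valid 4-neighbors, which is a stand-in for the nearest-neighbor, bilinear, or bicubic interpolation named above. The function name and the use of `None` to mark pixels without a projection point are assumptions.

```python
def fill_missing(depth):
    """Fill None pixels with the mean of their valid 4-neighbours, read from the input image."""
    height, width = len(depth), len(depth[0])
    out = [row[:] for row in depth]  # write to a copy so filled values do not propagate
    for y in range(height):
        for x in range(width):
            if depth[y][x] is not None:  # S100: a projection point already exists
                continue
            neighbours = [
                depth[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width and depth[ny][nx] is not None
            ]
            if neighbours:  # S101: generate information from the surrounding pixels
                out[y][x] = sum(neighbours) / len(neighbours)  # S102: associate it
    return out


# Toy depth image (mm) with two pixels that received no projection point.
depth = [
    [500.0, 502.0, 504.0],
    [500.0, None, 504.0],
    [500.0, 502.0, None],
]
filled = fill_missing(depth)  # filled[1][1] == 502.0, filled[2][2] == 503.0
```

Pixels with no valid neighbor are left empty in this sketch. A real pipeline would need to decide whether to leave them, widen the neighborhood, or run a second pass.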

The setting of the projection plane is also not limited to the example of FIG. 9. For example, the projection plane may be placed behind the origin O of the camera coordinate system (on the image side). Alternatively, the projection plane may be arranged so that it intersects the optical axis (Z axis) obliquely (that is, so that the projection direction is not parallel to the optical axis).
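One way to generalize the projection to an oblique plane is to describe the plane by an origin and two orthonormal in-plane axes u and v, and to project along the plane normal. The description gives no formulas for this variant, so the parameterization and all names below are assumptions for illustration. With u = (1, 0, 0) and v = (0, 1, 0) it reduces to the equations of the embodiment.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def project_onto_plane(point, origin, u, v, res_x, res_y, c_x, c_y):
    """Orthographic projection onto the plane spanned by unit vectors u and v.

    Returns (x, y, depth), where depth is the distance along the plane normal u x v.
    """
    rel = tuple(p - o for p, o in zip(point, origin))
    x = math.floor(dot(rel, u) / res_x + c_x + 0.5)
    y = math.floor(dot(rel, v) / res_y + c_y + 0.5)
    return x, y, dot(rel, cross(u, v))


# Plane orthogonal to the optical axis: same result as the embodiment's equations.
frontal = project_onto_plane(
    (10.2, -4.9, 700.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    res_x=0.5, res_y=0.5, c_x=320, c_y=240,
)

# Hypothetical plane tilted 30 degrees about the Y axis.
tilt = math.radians(30.0)
tilted = project_onto_plane(
    (10.2, -4.9, 700.0), (0.0, 0.0, 0.0),
    (math.cos(tilt), 0.0, -math.sin(tilt)), (0.0, 1.0, 0.0),
    res_x=0.5, res_y=0.5, c_x=320, c_y=240,
)
```

For the nearest-point rule to stay meaningful with a tilted plane, the returned depth along the normal, rather than the camera Z, would be the natural value to compare.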

<Appendix>
(1) An object recognition device (2) comprising:
a three-dimensional data acquisition unit (310) that acquires three-dimensional data composed of a plurality of points each having three-dimensional information;
a parallel projection conversion unit (312) that generates a two-dimensional image by parallel-projecting each point of the three-dimensional data onto a projection plane; and
a recognition processing unit (313, 314, 315) that detects a target object from the two-dimensional image by template matching.

2: Object recognition device
20: Sensor unit
21: Image processing device
22: Display
27: Target object
30: Template creation device
31: Object recognition processing device

Claims

1. An object recognition device comprising:
a three-dimensional data acquisition unit that acquires three-dimensional data composed of a plurality of points each having three-dimensional information;
a parallel projection conversion unit that generates a two-dimensional image by parallel-projecting each point of the three-dimensional data onto a projection plane; and
a recognition processing unit that detects a target object from the two-dimensional image by template matching,
wherein the three-dimensional data is data generated using an image captured by a camera, and
the parallel projection conversion unit sets the projection plane so as to be orthogonal to the optical axis of the camera.

2. An object recognition device comprising:
a three-dimensional data acquisition unit that acquires three-dimensional data composed of a plurality of points each having three-dimensional information;
a parallel projection conversion unit that generates a two-dimensional image by parallel-projecting each point of the three-dimensional data onto a projection plane; and
a recognition processing unit that detects a target object from the two-dimensional image by template matching,
wherein each point of the three-dimensional data has luminance information, and
when a first point in the three-dimensional data is projected onto a first pixel in the two-dimensional image, the parallel projection conversion unit associates the luminance information of the first point with the first pixel.

3. An object recognition device comprising:
a three-dimensional data acquisition unit that acquires three-dimensional data composed of a plurality of points each having three-dimensional information;
a parallel projection conversion unit that generates a two-dimensional image by parallel-projecting each point of the three-dimensional data onto a projection plane; and
a recognition processing unit that detects a target object from the two-dimensional image by template matching,
wherein each point of the three-dimensional data has color information, and
when a first point in the three-dimensional data is projected onto a first pixel in the two-dimensional image, the parallel projection conversion unit associates the color information of the first point with the first pixel.

4. An object recognition device comprising:
a three-dimensional data acquisition unit that acquires three-dimensional data composed of a plurality of points each having three-dimensional information;
a parallel projection conversion unit that generates a two-dimensional image by parallel-projecting each point of the three-dimensional data onto a projection plane; and
a recognition processing unit that detects a target object from the two-dimensional image by template matching,
wherein the three-dimensional data is data generated using an image captured by a camera, and
when a plurality of points in the three-dimensional data are projected onto the same position on the projection plane, the parallel projection conversion unit uses the point closest to the camera among the plurality of points to generate the two-dimensional image.

5. The object recognition device according to any one of claims 1 to 4, wherein, when a first point in the three-dimensional data is projected onto a first pixel in the two-dimensional image, the parallel projection conversion unit associates depth information obtained from the three-dimensional information of the first point with the first pixel.

6. The object recognition device according to claim 2, 3, or 5, wherein, when there is no point projected onto a second pixel in the two-dimensional image, the parallel projection conversion unit generates information to be associated with the second pixel.

7. The object recognition device according to claim 6, wherein the parallel projection conversion unit obtains the information to be associated with the second pixel by interpolating information associated with pixels surrounding the second pixel.

8. The object recognition device according to any one of claims 1 to 7, wherein the recognition processing unit uses, as a template of the target object, a template generated from an image obtained by parallel projection of the target object.

9. An object recognition method comprising the steps of:
obtaining three-dimensional data composed of a plurality of points each having three-dimensional information;
generating a two-dimensional image by parallel projection of each point of the three-dimensional data onto a projection plane; and
detecting a target object from the two-dimensional image by template matching,
wherein the three-dimensional data is data generated using an image captured by a camera, and
in the parallel projection, the projection plane is set so as to be orthogonal to the optical axis of the camera.

10. An object recognition method comprising the steps of:
obtaining three-dimensional data composed of a plurality of points each having three-dimensional information;
generating a two-dimensional image by parallel projection of each point of the three-dimensional data onto a projection plane; and
detecting a target object from the two-dimensional image by template matching,
wherein each point of the three-dimensional data has luminance information, and
in the parallel projection, when a first point in the three-dimensional data is projected onto a first pixel in the two-dimensional image, the luminance information of the first point is associated with the first pixel.

11. An object recognition method comprising the steps of:
obtaining three-dimensional data composed of a plurality of points each having three-dimensional information;
generating a two-dimensional image by parallel projection of each point of the three-dimensional data onto a projection plane; and
detecting a target object from the two-dimensional image by template matching,
wherein each point of the three-dimensional data has color information, and
in the parallel projection, when a first point in the three-dimensional data is projected onto a first pixel in the two-dimensional image, the color information of the first point is associated with the first pixel.

12. An object recognition method comprising the steps of:
obtaining three-dimensional data composed of a plurality of points each having three-dimensional information;
generating a two-dimensional image by parallel projection of each point of the three-dimensional data onto a projection plane; and
detecting a target object from the two-dimensional image by template matching,
wherein the three-dimensional data is data generated using an image captured by a camera, and
in the parallel projection, when a plurality of points in the three-dimensional data are projected onto the same position on the projection plane, the point closest to the camera among the plurality of points is used to generate the two-dimensional image.

13. A program for causing a computer to execute each step of the object recognition method according to any one of claims 9 to 12.