JP2016103094A

JP2016103094A - Image processing method, image processor, and image processing program

Info

Publication number: JP2016103094A
Application number: JP2014240234A
Authority: JP
Inventors: Ryosuke Koseki; Yasuhisa Matsuura; Hironobu Fujiyoshi
Original assignee: Toyota Industries Corp; Chubu University
Current assignee: Toyota Industries Corp; Chubu University
Priority date: 2014-11-27
Filing date: 2014-11-27
Publication date: 2016-06-02

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus, an image processing method, and an image processing program that recognize, in an input image, a region containing an object corresponding to a preset detection target and that can also determine the state of the recognized object. SOLUTION: The image processing method includes a step of determining whether a detection target is included in an input image on the basis of a first probability for the terminal nodes that the respective partial input images reach when a plurality of partial input images obtained from the input image are given to a decision tree group, and determining whether the detection target included in the input image meets a predetermined condition on the basis of a second probability for those terminal nodes. SELECTED DRAWING: Figure 9

Description

The present invention relates to an image processing method, an image processing apparatus, and an image processing program directed to object recognition.

Various automation techniques based on image processing have been developed for manufacturing sites and the like. For example, Japanese Patent Laying-Open No. 2004-188562 (Patent Document 1) discloses a workpiece picking device in which a three-dimensional visual sensor is mounted on a robot, the position and orientation of a workpiece located in, for example, a container with an opening are recognized, and the workpiece is taken out based on the recognition result.

Various methods have been proposed as such object recognition techniques. For example, Japanese Patent Laying-Open No. 2013-003919 (Patent Document 2) discloses a method of recognizing an object by collating captured image data acquired by a camera with a codebook, selecting the closest small-region image pattern from among a plurality of small-region image patterns, extracting, for that small-region image pattern, the class of the node having the smallest weight among the nodes whose weight is equal to or greater than a threshold, and voting the position information of the small-region image pattern to that class.

A classification method called the Random Forests method is also known. In the Random Forests method, the learning process extracts a plurality of subsets from a data set and constructs a decision tree (naive Bayes classifier) for each subset. That is, a group of decision trees, one per subset, is constructed by supervised learning. For example, Japanese Patent Laying-Open No. 2012-042990 (Patent Document 3) discloses an image identification information assigning apparatus in which the random forest method, which uses a decision tree group as a classifier, is applied to image annotation technology (technology for assigning identification information to images).

Patent Document 1: Japanese Patent Laying-Open No. 2004-188562
Patent Document 2: Japanese Patent Laying-Open No. 2013-003919
Patent Document 3: Japanese Patent Laying-Open No. 2012-042990

The prior art described above mainly determines the position and orientation of the object to be detected, whether the object to be detected is present, or which of a plurality of candidates a given object matches. At an actual manufacturing site, however, the situation in which a recognized object is placed must also be evaluated, and the prior art described above does not contemplate evaluating such a situation.

There is therefore a demand for an image processing apparatus, an image processing method, and an image processing program that, for an input image acquired by imaging or the like, recognize a region in which an object corresponding to a preset detection target exists and can also determine the situation of the recognized object.

An image processing method according to one aspect of the present invention includes a step of constructing a decision tree group having a hierarchical structure from a root node to a plurality of terminal nodes, using a plurality of partial learning images obtained from learning images. The plurality of partial learning images include a first sample showing a portion of the detection target that meets a predetermined condition, a second sample showing a portion of the detection target that does not meet the predetermined condition, and a third sample showing a non-detection target. The step of constructing the decision tree group includes a step of determining, for each node that is not a terminal node, a branch function indicating into which of the child nodes branching from that node a given partial learning image should be classified, and a step of sequentially branching each of the partial learning images according to the determined branch functions until it reaches one of the terminal nodes, thereby determining, for each terminal node, a first probability indicating the ratio between the total of the first and second samples and the third samples, and a second probability indicating the ratio between the first samples and the second samples. The image processing method further includes a step of determining whether a detection target is included in an input image based on the first probability for the terminal nodes respectively reached by the partial input images when a plurality of partial input images obtained from the input image are given to the decision tree group, and determining whether the detection target included in the input image meets the predetermined condition based on the second probability for those terminal nodes.
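The two probabilities held at each terminal node can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: each "ratio" is read as a fraction (targets among all samples, and condition-meeting samples among targets), the probabilities of the reached terminal nodes are simply averaged, and both thresholds default to 0.5. The names `LeafCounts`, `leaf_probabilities`, and `judge` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LeafCounts:
    """Learning patches that reached one terminal node, by sample type."""
    first: int   # detection target, condition met
    second: int  # detection target, condition not met
    third: int   # non-detection target


def leaf_probabilities(c: LeafCounts) -> Tuple[float, float]:
    """Return (first probability, second probability) for one terminal node."""
    target = c.first + c.second
    total = target + c.third
    # Assumed reading: share of detection-target samples among all samples.
    p_target = target / total if total else 0.0
    # Assumed reading: share of condition-meeting samples among target samples.
    p_condition = c.first / target if target else 0.0
    return p_target, p_condition


def judge(reached: List[LeafCounts],
          detect_thr: float = 0.5,
          cond_thr: float = 0.5) -> Tuple[bool, bool]:
    """Average over the terminal nodes reached by the input patches (assumption)."""
    if not reached:
        return False, False
    probs = [leaf_probabilities(c) for c in reached]
    p1 = sum(p for p, _ in probs) / len(probs)
    p2 = sum(q for _, q in probs) / len(probs)
    detected = p1 >= detect_thr
    return detected, detected and p2 >= cond_thr
```

For example, `judge([LeafCounts(6, 2, 2)])` reports both a detection and a met condition, whereas `judge([LeafCounts(1, 7, 2)])` reports a detection whose condition is not met.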

Preferably, the step of constructing the decision tree group further includes a step of determining, for each node that is not a terminal node and based on the result of classifying the given partial learning images into one of the child nodes according to the determined branch function, a first weight indicating the ability to discriminate between the detection target and the non-detection target and a second weight indicating the ability to discriminate between the first sample and the second sample.

Alternatively, preferably, the step of constructing the decision tree group further includes a step of lowering a weight indicating the ability to discriminate between the first sample and the second sample as the proportion of first and second samples classified into the same child node becomes higher when the given partial learning images are classified into the child nodes according to the determined branch function.

Preferably, the image processing method further includes a step of giving the partial learning images, together with those partial input images that were misclassified or are likely to be misclassified, to the decision tree group and sequentially branching each image until it reaches one of the terminal nodes, thereby identifying the terminal node that each image reaches, and a step of adding child nodes branching from a terminal node reached by a partial input image, according to the probability of discriminating between the partial learning images and the partial input images at that terminal node.

Preferably, the image processing method further includes a step of placing, in a single group, a plurality of partial learning images that show the detection target and were extracted from regions close to one another, and the step of constructing the decision tree group includes a step of determining a weight in common for the partial learning images belonging to the same group.

Preferably, the image processing method further includes a step of generating a plurality of learning images by rotating a learning image in increments of a predetermined angle and extracting a plurality of partial learning images from the generated learning images.
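The rotation-based generation of learning images can be sketched as follows. This is only an illustration under assumptions: the image is a plain 2-D list of pixel values, rotation uses nearest-neighbour sampling about the image centre, pixels that fall outside the source are filled with a constant, and patches are cut on a regular grid. The function names and the `size`/`stride` parameters are hypothetical and do not come from the text.

```python
import math


def rotate(image, angle_deg, fill=0):
    """Rotate a 2-D list of pixels about its centre (nearest neighbour)."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cx + cos_a * dx + sin_a * dy)
            sy = round(cy - sin_a * dx + cos_a * dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def extract_patches(image, size, stride):
    """Cut square patches of side `size` on a regular grid."""
    h, w = len(image), len(image[0])
    return [
        [row[x:x + size] for row in image[y:y + size]]
        for y in range(0, h - size + 1, stride)
        for x in range(0, w - size + 1, stride)
    ]


def augmented_patches(image, step_deg, size, stride):
    """Rotate the learning image in steps of `step_deg` and pool all patches."""
    patches = []
    for angle in range(0, 360, step_deg):
        patches.extend(extract_patches(rotate(image, angle), size, stride))
    return patches
```

With a 4 x 4 image, `augmented_patches(image, 90, 2, 2)` pools four patches from each of four rotated copies. A production system would more likely rely on an image library with interpolation; the pure-Python version is used here only to keep the sketch self-contained.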

An image processing apparatus according to another aspect of the present invention includes means for constructing a decision tree group having a hierarchical structure from a root node to a plurality of terminal nodes, using a plurality of partial learning images obtained from learning images. The plurality of partial learning images include a first sample showing a portion of the detection target that meets a predetermined condition, a second sample showing a portion of the detection target that does not meet the predetermined condition, and a third sample showing a non-detection target. The means for constructing the decision tree group includes means for determining, for each node that is not a terminal node, a branch function indicating into which of the child nodes branching from that node a given partial learning image should be classified, and means for sequentially branching each of the partial learning images according to the determined branch functions until it reaches one of the terminal nodes, thereby determining, for each terminal node, a first probability indicating the ratio between the total of the first and second samples and the third samples, and a second probability indicating the ratio between the first samples and the second samples. The image processing apparatus further includes means for determining whether a detection target is included in an input image based on the first probability for the terminal nodes respectively reached by the partial input images when a plurality of partial input images obtained from the input image are given to the decision tree group, and for determining whether the detection target included in the input image meets the predetermined condition based on the second probability for those terminal nodes.

According to still another aspect of the present invention, an image processing program to be executed by a computer is provided. The image processing program causes the computer to execute a step of constructing a decision tree group having a hierarchical structure from a root node to a plurality of terminal nodes, using a plurality of partial learning images obtained from learning images. The plurality of partial learning images include a first sample showing a portion of the detection target that meets a predetermined condition, a second sample showing a portion of the detection target that does not meet the predetermined condition, and a third sample showing a non-detection target. The step of constructing the decision tree group includes a step of determining, for each node that is not a terminal node, a branch function indicating into which of the child nodes branching from that node a given partial learning image should be classified, and a step of sequentially branching each of the partial learning images according to the determined branch functions until it reaches one of the terminal nodes, thereby determining, for each terminal node, a first probability indicating the ratio between the total of the first and second samples and the third samples, and a second probability indicating the ratio between the first samples and the second samples. The program further causes the computer to execute a step of determining whether a detection target is included in an input image based on the first probability for the terminal nodes respectively reached by the partial input images when a plurality of partial input images obtained from the input image are given to the decision tree group, and determining whether the detection target included in the input image meets the predetermined condition based on the second probability for those terminal nodes.

According to some aspects of the present invention, for an input image acquired by imaging or the like, a region in which an object corresponding to a preset detection target exists can be recognized, and the situation of the recognized object can also be determined.

A schematic diagram showing a configuration example of an image processing system including the image processing apparatus according to the present embodiment.
A schematic diagram showing a configuration example of the image processing apparatus according to the present embodiment.
A diagram explaining the background of the image recognition method according to the present embodiment.
A diagram explaining the learning images used in the object recognition method according to the present embodiment.
A diagram explaining the effect of the weights in the image recognition method according to the present embodiment.
A diagram explaining the effect of updating the weights in the image recognition method according to the present embodiment.
A diagram explaining the effect of updating the weights in the image recognition method according to the present embodiment.
A diagram explaining the generation of Bags of learning images used in the object recognition method according to the present embodiment.
A diagram for explaining the construction of the decision tree group in the learning process of the object recognition method according to the present embodiment.
A flowchart showing the procedure of the learning process of the object recognition method according to the present embodiment.
A schematic diagram for explaining the process of determining a branch function in the learning process of the object recognition method according to the present embodiment.
A schematic diagram for explaining the terminal node information obtained by the learning process of the object recognition method according to the present embodiment.
A flowchart showing the procedure of the recognition process in the object recognition method according to the present embodiment.
A diagram explaining the recognition process in the object recognition method according to the present embodiment.
A diagram showing an example of a recognition result obtained by the object recognition method according to the present embodiment.
A diagram explaining the additional learning process in the object recognition method according to the present embodiment.
A flowchart showing the procedure of the additional learning process in the object recognition method according to the present embodiment.

Embodiments of the present invention will be described in detail with reference to the drawings. The same or corresponding parts in the drawings are denoted by the same reference signs, and their description will not be repeated.

<A. Configuration example of image processing system>
FIG. 1 is a schematic diagram showing a configuration example of an image processing system 1 including an image processing apparatus 100 according to the present embodiment. As an example, FIG. 1 shows the image processing system 1 applied to a bin picking system. A bin picking system recognizes, from an input image, the position and orientation of an object designated for product sorting or parts assembly (hereinafter also referred to as a "workpiece") and grips (picks) the recognized workpiece according to the recognized position and orientation information.

More specifically, the image processing system 1 includes an imaging device 2, the image processing apparatus 100, and a picking robot 200. The image processing apparatus 100 recognizes, in the input image from the imaging device 2, the position and orientation of a workpiece 4 corresponding to a detection target registered in advance, and outputs the position and orientation information of the detected workpiece 4 to the picking robot 200. The picking robot 200 grips the target workpiece 4 and moves it to a predetermined position according to the information from the image processing apparatus 100.

The image processing method, image processing apparatus, and image processing program according to the present invention are not limited to the bin picking system shown in FIG. 1 and can be applied to various systems that use image recognition technology.

<B. Configuration example of image processing apparatus>
Next, a configuration example of the image processing apparatus 100 shown in FIG. 1 will be described. FIG. 2 is a schematic diagram showing a configuration example of the image processing apparatus 100 according to the present embodiment. FIG. 2 illustrates, as a typical implementation of the image processing apparatus 100, a form in which a processor executes an image processing program.

More specifically, the image processing apparatus 100 includes a processor 102, a main memory 104, an HDD (Hard Disk Drive) 106, a network interface 110, an image input interface 112, an input unit 114, a display unit 116, and an output interface 118. These components are communicably connected to one another via an internal bus 120.

The processor 102 is the processing entity that executes the processing described later; it loads the image processing program 108 stored in the HDD 106 into the main memory 104 and executes it. The processor 102 typically consists of a CPU (Central Processing Unit) or an MPU (Micro-Processing Unit). The HDD 106 may also store the decision trees obtained as a result of the learning process described later, the results of the recognition process, and the like.

The network interface 110 mediates communication with other devices, servers, and the like over an external network or similar. The image input interface 112 includes a circuit conforming to an arbitrary communication protocol and receives learning images and/or input images from the imaging device 2. The input unit 114 includes a keyboard, a mouse, and the like, and accepts input operations from the user. The display unit 116 consists of a display or the like and notifies the user of the progress and results of processing such as the learning process and the recognition process. The output interface 118 includes a circuit conforming to an arbitrary communication protocol and outputs results obtained by the recognition process and the like to the outside (for example, to the picking robot 200).

The imaging device 2 is a means for generating an input image by imaging a subject and includes, as an example, a device such as a CCD (Charge-Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor in addition to an optical system such as a lens.

All or part of the functions of the image processing apparatus 100 may be implemented using circuit elements such as an SoC (System on a Chip), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array). The image processing program 108 shown in FIG. 2 may be installed in the image processing apparatus 100 from an arbitrary recording medium (for example, an optical disc or flash memory) or may be distributed over a network. Furthermore, the image processing apparatus 100 may cooperate with a server apparatus to implement the processing and functions described later. In that case, the functions needed to implement the present embodiment reside in one or both of the image processing apparatus 100 and the server apparatus.

The imaging device 2 and the image processing apparatus 100 may be configured as a single unit, or an image processing apparatus 100 that is not directly connected to the imaging device 2 may be used. In the latter case, an image may be generated or acquired using any imaging means and loaded into the image processing apparatus 100 via a network or an arbitrary recording medium.

<C. Overview>
Next, an overview of the object recognition method according to the present embodiment will be described.

In a bin picking system such as that shown in FIG. 1, it is necessary not only to recognize the position and orientation of the workpiece corresponding to the detection target but also to determine whether the recognized workpiece can be gripped. FIG. 3 is a diagram for explaining the background of the image recognition method according to the present embodiment.

As shown in FIG. 3(a), consider a bin picking system in which beverage containers are an example of the workpiece 4. If a recognized workpiece 4 is separated from the other workpieces 4 (that is, there is no interference with other workpieces 4), it can be gripped easily; if the recognized workpiece 4 is adjacent to another workpiece 4 (that is, it interferes with another workpiece 4), it is difficult to grip.

In an object recognition method according to the related art, as shown in FIG. 3(b), a process of determining whether the recognized workpiece can be gripped (grip determination) had to be performed in addition to the process of recognizing the position and orientation of the workpiece (workpiece recognition). That is, in the related-art object recognition method, workpiece recognition 12 and grip determination 14 had to be executed on an input image 10 before the workpiece gripping operation 16 could be executed.

With a processing procedure such as that shown in FIG. 3(b), rules must be set in advance for the grip determination 14 using the input image, its distance image, and the like. More specifically, rules must be defined between a three-dimensional map showing the situation around the recognized workpiece and a hand map showing the gripping operation of the picking robot 200. In other words, executing the processing procedure shown in FIG. 3(b) requires rules to be set in advance so that the grip determination 14 is processed appropriately, and setting these rules incurs human cost.

In contrast, in the object recognition method according to the present embodiment, as shown in FIG. 3(c), a recognition process 13 that integrally includes workpiece recognition and grip determination is executed. In the recognition process 13, recognition of the workpiece 4 is performed in a way that includes the grip determination. The rules needed to execute the grip determination 14 therefore do not have to be set in advance.

That is, in the learning process that constructs the classifier for recognizing the workpiece 4, images of cases that should be recognized as the workpiece 4 and images of cases that should not be recognized as the workpiece 4 are provided, and the images of cases that should be recognized as the workpiece 4 include both images of cases showing an easy-to-grip situation and images of cases showing a difficult-to-grip situation. By performing the learning process with these images, the situation, namely whether the workpiece is easy or difficult to grip, can be determined at the same time as the position and orientation of the workpiece are recognized. In other words, the object recognition method according to the present embodiment does not require the rules needed for the grip determination 14 shown in FIG. 3(b) to be set in advance.

The bin picking system has been used as a concrete example to aid understanding, but the object recognition method according to the present embodiment is not limited to it. The method can be applied to various applications that need to recognize the position and orientation of a workpiece and, at the same time, determine whether the recognized workpiece meets a predetermined condition (the situation in which the recognized workpiece is placed).

For example, in an application that sequentially grips a plurality of flat-stacked workpieces according to a certain rule, the method can be applied to a picking system that recognizes the position and orientation of a workpiece and determines whether the recognized workpiece should be gripped preferentially. Furthermore, it can be applied not only to gripping workpieces but also to applications that pick up workpieces by suction and applications that perform some kind of processing on workpieces.

Furthermore, the determination is not limited to two categories, that is, whether or not the workpiece meets a predetermined condition; the method may determine which of a larger number of categories the workpiece belongs to. For example, in the workpiece gripping application described above, the method can determine whether the workpiece is "easy to grip", "grippable (although not easily)", or "difficult to grip".

As described above, the object recognition method according to the present embodiment determines whether a detection target is included in the input image and also determines whether the detection target included in the input image meets a predetermined condition (the situation in which the recognized workpiece is placed).

The object recognition method according to the present embodiment recognizes a workpiece from an input image using a statistical learning method. In particular, the method is directed to a local-patch-based statistical learning method. The local-patch-based approach uses partial input images cut out from the input image (hereinafter also referred to as "input patches") for object recognition. Because voting is performed using a plurality of input patches cut out from the input image, robust object recognition can be achieved even when part of the workpiece is occluded or deformed.

As one example of such a local-patch-based statistical learning method, the Hough Forests method, which uses Random Forests (a decision tree group) for object recognition of input patches, has been proposed. The Hough Forests method is an object recognition method in which a plurality of input patches cut out from an input image are input to Random Forests, and the class probability of the terminal node reached by each input patch is voted using the offset (offset vector) to the center of the detection target. In the Hough Forests method, a decision tree group is constructed by a learning process from subsets of partial images cut out from sample images (hereinafter also referred to as "learning patches"), and input patches cut out from the input image are then input to the decision tree group. The method determines which leaf node (hereinafter also referred to as a "terminal node") each input patch reaches in each decision tree, and uses the information held in those terminal nodes (that is, information obtained in advance by the learning process) to recognize the position and posture of the workpiece corresponding to the detection target in the input image.
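The voting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this embodiment: the names `hough_vote` and `best_center`, the dictionary layout of a leaf, and the normalization of each vote by the number of stored offsets and the number of trees are all hypothetical choices made for the example.

```python
from collections import defaultdict

def hough_vote(patches, forest):
    """Accumulate votes for the object center (illustrative sketch).

    patches: list of ((x, y), features), where (x, y) is the patch center.
    forest:  list of callables; each maps features to a leaf, represented
             here (hypothetically) as a dict with 'class_prob' (float) and
             'offsets' (list of (dx, dy) from patch center to object center).
    """
    votes = defaultdict(float)
    for (px, py), features in patches:
        for tree in forest:
            leaf = tree(features)
            if not leaf["offsets"]:
                continue
            # Assumed normalization: split the leaf's class probability
            # evenly over its stored offsets and over the trees.
            share = leaf["class_prob"] / (len(leaf["offsets"]) * len(forest))
            for dx, dy in leaf["offsets"]:
                votes[(px + dx, py + dy)] += share
    return votes

def best_center(votes):
    """Return the position that received the largest accumulated vote."""
    return max(votes, key=votes.get)
```

Each patch casts its vote at its own center plus the stored offset, so patches that agree on the object center reinforce the same position even when other patches are occluded.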

The object recognition method according to the present embodiment is based on the Hough Forests method, but by including the novel processing detailed below, it achieves remarkable effects that cannot be obtained with the general Hough Forests method.

The object recognition method according to the present embodiment includes a learning process based on supervised learning, and a recognition process that uses the learning result obtained by the learning process to recognize, from an input image, the position and posture of the workpiece corresponding to the detection target and to determine whether the recognized workpiece meets a predetermined condition. Furthermore, the method can include an additional learning process that corrects the learning result when the learning result obtained by the learning process does not yield an appropriate recognition result. Hereinafter, the object recognition method according to the present embodiment is described in detail in the order of "learning process", "recognition process", and "additional learning process".

<D. Learning process>
(d1: Learning images used in the learning process)
In the learning process of the object recognition method according to the present embodiment, Random Forests (hereinafter also referred to as a "decision tree group") for classifying input patches are constructed using a plurality of partial learning images. As described later, the decision tree group has a hierarchical structure from a root node to a plurality of terminal nodes. Thus, the learning process includes a step of constructing, from a plurality of partial learning images obtained from the learning images, a decision tree group having a hierarchical structure from a root node to a plurality of terminal nodes.

In the recognition process, input patches cut out from the input image are input to the decision tree group, and the class probabilities and other information for the input patches that reach each terminal node are used to recognize the position and posture of the workpiece corresponding to the detection target and to determine whether the recognized workpiece meets a predetermined condition.

In the learning process of the object recognition method according to the present embodiment, in order to recognize the position and posture of a workpiece, two kinds of learning images are used: learning images showing at least part of a detection-target workpiece (that is, a target whose position and posture should be recognized; hereinafter also referred to as "positive images"), and learning images showing at least part of a non-detection-target workpiece (that is, a target whose position and posture should not be recognized; hereinafter also referred to as "negative images"). A partial learning image cut out from a positive image is also referred to as a "positive patch", and a partial learning image cut out from a negative image is also referred to as a "negative patch". In the following, positive patches and negative patches may be referred to collectively simply as "learning patches". Because the object recognition method according to the present embodiment is a local-patch-based statistical learning method, the learning process and the recognition process are executed using these learning patches.

As described later, the object recognition method according to the present embodiment may handle a plurality of patches as an aggregate (Bag). For convenience of explanation, one or more patches obtained from a learning image may therefore be referred to collectively as a "learning sample". Similarly, one or more positive patches may be referred to collectively as a "positive sample", and negative patches as a "negative sample". Input patches may likewise be referred to collectively as an "input sample".

An image of any format can be used as a learning image, but for object recognition it is preferable to use a distance image, that is, an image whose pixel values are the distances from a given imaging point to each point on the workpiece surface. Of course, an ordinary image, that is, an image whose pixel values are the brightness of each point on the workpiece surface, can also be used.

FIG. 4 is a diagram illustrating the learning images used in the object recognition method according to the present embodiment. FIG. 4 shows (a) positive images and (b) a negative image. Positive image 20 shown in FIG. 4 assumes a beverage container as the detection target. To identify the detection-target workpiece, negative image 30, which shows a different workpiece whose appearance features are considered dissimilar, is used to construct a decision tree group that distinguishes detection targets from non-detection targets.

In the object recognition method according to the present embodiment, in order to determine whether the detection target included in the input image meets a predetermined condition, positive images 20 are classified into two or more categories (hereinafter also referred to as "subclasses") according to the condition to be determined. In other words, learning images are prepared for each subclass according to the number of conditions to be determined. In the example shown in FIG. 4, in order to determine one condition, namely whether the recognized workpiece can be gripped as described above, an input image showing easy gripping and an input image showing difficult gripping are included in positive images 20.

Positive image 20 shown in FIG. 4 includes learning image 22 ((a-1) easy to grip), which shows the beverage container placed so that it can be gripped (subclass 1), and learning image 24 ((a-2) difficult to grip), which shows the beverage container placed so that it is difficult to grip (subclass 2). The method of generating learning images such as those shown in FIG. 4 is described later.

In the learning process, a decision tree group is constructed using a plurality of learning patches cut out from positive image 20 (including learning image 22 and learning image 24) and from negative image 30. These learning patches (partial learning images) include learning image 22 (first sample), which shows a portion of the detection target that meets the predetermined condition, learning image 24 (second sample), which shows a portion of the detection target that does not meet the predetermined condition, and negative image 30 (third sample), which shows a non-detection target.

The constructed decision tree group is used to calculate the probability of whether an input patch cut out from the input image is more likely to resemble positive image 20 or negative image 30 and, if it corresponds to positive image 20, the probability of whether it is more likely to resemble subclass 1 or subclass 2.

(d2: Introduction of the concept of "weight")
In the object recognition method according to the present embodiment, it is preferable to introduce the concept of a "weight" for the nodes of the decision trees and to adopt a new mechanism that, when the decision tree group is constructed, in effect automatically screens out learning patches that are likely to cause misidentification.

As described above, the object recognition method according to the present embodiment must discriminate between positive samples and negative samples (class discrimination) and between learning image 22 ((a-1) easy to grip: subclass 1) and learning image 24 ((a-2) difficult to grip: subclass 2) (subclass discrimination). Two kinds of weights are therefore used: a first weight for discriminating between positive samples and negative samples, and a second weight for discriminating between subclass 1 and subclass 2.

The "weight" according to the present embodiment is used, when the decision tree group is constructed, to relatively reduce the class probability for a learning patch obtained from one learning image when the features it shows resemble the other learning image. The class probability is the probability used in the voting process for input patches cut out from the input image. Reducing the probability for low-reliability learning patches cut out from the learning images suppresses misidentification in the recognition process.

More specifically, among the learning samples cut out from positive image 20, those that resemble negative image 30 are given a relatively low weight (first weight). Likewise, among the learning samples cut out from learning image 22 ((a-1) easy to grip), those that resemble learning image 24 ((a-2) difficult to grip) are given a relatively low weight (second weight). Each weight is calculated or updated successively as the decision tree group is constructed.

Thus, for learning samples cut out from positive image 20, the first weight and the second weight are set independently of each other. For learning patches cut out from negative image 30, on the other hand, the first weight is set.

FIG. 5 is a diagram illustrating the effect of the weights in the image recognition method according to the present embodiment. FIG. 5(a) conceptually shows an example of the distribution of weights determined by the learning process, and FIG. 5(b) shows an example of voting results from the recognition process.

As the first weight, a weight w_1ij between positive samples and negative samples (class) is introduced, and as the second weight, a weight w_2ij between subclass 1 (easy to grip) and subclass 2 (difficult to grip) (subclass) is introduced. Here, the subscript "ij" denotes the j-th learning patch belonging to the i-th Bag. A Bag is a group of mutually similar learning patches; details are given later.

As shown in FIG. 5(a), the weight w_1ij between positive samples and negative samples is high in regions showing features that are not included in negative image 30. Likewise, the weight w_2ij between subclass 1 and subclass 2 is high in regions showing features that are not included in learning image 22 of subclass 1.

That is, in the recognition process, whether the input image contains a workpiece corresponding to the detection target is judged with greater emphasis on the information of learning patches cut out from regions where the weight w_1ij is high, and whether a workpiece judged to correspond to the detection target can be gripped is judged with greater emphasis on the information of learning patches cut out from regions where the weight w_2ij is high.

FIG. 5(b) conceptually shows an example of the voting results obtained when the recognition process is performed using a decision tree group that reflects the weight distribution shown in FIG. 5(a). As shown in FIG. 5(b), for a workpiece corresponding to the detection target, the votes for the positive sample exceed the votes for the negative sample in both voting result 42, which corresponds to easy gripping, and voting result 44, which corresponds to difficult gripping.

On the other hand, in voting result 42, which corresponds to easy gripping, the votes for subclass 1 (easy to grip) exceed the votes for subclass 2 (difficult to grip), and in voting result 44, which corresponds to difficult gripping, the votes for subclass 2 (difficult to grip) exceed the votes for subclass 1 (easy to grip).

As voting results 42 and 44 show, the object recognition method according to the present embodiment can determine simultaneously, in a single recognition process, whether the input corresponds to positive image 20 or negative image 30 and whether it corresponds to subclass 1 or subclass 2.
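The simultaneous decision read from voting results such as 42 and 44 can be sketched as a comparison of two pairs of vote totals. This is a hedged illustration only: the function name `decide`, the dictionary keys, and the handling of ties are assumptions introduced for the example and are not specified in the text.

```python
def decide(votes):
    """Derive class and subclass from one set of vote totals (sketch).

    votes: dict with the hypothetical keys 'positive', 'negative',
           'subclass1' (easy to grip) and 'subclass2' (difficult to grip).
    """
    # Class decision: detection target versus non-detection target.
    # Assumption: a tie is treated as "not detected".
    if votes["positive"] <= votes["negative"]:
        return ("not_detected", None)
    # Subclass decision, made from the same voting pass.
    # Assumption: a tie falls to "difficult_to_grip".
    if votes["subclass1"] > votes["subclass2"]:
        return ("detected", "easy_to_grip")
    return ("detected", "difficult_to_grip")
```

Both comparisons use totals gathered in the same pass over the decision tree group, which is why no separate rule-based grip determination step is needed.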

As shown in FIG. 5, in the object recognition method according to the present embodiment, the weights w_1ij and w_2ij are automatically calculated and updated so that each becomes high for regions of the learning images showing features that are effective for discrimination. It is therefore unnecessary to set rules in advance for determinations such as whether the recognized workpiece can be gripped.

In the object recognition method according to the present embodiment, each weight is updated successively as the decision tree group is constructed. The effects obtained by updating the weights are described below.

FIGS. 6 and 7 are diagrams illustrating the effect of updating the weights in the image recognition method according to the present embodiment. FIG. 6 shows an example in which, in the process of constructing a decision tree, the learning patches at a given node are classified into a plurality of branch nodes.

In the left branch node in FIG. 6, negative samples and positive samples are mixed. This state means that the possibility (reliability) of distinguishing positive samples from negative samples is relatively low. Therefore, when a branch result like this occurs, the weight w_1ij between positive samples and negative samples is decreased. That is, in the course of constructing the decision tree group, the weight w_1ij of positive samples that resemble negative samples is decreased.

In the center branch node in FIG. 6, learning patches cut out from learning image 22 of subclass 1 and learning patches cut out from learning image 24 of subclass 2 are mixed. This state means that the possibility (reliability) of distinguishing easy gripping (subclass 1) from difficult gripping (subclass 2) is relatively low. Therefore, when a branch result like this occurs, the weight w_2ij between subclass 1 and subclass 2 is decreased. The same applies to the right branch node in FIG. 6. That is, in the course of constructing the decision tree group, the weight w_2ij of positive samples that are similar across subclasses is decreased.

That is, after the branch nodes are determined, if the learning patches classified into a branch node are non-unique (that is, of mixed class or subclass), the weight is decreased.
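The weight decrease for mixed branch nodes can be illustrated with the following sketch. The update rule used here, multiplying each sample's weight by the fraction of same-label samples in its branch node, is purely an assumption chosen for illustration; this passage does not state the actual update formula. The function name `decay_weights_in_node` and its arguments are likewise hypothetical.

```python
from collections import Counter

def decay_weights_in_node(sample_ids, labels, weights):
    """Lower the weights of samples that share a branch node with other labels.

    sample_ids: ids of the learning patches that fell into one branch node.
    labels:     dict id -> label (e.g. 'pos'/'neg', or 'sub1'/'sub2').
    weights:    dict id -> current weight; updated in place and returned.

    Assumed rule (not taken from the source): weight *= purity, where
    purity is the share of samples in the node with the same label.
    A node holding a single label leaves the weights unchanged.
    """
    counts = Counter(labels[s] for s in sample_ids)
    total = len(sample_ids)
    for s in sample_ids:
        purity = counts[labels[s]] / total
        weights[s] *= purity
    return weights
```

Calling the function once with class labels updates w_1ij, and calling it again with subclass labels on a separate weight table updates w_2ij, which keeps the two weights independent.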

As shown in FIG. 7, in the recognition process, input patches cut out from the input image are input to each decision tree. Discrimination is performed by voting the information of the terminal node that each input patch reaches in each decision tree. Because the weight w_2ij has been updated as described above, input patches cut out from an input image showing an easy-to-grip workpiece receive relatively high vote values from subclass 1, and input patches cut out from an input image showing a difficult-to-grip workpiece receive relatively high vote values from subclass 2. Which subclass applies can be determined from this relative difference in vote values.

(d3: Generation of learning images)
Next, the procedure for generating learning images is described.

Positive images 20 at each rotation angle are generated by rotating an image obtained by imaging the detection-target workpiece. This is because the object recognition method according to the present embodiment must recognize the position and posture of the workpiece and therefore must learn each posture (state after rotation) of the detection target. Typically, the captured image is rotated through 360° in 1° steps to generate 360 learning images.
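Generating the rotated learning images can be sketched as below. This is a minimal pure-Python illustration: the nearest-neighbor resampling, the rotation about the image center, the fill value for uncovered pixels, and the names `rotate_image` and `make_rotated_set` are assumptions for the example, and the sense of rotation depends on the coordinate convention. A practical system would more likely use an image processing library.

```python
import math

def rotate_image(img, angle_deg, fill=0):
    """Rotate a 2D list of pixel values about its center (nearest neighbor)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out

def make_rotated_set(img, step_deg=1):
    """Return {angle: rotated image} for 0, step, ..., up to below 360 degrees."""
    return {angle: rotate_image(img, angle) for angle in range(0, 360, step_deg)}
```

With the default 1° step this yields 360 images, one per rotation angle, and the angle key can serve as the posture label attached to the patches cut from each image.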

In addition, in the object recognition method according to the present embodiment, two kinds of images are acquired according to the subclasses to be recognized: an image showing the detection-target beverage container placed so that it can be gripped (subclass 1), and an image showing the beverage container placed so that it is difficult to grip (subclass 2). Each is rotated to generate positive images 20 (learning image 22 and learning image 24) at each rotation angle.

That is, the object recognition method according to the present embodiment includes a step of rotating a learning image by a predetermined angle at a time to generate a plurality of learning images, and of extracting a plurality of learning patches or learning samples (partial learning images) from the generated learning images.

Negative image 30, on the other hand, is generated from an image obtained by imaging a non-detection-target workpiece. Because negative image 30 is information used for discrimination from the detection-target workpiece, there is no need to rotate it to generate learning images for each rotation angle.

In the object recognition method according to the present embodiment, it is preferable to further introduce the concept of a Bag. Introducing the Bag concept suppresses the influence of fluctuations in the detection target as it appears in the learning images.

FIG. 8 is a diagram illustrating the generation of Bags of learning images used in the object recognition method according to the present embodiment. Referring to FIG. 8, positive patches and negative patches are cut out from positive image 20 and negative image 30, respectively. As described above, a positive image 20 is generated for each rotation angle, so a plurality of positive patches are generated from each positive image 20 by grid sampling. From these learning patches, positive Bags B1, B2, B3, B4, ... representing positive image 20 and negative Bags NB1, NB2, NB3, NB4, NB5, ..., NBN representing negative image 30 are generated.
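Grid sampling of patches from a learning image can be sketched as follows. The patch size, the stride, the dictionary layout of a patch, and the name `grid_sample` are hypothetical; the text states only that grid sampling is used.

```python
def grid_sample(image, patch_size, stride):
    """Cut square patches from a 2D list of pixels on a regular grid.

    Returns a list of dicts holding the top-left corner of each patch and
    its pixel block. Patches that would extend past the image border are
    skipped (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            pixels = [row[left:left + patch_size]
                      for row in image[top:top + patch_size]]
            patches.append({"top": top, "left": left, "pixels": pixels})
    return patches
```

A stride smaller than the patch size produces overlapping patches, which is one way neighboring patches with similar appearance could arise before they are grouped into Bags.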

In the object recognition method according to the present embodiment, Bags are generated from learning images that already have teacher signals, so no Bag mixing positive patches and negative patches is generated; each Bag contains only positive patches or only negative patches.

A positive Bag is generated from a plurality of learning patches cut out from regions close to one another. That is, each positive Bag contains a plurality of learning patches whose positions and/or angles in positive image 20 are close to one another. Generating positive Bags in this way allows a group of learning patches with mutually similar appearance to be handled as one Bag. Thus, the object recognition method according to the present embodiment includes a step of setting, as a single group, a plurality of learning patches (partial learning images) that show the detection target and are extracted from regions close to one another.

On the other hand, in the object recognition method according to the present embodiment, negative patches cut out from negative image 30 are used mainly to lower the weight w_1ij for regions in positive image 20 that are not effective for recognizing the detection target, so there is no need to consider the influence of fluctuations appearing in the learning images. Each negative patch cut out from negative image 30 is therefore treated as one negative Bag.

As described later, in the learning process of the object recognition method according to the present embodiment, when the decision tree group is constructed, a common weight is determined for the plurality of learning patches (partial learning images) belonging to the same group. That is, learning patches contained in the same Bag are assumed to carry mutually similar information, so setting a common weight simplifies the processing.
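Grouping neighboring positive patches into Bags that share one weight can be sketched as below. How "close to one another" is measured is not specified in this passage, so the bucketing by position cell and angle bin, the parameters `cell` and `angle_bin`, and the function name `build_positive_bags` are all assumptions made for illustration.

```python
from collections import defaultdict

def build_positive_bags(patches, cell=8, angle_bin=5):
    """Group patches whose position and rotation angle fall in the same bucket.

    patches: list of dicts with integer keys 'x', 'y' (pixels) and 'angle'
             (degrees).
    Returns a list of Bags, each a list of patch indices.
    """
    buckets = defaultdict(list)
    for idx, p in enumerate(patches):
        key = (p["x"] // cell, p["y"] // cell, p["angle"] // angle_bin)
        buckets[key].append(idx)
    return list(buckets.values())

def initial_bag_weights(bags, value=1.0):
    """One shared weight per Bag; every patch in Bag i uses weights[i]."""
    return [value] * len(bags)
```

Storing one weight per Bag rather than per patch means a single update applies to all patches in the Bag, which matches the simplification described above. A negative Bag would simply hold one patch index.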

(d4: Construction of the decision tree group)
Next, the process of constructing the decision tree group in the learning process is described. FIG. 9 is a diagram for explaining the construction of the decision tree group in the learning process of the object recognition method according to the present embodiment. Referring to FIG. 9, positive patches 21 cut out from positive image 20 (positive patches 23 belonging to subclass 1 and positive patches 25 belonging to subclass 2) and negative patches 31 cut out from negative image 30 are selected at random to generate a plurality of subsets 62. Because one decision tree 60 is constructed (that is, learned) from one subset 62, as many subsets 62 are generated as there are decision trees 60 in the decision tree group to be constructed.
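The random selection of one subset per decision tree can be sketched as follows. The subset size, the fixed seed, and sampling without replacement within a subset are assumptions of this example; the text states only that patches are selected at random and that the number of subsets equals the number of trees.

```python
import random

def make_subsets(patch_ids, n_trees, subset_size, seed=0):
    """Draw one random subset of learning patch ids per decision tree.

    patch_ids: list of ids covering positive patches of both subclasses
               and negative patches.
    Returns a list of n_trees lists, each holding subset_size distinct ids.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    return [rng.sample(patch_ids, subset_size) for _ in range(n_trees)]
```

Each returned subset would then be passed to the construction of one tree, so the trees see different mixtures of subclass 1, subclass 2, and negative patches.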

In the learning process, for each subset 62, a branch function is determined at each level (node), and the degree of separation between the patches branched to each child node according to the determined branch function, that is, the class likelihood, is calculated. Then, based on the calculated class likelihood, the weight at the corresponding node (level) is calculated or updated. At this time, both the weight w_1ij between positive samples and negative samples and the weight w_2ij between subclass 1 and subclass 2 are updated. The series of steps of determining the branch function, calculating the class likelihood, and updating the weights is repeated until all learning patches included in subset 62 reach terminal nodes.

The weight for each of the terminal nodes 54-1 to 54-N is finally determined by summing the weights assigned to the nodes 51-1, 51-2, 52-1, 52-2, 52-3, and 52-4 that lie on the path from the root node 50 to that terminal node.
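The summation along the path can be sketched as follows. The dictionary layout and the function name `leaf_weight` are assumptions made for illustration; the node labels merely reuse the reference numerals of FIG. 9, and the weight values are invented.

```python
def leaf_weight(path_node_ids, node_weights):
    """Sum the weights of the intermediate nodes on the path to one terminal node."""
    return sum(node_weights[node_id] for node_id in path_node_ids)

# Hypothetical per-node weights for a small tree.
node_weights = {"51-1": 0.5, "52-1": 0.125, "52-2": 0.25}
weight = leaf_weight(["51-1", "52-2"], node_weights)
```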

Each of the terminal nodes 54-1 to 54-N included in the constructed decision tree 60 holds the class probability and the offset vectors obtained by the learning process. An offset vector is a vector quantity indicating the direction and distance from the patch center to the object center.

Furthermore, each of the terminal nodes 54-1 to 54-N may be associated with a histogram of the category information (for example, the distinction between detection target and non-detection target, position, and angle) attached to the learning patches that reached it. Note that the weights (or class probabilities) for the terminal nodes 54-1 to 54-N may be adjusted after the complete decision tree 60 has been constructed.

The terminal node 54-N shown in FIG. 9 is a node that negative patches 31 are likely to reach. In the course of reaching this terminal node 54-N, the weights w_1ij for the positive patches 21 included in the subset 62 (positive patches 23 belonging to subclass 1 and positive patches 25 belonging to subclass 2) decrease step by step. FIG. 9 conceptually illustrates this decrease in the weight w_1ij by the size of the area.

(d5: Procedure of the learning process)
Next, the procedure of the learning process of the object recognition method according to the present embodiment will be described. FIG. 10 is a flowchart showing the procedure of the learning process of the object recognition method according to the present embodiment. Each step shown in FIG. 10 is typically realized by the processor 102 of the image processing apparatus 100 executing the image processing program 108.

Referring to FIG. 10, the image processing apparatus 100 acquires the learning images (positive images 20 and negative images 30) used for the learning process (step S2). Subsequently, the image processing apparatus 100 rotates the acquired positive images 20 to generate learning images at each rotation angle, then cuts out positive patches and negative patches and generates positive Bags and negative Bags (step S4). Since the processing of steps S2 and S4 has been described in detail with reference to FIG. 8, the detailed description will not be repeated.

Then, the image processing apparatus 100 initializes the weights w_1ij and w_2ij (step S6) and generates a predetermined number of subsets from the plurality of Bags generated in step S4 (step S8). Each subset is generated by random selection (random sampling) from the generated positive Bags and negative Bags.

Thereafter, the process of constructing the decision tree group (steps S10 to S20) is started. One decision tree is constructed by performing steps S10 to S20 once. Steps S10 to S20 are preferably executed in parallel, once for each generated subset. Of course, steps S10 to S20 may instead be repeated serially a plurality of times.

In the process of constructing a decision tree, the image processing apparatus 100 first generates branch function candidates for a node at level 1 (step S10) and determines the optimal one among the generated candidates as the branch function at level 1 (step S12). Then, the image processing apparatus 100 determines whether any node at the same level remains for which a branch function has not been determined (step S14). If such a node remains (YES in step S14), the processing from step S10 onward is repeated.

If the branch functions have been determined for all nodes at the same level (NO in step S14), the image processing apparatus 100 updates the weights (w_1ij and w_2ij) for each node (step S16).

Thereafter, the image processing apparatus 100 determines whether a predetermined condition for completing the construction of the decision tree group is satisfied (step S18). If the condition is not satisfied (NO in step S18), the processing from step S10 onward is repeated.

If the predetermined condition for completing the construction of the decision tree group is satisfied (YES in step S18), the image processing apparatus 100 stores, in association with the terminal nodes of each decision tree, information such as the images of the learning patches that reached them, the offset vectors, and the weighted probabilities (step S20). The learning process then ends.

The learning procedure is described in more detail below.
(d6: Weight initialization)
The image processing apparatus 100 initializes the weights w_1ij and w_2ij before constructing the decision trees (step S6). More specifically, the image processing apparatus 100 initializes the weights w_1ij and w_2ij to 1/N. Here, the constant N can be set to an arbitrary value.

(d7: Generation of branch function candidates)
As the first stage of the decision tree construction, the image processing apparatus 100 generates branch function candidates at level 1 (step S10). Here, a branch function candidate h^(d)_T,τ(I) at level d is defined by the following equation (1), using the similarity S(x, T) between a learning patch x cut out from a learning image and a template T, and a threshold τ.

Here, the parameter τ is a threshold for evaluating the similarity between the learning patch x and the template T.

(d8: Determination of the branch function)
Subsequently, the image processing apparatus 100 determines the optimal one among the generated branch function candidates as the branch function at the target level (step S12).

FIG. 11 is a schematic diagram for explaining the process of determining the branch function in the learning process of the object recognition method according to the present embodiment. Referring to FIG. 11, the template T in equation (1) is randomly selected from the plurality of positive patches given to the target branch node. In the example shown in FIG. 11, the similarity between each learning patch x and the template T is calculated; if the calculated similarity is less than the threshold, the patch is branched to the left child node, and otherwise it is branched to the right child node.
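The branching rule described for FIG. 11 can be sketched as a thresholded similarity test. The source does not specify the similarity measure S(x, T); the negative sum of squared differences used here (`neg_ssd`) is purely a stand-in chosen for illustration, and patches are represented as flat lists of pixel values.

```python
def neg_ssd(patch, template):
    """Hypothetical similarity: negative sum of squared differences (higher = more similar)."""
    return -sum((p - t) ** 2 for p, t in zip(patch, template))

def branch(patch, template, tau, similarity=neg_ssd):
    """Return 0 (left child) if the similarity is below tau, otherwise 1 (right child)."""
    return 0 if similarity(patch, template) < tau else 1

same = branch([1, 2], [1, 2], tau=-0.5)       # identical patch, similarity 0
different = branch([0, 0], [3, 4], tau=-0.5)  # similarity -25
```

Passing the similarity as a parameter keeps the rule independent of whichever measure is actually used.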

In the branch function determination process, the template T and the parameter τ are each selected at random, and an evaluation value is calculated using the evaluation function U_* defined by the following equation (2). Here, * ∈ {1, 2}, corresponding to the weights w_1ij and w_2ij, respectively (the same applies hereinafter).

The template T and the parameter τ that minimize the evaluation value calculated by equation (2) are determined, and the optimal branch function is determined from these values.

Here, {x | h(I_i) = 0} in equation (2) denotes the set of learning patches split into the left child node in FIG. 11, and {x | h(I_i) = 1} denotes the set of learning patches split into the right child node in FIG. 11.

The evaluation function U is evaluated using the following three criteria, switched from level to level. The first two are the evaluation functions U_1(A) and U_2(A), which evaluate the entropy of the class labels; they are defined according to the following equations (3) and (4) for the set A of learning patches branched to each child node.

The evaluation function U_1(A) indicates the ease (reliability) of discriminating between positive samples and negative samples, and the evaluation function U_2(A) indicates the ease (reliability) of discriminating between easy to grip (subclass 1) and difficult to grip (subclass 2).

Here, c in the evaluation function U_1(A) denotes the weighted probability of the positive samples included in the set A of learning patches, and b in the evaluation function U_2(A) denotes the weighted probability of those learning patches in the set A to which the teacher attribute of a positive sample has been assigned.
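The entropy-based selection of the threshold can be sketched as below. Equations (2) and (3) are not reproduced in this text, so two forms are assumed here and are not taken from the source: U_1 is taken to be the binary entropy of the weighted positive probability c, and the evaluation value of a split is taken to be the size-weighted sum of the child entropies. The similarities `sims` are assumed to be precomputed against one template; looping over templates would be analogous.

```python
import math
import random

def weighted_entropy(labels, weights):
    """Binary entropy of the weighted positive probability c (assumed form of U_1)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    c = sum(w for y, w in zip(labels, weights) if y == 1) / total
    return -sum(p * math.log2(p) for p in (c, 1.0 - c) if p > 0)

def split_score(sims, labels, weights, tau):
    """Size-weighted entropy of the two children created by tau (assumed form of eq. (2))."""
    score = 0.0
    for side in (0, 1):
        idx = [i for i, s in enumerate(sims) if (0 if s < tau else 1) == side]
        child = weighted_entropy([labels[i] for i in idx], [weights[i] for i in idx])
        score += len(idx) / len(sims) * child
    return score

def best_threshold(sims, labels, weights, num_candidates=20, seed=0):
    """Pick the random threshold candidate with the smallest evaluation value."""
    rng = random.Random(seed)
    candidates = [rng.uniform(min(sims), max(sims)) for _ in range(num_candidates)]
    return min(candidates, key=lambda tau: split_score(sims, labels, weights, tau))

sims = [0.1, 0.2, 0.8, 0.9]   # similarity of each patch to the template
labels = [0, 0, 1, 1]         # 1 = positive sample, 0 = negative sample
weights = [1.0, 1.0, 1.0, 1.0]
tau = best_threshold(sims, labels, weights)
```

Any candidate that falls between the two groups of similarities separates them perfectly and therefore scores zero, which is why the search prefers it.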

The third is the evaluation function U_3(A), which evaluates the variation (variance) of the offset vectors d_ij and is defined according to the following equation (5). The offset vector d_ij indicates the position at which the target learning patch was cut out from the learning image (positive image 20).

Here, d_A denotes the mean of the offset vectors in A.
Furthermore, in each decision tree, steps S10 and S12 are repeated until all the necessary nodes at the target level d have been generated. That is, in steps S10 to S14 of FIG. 10, the image processing apparatus 100 determines, for each node that is not a terminal node, a branch function indicating into which of the child nodes branching from that node a given learning sample (partial learning image) should be classified.
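The offset-variance criterion U_3(A) mentioned above can be sketched as follows. Equation (5) is not reproduced in this text; the sum of squared distances from the mean offset d_A is assumed here because it is a common way to measure the spread of offset vectors, and it should not be read as the source's exact formula. Offsets are represented as (dx, dy) tuples.

```python
def offset_variance(offsets):
    """Sum of squared distances of 2-D offset vectors from their mean d_A (assumed form)."""
    n = len(offsets)
    mean_x = sum(dx for dx, _ in offsets) / n
    mean_y = sum(dy for _, dy in offsets) / n
    return sum((dx - mean_x) ** 2 + (dy - mean_y) ** 2 for dx, dy in offsets)

spread = offset_variance([(1, 0), (-1, 0)])
tight = offset_variance([(2, 3), (2, 3)])
```

A set whose patches all point to the same object center scores zero, so minimizing this value favors splits that group patches with consistent offsets.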

(d9: Weight update)
When all the nodes for the target level d have been generated, the image processing apparatus 100 updates the weights (w_1ij and w_2ij) for each node (step S16).

The weights w_1ij and w_2ij are determined or updated based on the respective class likelihoods and the class likelihood p_i of the Bag. Since introducing the concept of a Bag is not essential, the weights may be determined or updated from the class likelihoods alone when Bags are not used.

The weight w_1ij is updated only for positive samples. Among the positive samples included in the set A, those that resemble negative samples included in the same set A have a relatively lower class likelihood. Accordingly, the first class likelihood p_1ij, which indicates the ease (reliability) of discriminating between positive images and negative images, is given by the following equation (6).

The weight w_2ij is updated for the positive samples included in the set A (positive patches 23 belonging to subclass 1 and positive patches 25 belonging to subclass 2). Among the positive samples belonging to subclass 1 in the set A, those that resemble positive samples belonging to subclass 2 in the same set A have a relatively lower class likelihood. Similarly, among the positive samples belonging to subclass 2 in the set A, those that resemble positive samples belonging to subclass 1 in the same set A have a relatively lower class likelihood. Accordingly, the second class likelihood p_2ij, which indicates the ease (reliability) of discriminating between easy to grip (subclass 1) and difficult to grip (subclass 2), is given by the following equation (7).

In equation (6), if F(x) = 1 - 2c is defined (where c is the proportion of positive samples in the set A), then at each node the class likelihood p_1ij becomes higher as the proportion c of positive samples becomes higher, and conversely becomes lower as c becomes lower.

Similarly, in equation (7), G_1(x) = 1 - 2b_1 is defined for positive samples belonging to subclass 1 (where b_1 is the proportion of positive samples belonging to subclass 1 in the set A), and G_2(x) = 2b_2 - 2 is defined for positive samples belonging to subclass 2 (where b_2 is the proportion of positive samples belonging to subclass 2 in the set A).

The class likelihood p_*i of a Bag is calculated according to the following equation (8), using the class likelihoods p_*ij of the set belonging to the Bag.

The weight w_*ij is calculated according to equation (9), using the class likelihood p_*i of the Bag and the class likelihood p_*ij.
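The chain from class likelihood to Bag likelihood to weight can be sketched as below. Equations (6), (8), and (9) are not reproduced in this text, so every formula in the sketch is an assumption borrowed from multiple-instance boosting rather than the source's definition: a logistic link for the instance likelihood (chosen so that a higher c, and hence a lower F(x) = 1 - 2c, gives a higher likelihood), a noisy-OR for the Bag likelihood, and weights of the form (1 - p_i) / p_i * p_ij for positive Bags, normalized over all Bags.

```python
import math

def instance_likelihood(f):
    """Assumed logistic link: a smaller F(x) yields a higher class likelihood."""
    return 1.0 / (1.0 + math.exp(f))

def bag_likelihood(instance_probs):
    """Assumed noisy-OR: the Bag is positive if at least one of its patches is."""
    prod = 1.0
    for p in instance_probs:
        prod *= 1.0 - p
    return 1.0 - prod

def update_weights(bags):
    """Assumed MIL-boosting-style weights for positive Bags, normalized to sum to 1."""
    raw = []
    for probs in bags:
        p_bag = bag_likelihood(probs)
        raw.append([(1.0 - p_bag) / p_bag * p for p in probs])
    total = sum(sum(ws) for ws in raw)
    return [[w / total for w in ws] for ws in raw]

high_c = instance_likelihood(1 - 2 * 0.9)  # node dominated by positive samples
low_c = instance_likelihood(1 - 2 * 0.1)   # node dominated by negative samples
weights = update_weights([[high_c, low_c], [low_c, low_c]])
```

Under these assumptions, Bags that are already explained well receive smaller weights, which matches the qualitative behavior described in the text.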

It is preferable to normalize the finally calculated weights w_1ij and w_2ij.
As described above, in the learning process for constructing the decision tree group, at each node that is not a terminal node, the given learning samples are each classified into one of the child nodes according to the determined branch function. Then, based on the result of classifying the learning samples, the weight w_1ij (first weight), which indicates the ability to discriminate between detection targets and non-detection targets, and the weight w_2ij (second weight), which indicates the ability to discriminate between subclass 1 and subclass 2, are determined for each node.

That is, in the learning process for constructing the decision tree group, when the given learning samples are classified into the child nodes according to the determined branch function, the higher the proportion of positive samples and negative samples classified into the same child node (the lower the degree of separation), the lower the weight w_1ij (first weight), which indicates the ability to discriminate between detection targets and non-detection targets, is made. Likewise, the higher the proportion of subclass 1 and subclass 2 samples classified into the same child node (the lower the degree of separation), the lower the weight w_2ij (second weight), which indicates the ability to discriminate between subclass 1 and subclass 2, is made.

(d10: Generation of terminal nodes)
The image processing apparatus 100 repeats the branch function determination and weight update described above until a predetermined condition is satisfied. The predetermined condition includes, for example, the tree reaching a specified depth, or the number of learning patches in the set at each node falling below a certain number. The lowest nodes at the time the predetermined condition is satisfied become the terminal nodes.
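The two example stopping conditions can be written as a single predicate. The parameter names `max_depth` and `min_patches` are hypothetical; the source only states that a specified depth or a minimum patch count may be used.

```python
def is_terminal(depth, num_patches, max_depth, min_patches):
    """True when a node should become a terminal node under the example conditions."""
    return depth >= max_depth or num_patches < min_patches

stop_by_depth = is_terminal(depth=10, num_patches=100, max_depth=10, min_patches=20)
stop_by_count = is_terminal(depth=3, num_patches=5, max_depth=10, min_patches=20)
keep_splitting = is_terminal(depth=3, num_patches=100, max_depth=10, min_patches=20)
```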

The image processing apparatus 100 stores, in association with each terminal node, information such as the images of the learning patches that reached it, the offset vectors, and the weighted probabilities (step S20).

FIG. 12 is a schematic diagram for explaining the terminal node information obtained by the learning process of the object recognition method according to the present embodiment. Referring to FIG. 12, for the positive patches 23 and positive patches 25 that reached a terminal node, the image itself, the offset vector, and the weighted probability are stored in association with that terminal node. For the negative patches 31 that reached the terminal node, the image itself and the weighted probability are stored. That is, in steps S16 to S18 of FIG. 10, the image processing apparatus 100 sequentially branches each learning sample (partial learning image) according to the determined branch functions until it reaches one of the terminal nodes, and thereby determines, for each terminal node, the ratio (first probability) between the sum of the positive patches 23 (first samples) cut out from the learning images 22 and the positive patches 25 (second samples) cut out from the learning images 24 on the one hand and the negative patches 31 (third samples) cut out from the negative images 30 on the other, and the ratio (second probability) between the positive patches 23 (first samples) and the positive patches 25 (second samples).
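The two probabilities held by a terminal node can be sketched with a small record type. The class `LeafStats` and its field names are hypothetical, and expressing each ratio as a fraction of the relevant total is an assumption; the source only states that the ratios are determined.

```python
from dataclasses import dataclass

@dataclass
class LeafStats:
    w_sub1: float  # summed weight of positive patches 23 (subclass 1) at this leaf
    w_sub2: float  # summed weight of positive patches 25 (subclass 2) at this leaf
    w_neg: float   # summed weight of negative patches 31 at this leaf

    def first_probability(self):
        """Share of positive weight among all weight that reached the leaf."""
        pos = self.w_sub1 + self.w_sub2
        return pos / (pos + self.w_neg)

    def second_probability(self):
        """Share of subclass 1 weight among the positive weight."""
        return self.w_sub1 / (self.w_sub1 + self.w_sub2)

leaf = LeafStats(w_sub1=3.0, w_sub2=1.0, w_neg=4.0)
```

A leaf that received no patches of a given kind would need a guard against division by zero, which is omitted here for brevity.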

Through the learning process described above, a decision tree group consisting of a plurality of decision trees is constructed. In the recognition process, voting is performed using the weighted probabilities to recognize the position and orientation of the detection target and to determine whether it falls under a designated subclass.

<E. Recognition process>
Next, the recognition process of the object recognition method according to the present embodiment will be described. In the recognition process, a plurality of partial images (input patches) cut out from the input image are input to the decision tree group, and in each decision tree voting is performed using the information of the terminal node that each input patch reaches. From the result of the voting, in addition to recognizing the position and orientation of the detection target included in the input image, it is determined whether the recognized workpiece can be gripped. In this voting process, one voting plane (XY coordinates) is prepared for each rotation angle (360 planes at 1° increments in the example described above), and these are combined to form a three-dimensional likelihood (similarity) map. The XY coordinates and the rotation angle of the detection target are then determined from the high-likelihood region in the three-dimensional likelihood map.

Furthermore, in the recognition process of the object recognition method according to the present embodiment, whether the recognized detection target is easy to grip (subclass 1) or difficult to grip (subclass 2) is determined based on the subclass probabilities voted to the region determined to correspond to the detection target.

FIG. 13 is a flowchart showing the procedure of the recognition process in the object recognition method according to the present embodiment. Each step shown in FIG. 13 is typically realized by the processor 102 of the image processing apparatus 100 executing the image processing program.

Referring to FIG. 13, the image processing apparatus 100 acquires the input image to be subjected to the recognition process (step S102). Subsequently, the image processing apparatus 100 cuts out a plurality of input patches from the acquired input image (step S104), inputs each input patch to the decision tree group, and identifies the terminal node that each input patch reaches (step S106). The image processing apparatus 100 then performs voting using the information held in the terminal node that each input patch reached in each decision tree (step S108). Terminal nodes whose proportion of positive samples is below a threshold may be excluded from the voting. That is, in the recognition process, only terminal nodes whose weighted proportion of positive samples is at or above a predetermined threshold may be used for voting.
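The voting step, including the optional exclusion of weak terminal nodes, can be sketched as follows. The data layout is hypothetical: each reached leaf is a dictionary with a weighted positive probability `p_pos` and a list of stored `(dx, dy, theta)` entries, where `theta` is assumed to be the rotation angle of the learning image the stored patch came from. Votes are accumulated in a sparse map keyed by `(theta, x, y)` instead of 360 dense planes.

```python
from collections import defaultdict

def cast_votes(reached, min_positive=0.5):
    """Accumulate votes from (patch_center_x, patch_center_y, leaf) triples."""
    votes = defaultdict(float)
    for cx, cy, leaf in reached:
        if leaf["p_pos"] < min_positive:
            continue  # optionally skip leaves dominated by negative samples
        for dx, dy, theta in leaf["entries"]:
            # The offset vector points from the patch center to the object center.
            votes[(theta, cx + dx, cy + dy)] += leaf["p_pos"]
    return votes

reached = [
    (3, 4, {"p_pos": 0.75, "entries": [(2, 1, 30)]}),
    (4, 4, {"p_pos": 0.5, "entries": [(1, 1, 30)]}),
    (0, 0, {"p_pos": 0.25, "entries": [(0, 0, 90)]}),  # below threshold, ignored
]
votes = cast_votes(reached)
```

Both accepted patches point to the same object center on the 30° plane, so their votes add up in one cell.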

When voting has been completed for all input patches, the following search process is executed. Specifically, the image processing apparatus 100 scans the voting plane for a given rotation angle θ to identify one or more local regions (step S110). Subsequently, the image processing apparatus 100 calculates the sum of the votes in each local region (step S112) and searches for the maximum of these local-region sums in the voting plane (step S114). The image processing apparatus 100 then determines whether the search process has been completed for all rotation angles θ (step S116). If a rotation angle remains for which the search process has not been completed (NO in step S116), the image processing apparatus 100 selects a new rotation angle θ (step S118) and repeats the processing from step S110 onward.

If the search process has been completed for all rotation angles θ (YES in step S116), the image processing apparatus 100 takes the rotation angle θ corresponding to the voting plane with the largest local-region sum as the rotation angle (orientation) of the detection target, and determines the point of interest of that local region as the position of the detection target (step S120). If the local-region sum is less than a predetermined threshold, it may be determined that no detection target is present in the input image.
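The search over voting planes can be sketched as a windowed maximum. The sketch assumes votes are stored in a dictionary keyed by `(theta, x, y)`, uses a square window of hypothetical half-width `radius` as the local region, and only considers cells that received a vote as window centers; these are simplifications, not details from the source.

```python
def find_peak(votes, radius=1, min_score=0.0):
    """Return (theta, x, y, score) with the largest local-region sum, or None."""
    best = None
    for theta, x, y in votes:
        score = sum(
            votes.get((theta, x + i, y + j), 0.0)
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)
        )
        if best is None or score > best[3]:
            best = (theta, x, y, score)
    if best is None or best[3] < min_score:
        return None  # no detection target in the input image
    return best

votes = {(30, 5, 5): 1.0, (30, 6, 5): 0.5, (90, 1, 1): 1.25}
peak = find_peak(votes)
rejected = find_peak(votes, min_score=2.0)
```

Although the single strongest cell lies on the 90° plane, the two neighboring cells on the 30° plane together yield the larger local sum, which illustrates why regions rather than single cells are compared.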

Subsequently, based on the subclass probabilities of the positive samples voted to the point of interest (region) determined in step S120 (that is, the ratio between the weighted probability voted for subclass 1 and the weighted probability voted for subclass 2), the image processing apparatus 100 determines whether the detection target is easy to grip (subclass 1) or difficult to grip (subclass 2) (step S122).
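The subclass decision of step S122 can be sketched as a comparison of the two accumulated vote totals. The function name `classify_grip`, the string labels, and the decision threshold of 0.5 are assumptions; the source only states that the decision is based on the ratio.

```python
def classify_grip(votes_sub1, votes_sub2, threshold=0.5):
    """Decide the subclass from the weighted votes gathered at the detected position."""
    total = votes_sub1 + votes_sub2
    if total == 0:
        return None  # no positive votes, so no decision can be made
    return "easy" if votes_sub1 / total >= threshold else "difficult"

mostly_sub1 = classify_grip(3.0, 1.0)
mostly_sub2 = classify_grip(1.0, 3.0)
```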

FIG. 14 is a diagram for explaining the recognition process in the object recognition method according to the present embodiment. Referring to FIG. 14, a plurality of input patches cut out from the input image are input to the decision trees 60-1, 60-2, ..., 60-N. In each of the decision trees 60-1, 60-2, ..., 60-N, the terminal node reached by each input patch is identified.

Then, voting into the three-dimensional likelihood map is performed using the information associated with each terminal node. More specifically, the voting destination in the three-dimensional likelihood map (the position of the detection target) is identified based on the offset vector associated with the terminal node that the input patch reached, and the associated weighted positive probability is used as the vote value.

Voting is performed for each of the input patches cut out from the input image. When the vote value at some position exceeds a predetermined threshold, the voting plane (rotation angle) and the position (XY coordinates) of that position are identified as corresponding to the detection target.

Furthermore, whether the detection target is easy to grip (subclass 1) or difficult to grip (subclass 2) is determined based on the proportions of the subclasses among the positive samples voted to the position corresponding to the detection target.

As described above, in the recognition process, the image processing apparatus 100 determines whether a workpiece corresponding to the detection target is included in the input image based on the weighted probability of positive samples (first probability) at the terminal nodes reached by the input patches (partial input images) obtained from the input image when they are given to the decision tree group. It also determines whether the workpiece corresponding to the detection target included in the input image meets a predetermined condition (easy to grip or difficult to grip) based on the weighted probability of subclass 1 or subclass 2 (second probability) for those terminal nodes.

<F. Recognition results>
Recognition results obtained with the object recognition method according to the present embodiment will be described. FIG. 15 is a diagram showing an example of a recognition result obtained by the object recognition method according to the present embodiment. For the recognition result shown in FIG. 15, an image captured with three beverage containers arranged as the workpieces was used as the input image. As shown in (a) Recognition result, all three beverage containers are recognized as detection targets. On the other hand, as shown in (b) Likelihood map, the two beverage containers on the right show relatively high values in the likelihood map of subclass 2, and are therefore determined to be likely to fall under subclass 2, that is, to be difficult to grip.

That is, it can be seen that the object recognition method according to the present embodiment can determine the subclass (easy to grip or difficult to grip) at the same time as it recognizes the position and orientation of the workpiece to be detected.

<G. Additional learning process>
Next, a process for improving the discrimination performance of the decision tree group constructed by the learning process of the object recognition method according to the present embodiment will be described. Typically, when the learning images used in the learning process do not match the environment of the actual site, the decision tree group may fail to deliver its intended discrimination performance. In such a case, new nodes (branch functions, that is, classifiers) are added to the decision tree group using misidentified samples. The additional learning process of the object recognition method according to the present embodiment includes this addition of new nodes. Adopting such an additional learning process can improve the discrimination performance of the decision tree group.

(g1: Overview)
In the additional learning process of the object recognition method according to the present embodiment, the learning samples and the misidentified samples are input to all of the decision trees that have already been constructed. All of the decision trees are then scanned, and child nodes are added to terminal nodes at which the class probability of the misidentified samples is low. In this specification, a misidentified sample means an input sample whose class or subclass was misidentified when the recognition process was actually executed, as well as an input sample whose class or subclass is likely to be misidentified.

That is, for a terminal node at which the misidentified samples are considered difficult to identify (that is, the reliability is low), a branch function is added to raise the reliability of identification. Either a feature selection type or a case type branch function can be used as the branch function to be added. The specific contents of these branch functions are described later.

FIG. 16 is a diagram for explaining the additional learning process in the object recognition method according to the present embodiment. Referring to FIG. 16(a), a misidentified sample to which the correct label has been assigned is input to the decision tree 60 constructed by the learning process. In the example shown in FIG. 16(a), the misidentified sample is assumed to reach the terminal node 54-2, where the class likelihood of negative samples is higher than that of positive samples. Therefore, a branch function is added to the terminal node 54-2 reached by the misidentified sample so that the misidentified sample is separated from the learning samples.

FIG. 16(b) shows an example in which the terminal node 54-2 has changed into an ordinary node and the terminal nodes 55-1 and 55-2 have been added as its child nodes. Adding child nodes in this way can improve the discrimination performance of the decision tree group.

For a terminal node reached by a misidentified sample, no child node need be added if the class probability of the misidentified sample at that node is high. In that case, the learning samples and the misidentified sample are considered to be sufficiently discriminated by class.

FIG. 17 is a flowchart showing the additional learning procedure in the object recognition method according to the present embodiment. Each step shown in FIG. 17 is typically realized by the processor 102 of the image processing apparatus 100 executing the image processing program.

Referring to FIG. 17, the image processing apparatus 100 acquires the learning samples and misidentified samples used for the additional learning process (step S202). The image processing apparatus 100 inputs each of the acquired learning samples and misidentified samples to each tree of the constructed decision tree group, and identifies the terminal node that each sample reaches in each decision tree (step S204). In other words, in steps S202 and S204, the image processing apparatus 100 gives the decision tree group the learning samples (partial learning images) together with the misidentified samples, that is, those input samples (partial input images) that were misidentified or are likely to be misidentified, and branches each sample in turn until it reaches one of the terminal nodes, thereby identifying the terminal node that each sample reaches.

Subsequently, for each decision tree, the image processing apparatus 100 determines whether the class probability of the misidentified sample at the terminal node reached by the misidentified sample meets a predetermined condition (step S206). If the class probability of the misidentified sample meets the predetermined condition (YES in step S206), the image processing apparatus 100 determines a branch function (step S208) and updates the weights for the child nodes generated by the determined branch function (step S210).

After updating the weights for the generated child nodes (after step S210), or if the class probability of the misidentified sample does not meet the predetermined condition (NO in step S206), the image processing apparatus 100 determines whether the evaluation has been completed for all terminal nodes reached by misidentified samples in the target decision tree (step S212). If the evaluation has not been completed for all terminal nodes reached by misidentified samples (NO in step S212), the image processing apparatus 100 selects another terminal node reached by a misidentified sample (step S214) and repeats the processing from step S206.

That is, in steps S206 to S214, the image processing apparatus 100 adds child nodes branching from a terminal node reached by a misidentified sample, according to the discrimination probability between the learning samples (partial learning images) and the misidentified samples (partial input images) at that terminal node.

If the evaluation has been completed for all terminal nodes reached by misidentified samples (YES in step S212), the image processing apparatus 100 determines whether the evaluation has been completed for all decision trees included in the constructed decision tree group (step S216). If the evaluation has not been completed for all decision trees (NO in step S216), the image processing apparatus 100 selects another decision tree included in the constructed decision tree group (step S218) and repeats the processing from step S206.

If the evaluation has been completed for all decision trees included in the constructed decision tree group (YES in step S216), the image processing apparatus 100 ends the additional learning process.
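
The flow of FIG. 17 can be sketched roughly as follows. This is a minimal Python illustration, not the claimed implementation: the `Node` class, the `choose_split` callback, the `min_prob` threshold, and the way the class probability is computed (the share of samples at the terminal node, learning and misidentified alike, that carry the misidentified sample's correct label) are all assumptions introduced here. The weight update of step S210 is only marked by a comment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    split: Optional[Callable[[float], int]] = None  # None marks a terminal node
    children: List["Node"] = field(default_factory=list)

    def leaf_for(self, x):
        """Branch x down the tree until a terminal node is reached."""
        node = self
        while node.split is not None:
            node = node.children[node.split(x)]
        return node


def class_probability(label, samples):
    """Share of `samples` carrying `label` (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return sum(1 for _, y in samples if y == label) / len(samples)


def additional_learning(forest, train, misid, choose_split, min_prob=0.5):
    """Return the number of terminal nodes that received child nodes."""
    added = 0
    for root in forest:                               # S216 / S218: every tree
        reached = {}                                  # S204: leaf -> samples
        for kind, samples in (("train", train), ("misid", misid)):
            for x, y in samples:
                leaf = root.leaf_for(x)
                entry = reached.setdefault(
                    id(leaf), {"leaf": leaf, "train": [], "misid": []})
                entry[kind].append((x, y))
        for entry in reached.values():                # S212 / S214: every leaf
            if not entry["misid"]:
                continue
            everything = entry["train"] + entry["misid"]
            probs = [class_probability(y, everything) for _, y in entry["misid"]]
            if sum(probs) / len(probs) >= min_prob:   # S206: no child needed
                continue
            leaf = entry["leaf"]
            leaf.split = choose_split(entry["train"], entry["misid"])  # S208
            leaf.children = [Node(), Node()]          # S210 would set weights here
            added += 1
    return added
```

Running the function a second time on the same data adds nothing if the first pass already separated the misidentified samples, which mirrors the rule that terminal nodes with a high class probability are left unchanged.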

(g2: Collection and labeling of misidentified samples)
To perform the additional learning process, misidentified samples obtained as a result of actual recognition processing must be collected. Because no label (teacher signal) is given to the workpieces subjected to actual recognition processing, it is preferable to collect misidentified samples by checking the recognition results visually or by similar means. However, collecting misidentified samples by visual inspection incurs a high labor cost, so misidentified samples may instead be collected automatically by evaluating Vote Entropy. Vote Entropy indicates the reliability of the identification result of an ensemble classifier. The Vote Entropy value VE(x) is defined by equation (10) below.

Here, T is the number of decision trees, and V(y_m) is the number of decision trees that predicted the label y_m. An input sample with a high Vote Entropy value can be judged to be ambiguously identified by the decision tree group. Therefore, input samples whose Vote Entropy value is equal to or greater than a predetermined threshold are collected automatically, and the correct label is assigned visually to each automatically collected input sample, thereby generating the misidentified samples used for the additional learning process.
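
Equation (10) is not reproduced in this text. With the symbols defined above, the commonly used form of vote entropy would be $VE(x) = -\sum_{m} \frac{V(y_m)}{T} \log \frac{V(y_m)}{T}$, and the Python sketch below assumes that form. The natural logarithm, the function names, and the `forest_predict` callback (which returns one predicted label per decision tree) are assumptions made for illustration.

```python
import math
from collections import Counter


def vote_entropy(predictions):
    """Vote entropy of one input sample; `predictions` has one label per tree."""
    total = len(predictions)                      # T: number of decision trees
    votes = Counter(predictions)                  # V(y_m) for each label y_m
    return -sum((v / total) * math.log(v / total) for v in votes.values())


def collect_ambiguous(samples, forest_predict, threshold):
    """Keep the samples whose vote entropy is at or above the threshold.

    The returned samples still need a correct label assigned by a person
    before they can serve as misidentified samples.
    """
    return [x for x in samples if vote_entropy(forest_predict(x)) >= threshold]
```

A sample on which all trees agree has a vote entropy of zero, while an even split between two labels gives log 2, so the threshold controls how much disagreement is required before a sample is set aside for labeling.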

(g3: Feature selection type branch function)
The feature selection type branch function used for adding child nodes selects the feature dimension to be referenced and its threshold from randomly chosen candidates, based on the information gain obtained. A Haar-like filter can be used to calculate the feature value F. For the branch function, the filter pattern, size, and position of the Haar-like filter are selected. The branch function is defined by equation (11) below.

Here, F(x) denotes the value obtained by extracting a feature from the input sample with the Haar-like filter, and τ denotes the threshold.
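
Equation (11) is not reproduced in this text; it is presumably a two-way test that compares F(x) with τ. The Python sketch below illustrates the selection procedure under that assumption. The two-rectangle filter pattern, the direction of the comparison (F(x) < τ goes to the first child), the candidate count, and all function names are assumptions for illustration, and patches are taken to be square lists of pixel rows.

```python
import math
import random
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(labels, go_first):
    """Entropy reduction when samples with go_first=True go to the first child."""
    first = [y for y, g in zip(labels, go_first) if g]
    second = [y for y, g in zip(labels, go_first) if not g]
    if not first or not second:
        return 0.0
    n = len(labels)
    return entropy(labels) - len(first) / n * entropy(first) - len(second) / n * entropy(second)


def haar_two_rect(patch, top, left, height, width):
    """F(x): left-rectangle sum minus right-rectangle sum of a Haar-like window."""
    rows = patch[top:top + height]
    a = sum(sum(r[left:left + width]) for r in rows)
    b = sum(sum(r[left + width:left + 2 * width]) for r in rows)
    return a - b


def choose_split(patches, labels, n_candidates=50, seed=0):
    """Try random (position, size, threshold) candidates; keep the best gain."""
    rng = random.Random(seed)
    size = len(patches[0])
    best_gain, best_params = -1.0, None
    for _ in range(n_candidates):
        h = rng.randint(1, size)
        w = rng.randint(1, size // 2)
        top = rng.randint(0, size - h)
        left = rng.randint(0, size - 2 * w)
        responses = [haar_two_rect(p, top, left, h, w) for p in patches]
        tau = rng.choice(responses)               # threshold candidate
        gain = information_gain(labels, [f < tau for f in responses])
        if gain > best_gain:
            best_gain, best_params = gain, (top, left, h, w, tau)
    return best_gain, best_params
```

Because candidates are drawn at random, the chosen filter depends on the seed and the candidate count; a fuller implementation would also vary the filter pattern, as the text describes.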

(g4: Case type branch function)
The case type branch function used for adding child nodes uses a template selected from the misidentified samples as a case. With a given misidentified sample as the template, the case type branch function calculates the distance between the template and the learning samples at the same terminal node, and sets the threshold to the midpoint between the template and the nearest sample of a different class, thereby determining the branch function. If misidentified samples remain after additional learning, additional learning is performed again. The Euclidean distance is used for the distance calculation. The combination of template and threshold is selected based on information gain. The branch function is defined by equation (12) below.

Here, D denotes the distance between the template and the input sample, and τ denotes the threshold. The threshold of the case type branch function is defined by equation (13) below.

Here, d_i denotes the learning sample that is nearest to the template among the learning samples whose class differs from that of the template, and T denotes the template.
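
Equations (12) and (13) are not reproduced in this text. Reading the description literally, the threshold would be half the Euclidean distance between the template T and d_i, and the branch function would compare D with τ. The Python sketch below assumes exactly that; the function names, the feature-vector representation, and the direction of the comparison (D < τ goes to the first child) are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def case_split(template, template_label, train):
    """Build a case type branch function for one template.

    `train` is a list of (vector, label) learning samples at the same
    terminal node. Returns (tau, h), or None when no learning sample of a
    different class exists.
    """
    others = [x for x, y in train if y != template_label]
    if not others:
        return None
    nearest = min(others, key=lambda x: euclidean(template, x))  # d_i
    tau = euclidean(template, nearest) / 2.0                     # midpoint

    def h(x):
        # 0: closer to the template than the midpoint, 1: otherwise
        return 0 if euclidean(template, x) < tau else 1

    return tau, h
```

Each misidentified sample yields one such candidate, and the text selects among the resulting template and threshold pairs by information gain.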

<H. Advantages>
The object recognition method according to the present embodiment can construct the decision tree group in a single learning process, using input samples for identifying the class together with input samples for further dividing the class into subclasses, and can perform class discrimination and subclass discrimination simultaneously in a single recognition process using the common decision tree group. This reduces the time and labor cost required for the learning process and the recognition process.

More specifically, it is possible to determine whether a detection target is included in the input image and, in parallel, whether the detection target included in the input image meets a predetermined condition (the situation in which the recognized workpiece is placed). In addition, the rules for determining whether the detection target included in the input image meets the predetermined condition need not be decided by visual inspection or the like, which reduces the labor cost of deciding such rules.

In addition, when misidentification occurs (or is likely to occur) in the recognition process using the constructed decision tree group, the object recognition method according to the present embodiment can additionally train the decision tree group, reducing the likelihood of such misidentification.

<I. Other embodiments>
One of the concepts included in the object recognition method according to the present embodiment is to make the weight of positive samples that resemble negative samples relatively low, and to make the weight of subclass 2 samples that resemble subclass 1 (and of subclass 1 samples that resemble subclass 2) relatively low. Any mathematical treatment of the weights may be adopted for this purpose. For example, the weight of positive samples that do not resemble negative samples may be made relatively high, or the weight of subclass 2 samples that do not resemble subclass 1 (and of subclass 1 samples that do not resemble subclass 2) may be made relatively high.

The above description assumes that a lower "weight" reduces a sample's influence, but the concept of "weight" may be used in the opposite sense. That is, a higher "weight" may mean that the sample is less likely to be included in the voting process.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 image processing system, 2 imaging device, 4 workpiece, 10 input image, 12 workpiece recognition, 13 recognition processing, 14 gripping determination, 16 gripping operation, 20 positive image, 21, 23, 25 positive patch, 22, 24 learning image, 30 negative image, 31 negative patch, 42, 44 voting result, 50 root node, 51-1, 51-2, 52-1, 52-2, 52-3, 52-4 node, 54-1 to 54-N, 55-1, 55-2 terminal node, 60 decision tree, 62 subset, 100 image processing apparatus, 102 processor, 104 main memory, 108 image processing program, 110 network interface, 112 image input interface, 114 input unit, 116 display unit, 118 output interface, 120 internal bus, 200 picking robot.

Claims

An image processing method comprising a step of constructing, using a plurality of partial learning images obtained from a learning image, a decision tree group having a hierarchical structure from a root node to a plurality of terminal nodes,
wherein the plurality of partial learning images include a first sample indicating a portion of a detection target that conforms to a predetermined condition, a second sample indicating a portion of the detection target that does not conform to the predetermined condition, and a third sample indicating a non-detection target,
the step of constructing the decision tree group includes:
a step of determining, for each node that is not a terminal node, a branch function indicating to which of the child nodes branching from that node a given partial learning image is to be classified; and
a step of determining, for each terminal node, a first probability indicating a ratio between the sum of the first sample and the second sample and the third sample, and a second probability indicating a ratio between the first sample and the second sample, by sequentially branching each of the partial learning images according to the determined branch functions until it reaches one of the terminal nodes, and the image processing method further comprises a step of determining, when a plurality of partial input images obtained from an input image are given to the decision tree group, whether a detection target is included in the input image based on the first probability for the terminal node that each partial input image reaches, and determining whether the detection target included in the input image conforms to the predetermined condition based on the second probability for each of those terminal nodes.

The image processing method according to claim 1, wherein the step of constructing the decision tree group further includes a step of determining, at each node that is not a terminal node, a first weight indicating the discrimination ability between the detection target and the non-detection target and a second weight indicating the discrimination ability between the first sample and the second sample, based on the result of classifying the given plurality of partial learning images into the child nodes according to the determined branch function.

The image processing method according to claim 1, wherein the step of constructing the decision tree group further includes a step of lowering a weight indicating the discrimination ability between the first sample and the second sample as the proportion in which the first sample and the second sample are classified into the same child node increases when the given plurality of partial learning images are classified into the child nodes according to the determined branch function.

The image processing method according to claim 1, further comprising: a step of giving the decision tree group the partial learning images and those partial input images that are misidentified or highly likely to be misidentified, and sequentially branching each image until it reaches one of the terminal nodes, thereby identifying the terminal node that each image reaches; and
a step of adding a child node that branches from the terminal node reached by such a partial input image, according to a discrimination probability between the partial learning images and the partial input images at that terminal node.

The image processing method according to claim 1, further comprising a step of setting, as a single group, a plurality of partial learning images indicating the detection target that are extracted from regions close to each other,
wherein the step of constructing the decision tree group includes a step of determining a common weight for the plurality of partial learning images belonging to the same group.

The image processing method according to claim 1, further comprising a step of generating a plurality of learning images by rotating the learning image by a predetermined angle, and extracting a plurality of partial learning images from the generated plurality of learning images.

An image processing apparatus comprising means for constructing, using a plurality of partial learning images obtained from a learning image, a decision tree group having a hierarchical structure from a root node to a plurality of terminal nodes,
wherein the plurality of partial learning images include a first sample indicating a portion of a detection target that conforms to a predetermined condition, a second sample indicating a portion of the detection target that does not conform to the predetermined condition, and a third sample indicating a non-detection target,
the means for constructing the decision tree group includes:
means for determining, for each node that is not a terminal node, a branch function indicating to which of the child nodes branching from that node a given partial learning image is to be classified; and
means for determining, for each terminal node, a first probability indicating a ratio between the sum of the first sample and the second sample and the third sample, and a second probability indicating a ratio between the first sample and the second sample, by sequentially branching each of the partial learning images according to the determined branch functions until it reaches one of the terminal nodes, and the image processing apparatus further comprises means for determining, when a plurality of partial input images obtained from an input image are given to the decision tree group, whether a detection target is included in the input image based on the first probability for the terminal node that each partial input image reaches, and determining whether the detection target included in the input image conforms to the predetermined condition based on the second probability for each of those terminal nodes.

An image processing program executed on a computer, the image processing program causing the computer to execute
a step of constructing, using a plurality of partial learning images obtained from a learning image, a decision tree group having a hierarchical structure from a root node to a plurality of terminal nodes,
wherein the plurality of partial learning images include a first sample indicating a portion of a detection target that conforms to a predetermined condition, a second sample indicating a portion of the detection target that does not conform to the predetermined condition, and a third sample indicating a non-detection target,
the step of constructing the decision tree group includes:
a step of determining, for each node that is not a terminal node, a branch function indicating to which of the child nodes branching from that node a given partial learning image is to be classified; and
a step of determining, for each terminal node, a first probability indicating a ratio between the sum of the first sample and the second sample and the third sample, and a second probability indicating a ratio between the first sample and the second sample, by sequentially branching each of the partial learning images according to the determined branch functions until it reaches one of the terminal nodes, and the image processing program further causes the computer to execute a step of determining, when a plurality of partial input images obtained from an input image are given to the decision tree group, whether a detection target is included in the input image based on the first probability for the terminal node that each partial input image reaches, and determining whether the detection target included in the input image conforms to the predetermined condition based on the second probability for each of those terminal nodes.