JP2016115179A

JP2016115179A - Information processing unit, information processing method, and program

Info

Publication number: JP2016115179A
Application number: JP2014254080A
Authority: JP
Inventors: 壮馬白石; Soma Shiraishi; 哲夫井下; Tetsuo Ishita
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-12-16
Filing date: 2014-12-16
Publication date: 2016-06-23
Anticipated expiration: 2034-12-16
Also published as: JP6739896B2

Abstract

PROBLEM TO BE SOLVED: To recognize a holding object even when the holding object does not appear in an image of holding means holding the holding object.

SOLUTION: A recognition device 100 includes an image acquisition section 110, a holding feature generation section 112, and a holding object recognition section 116. The image acquisition section 110 acquires an image of holding means 501 holding a holding object 502. The holding feature generation section 112 generates holding features representing a positional relationship among a plurality of predetermined sites of the holding means 501 in the image. The holding object recognition section 116 recognizes the holding object 502 on the basis of the holding features.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing device, an information processing method, and a program, and more particularly to an information processing device, an information processing method, and a program for recognizing an object gripped by gripping means.

A technique is known for recognizing a gripping target by using an image that captures a person or the like (hereinafter also referred to as a gripper) gripping an object (hereinafter also referred to as a gripping target) with the fingers of a hand or the like (hereinafter also referred to as gripping means).

For example, in the technique disclosed in Patent Document 1, when a partial region of an object in an image is matched against article shapes in a database, the articles to be matched are limited by the size of the space calculated from the gripping posture of the hand.

In the technique disclosed in Non-Patent Document 1, features of the paired shapes of the hand region and the object region in an image are stored for each gripping pattern, and the gripping pattern is recognized using those features.

As related techniques, Patent Document 2 discloses a technique for identifying the virtual key to be operated by using a finger position detected from an image. Patent Document 3 discloses a technique for recognizing an object included in an image by using a color histogram. Patent Document 4 discloses a technique for identifying an object based on features of the changes in the area and perimeter of the object in a moving image. Non-Patent Document 2 discloses a technique for detecting finger regions by searching for designated colors in an image captured while the subject wears a glove in which each finger has a designated color.

Patent Document 1: JP 2010-244413 A
Patent Document 2: JP 2013-143082 A
Patent Document 3: JP 2012-150552 A
Patent Document 4: JP 2010-244440 A

Non-Patent Document 1: Hiromasa Kasahara and three others, "Restoring Missing Pixels and Object Recognition Based on Grasping Pattern Image Learning", Image Recognition and Understanding Symposium (MIRU2008), July 2008, pp. 623-628
Non-Patent Document 2: Ken Watanabe and three others, "Recognition of Finger Characters Using Color Gloves", IEICE Transactions, D-II, 1997, vol. J80-D-2, no. 10, pp. 2713-2722

As described above, when a gripping target is to be recognized from an image that captures the gripping means gripping it, the gripping target may be covered by, for example, the fingers or the palm, so that the gripping target does not appear in the image. However, the techniques described in Patent Document 1 and the non-patent literature recognize the gripping target from a partial image of it, and therefore cannot recognize the gripping target when it does not appear in the image.

An object of the present invention is to solve the above problem and to provide an information processing device, an information processing method, and a program that can recognize a gripping target even when the gripping target does not appear in an image of the gripping means gripping it.

The information processing device of the present invention includes: image acquisition means for acquiring an image of gripping means that is gripping a gripping target; gripping feature generation means for generating a gripping feature indicating a positional relationship among a plurality of predetermined parts of the gripping means in the image; and gripping target recognition means for recognizing the gripping target based on the gripping feature.

The information processing method of the present invention acquires an image of gripping means that is gripping a gripping target, generates a gripping feature indicating a positional relationship among a plurality of predetermined parts of the gripping means in the image, and recognizes the gripping target based on the gripping feature.

The program of the present invention causes a computer to execute processing of acquiring an image of gripping means that is gripping a gripping target, generating a gripping feature indicating a positional relationship among a plurality of predetermined parts of the gripping means in the image, and recognizing the gripping target based on the gripping feature.

The effect of the present invention is that a gripping target can be recognized even when the gripping target does not appear in an image of the gripping means gripping it.

FIG. 1 is a block diagram showing the basic configuration of an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the recognition apparatus 100 in the first embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the recognition apparatus 100 realized by a computer in the first embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the recognition apparatus 100 in the first embodiment of the present invention.
FIG. 5 is a diagram showing an example of an acquired image in the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of a method for generating a gripping feature amount in the first embodiment of the present invention.
FIG. 7 is a diagram showing another example of a method for generating a gripping feature amount in the first embodiment of the present invention.
FIG. 8 is a diagram showing an example of the gripping feature information 115 in the first embodiment of the present invention.
FIG. 9 is a diagram showing an example of instance extraction in the first embodiment of the present invention.
FIG. 10 is a diagram showing an example of object likelihood calculation results in the first embodiment of the present invention.
FIG. 11 is a diagram showing another example of the gripping feature information 115 in the first embodiment of the present invention.
FIG. 12 is a block diagram showing the configuration of the recognition apparatus 200 in the second embodiment of the present invention.
FIG. 13 is a flowchart showing the operation of the recognition apparatus 200 in the second embodiment of the present invention.
FIG. 14 is a diagram showing an example of the object feature information 225 in the second embodiment of the present invention.
FIG. 15 is a diagram showing an example of integrated likelihood calculation results in the second embodiment of the present invention.
FIG. 16 is a block diagram showing the configuration of the recognition apparatus 200 in the third embodiment of the present invention.
FIG. 17 is a flowchart showing the operation of the recognition apparatus 200 in the third embodiment of the present invention.
FIG. 18 is a block diagram showing the configuration of the recognition apparatus 200 in the fourth embodiment of the present invention.
FIG. 19 is a flowchart showing the operation of the recognition apparatus 200 in the fourth embodiment of the present invention.
FIG. 20 is a diagram showing an example of calculating a gripping feature amount that includes a motion feature in the fourth embodiment of the present invention.

<First Embodiment>
First, a first embodiment of the present invention will be described.

In the first embodiment of the present invention, the gripping target 502 is recognized based on the positional relationship among a plurality of predetermined parts of the gripping means 501. The embodiments of the present invention are described using the case where the gripping means 501 is a human hand as an example.

First, the configuration of the first embodiment of the present invention will be described.

FIG. 2 is a block diagram showing the configuration of the recognition apparatus 100 in the first embodiment of the present invention. The recognition apparatus 100 is one embodiment of the information processing device of the present invention.

Referring to FIG. 2, the recognition apparatus 100 according to the first embodiment of the present invention includes an image acquisition unit 110, a gripping means detection unit 111, a gripping feature generation unit 112, an object likelihood calculation unit 113, a gripping feature storage unit 114, and a gripping target recognition unit 116.

The image acquisition unit 110 acquires an image of the gripping means 501 that is gripping the gripping target 502. The image acquisition unit 110 may be an RGB camera that can acquire three-color information of red, blue, and green. The image acquisition unit 110 may also be a camera that can acquire signal information at other wavelengths, such as a far-infrared camera or a multispectral camera. The image acquisition unit 110 may also be a distance camera (distance sensor) that stores, for each pixel in the image, the distance from the camera to the object. Furthermore, the image acquisition unit 110 may be a camera that can acquire one of, or several at once of, the three-color information, the other-wavelength signal information, and the distance information described above.

The gripping means detection unit 111 detects the position, or the position and direction, of each of a plurality of predetermined parts of the gripping means 501 in the image acquired by the image acquisition unit 110. In the embodiments of the present invention, the fingers of the gripping means 501 are used as the predetermined parts. As the position of a predetermined part (finger position), a designated position on the finger, such as the fingertip or a joint, is used.

The gripping feature generation unit 112 generates, as a gripping feature amount representing the gripping feature of the gripping means 501, a gripping feature amount indicating the positional relationship among the plurality of predetermined parts (the positional relationship among the fingers). In the embodiments of the present invention, the coordinate values of each finger position, the coordinate values and direction of each finger position, the distances between finger positions, and the like are used as the positional relationship among the positions of the plurality of predetermined parts.

The gripping feature storage unit 114 stores gripping feature information 115. The gripping feature information 115 is information for calculating an "object likelihood based on the gripping feature amount" for each category of object to be recognized. As described later, the content of the gripping feature information 115 depends on the method used to calculate the object likelihood.

The object likelihood calculation unit 113 uses the gripping feature amount generated by the gripping feature generation unit 112 and the gripping feature information 115 stored in the gripping feature storage unit 114 to calculate, for each object category, an object likelihood based on the gripping feature amount.

The gripping target recognition unit 116 recognizes the category of the gripping target 502 by using the object likelihoods calculated by the object likelihood calculation unit 113.

Note that the recognition apparatus 100 may be a computer that includes a CPU (Central Processing Unit) and a storage medium storing a program, and that operates under control based on the program.

FIG. 3 is a block diagram showing the configuration of the recognition apparatus 100 realized by a computer in the first embodiment of the present invention.

The recognition apparatus 100 includes a CPU 101, a storage device (storage medium) 102 such as a hard disk or memory, a communication device 103 that communicates with other apparatuses, an input device 104 such as a mouse or keyboard, and an output device 105 such as a display.

The CPU 101 executes a computer program for realizing the functions of the image acquisition unit 110, the gripping means detection unit 111, the gripping feature generation unit 112, the object likelihood calculation unit 113, and the gripping target recognition unit 116. The storage device 102 stores the data of the gripping feature storage unit 114. The input device 104 acquires, from a user or the like, an image of the gripping means 501 that is gripping the gripping target 502. The output device 105 outputs the recognition result (the object category of the gripping target 502) to the user or the like. The communication device 103 may also acquire the image from another apparatus and output the recognition result to another apparatus.

Next, the operation of the first embodiment of the present invention will be described.

FIG. 4 is a flowchart showing the operation of the recognition apparatus 100 in the first embodiment of the present invention.

First, the image acquisition unit 110 acquires, from a user or the like, an image of the gripping means 501 that is gripping the gripping target 502 (step S101).

FIG. 5 is a diagram showing an example of an acquired image in the first embodiment of the present invention.
For example, the image acquisition unit 110 acquires an image such as that shown in FIG. 5.

The gripping means detection unit 111 detects the position of each finger of the gripping means 501, or the position and direction of each finger, in the image acquired by the image acquisition unit 110 (step S102). Here, the gripping means detection unit 111 detects the positions and directions of the fingers based on, for example, the color, shape, and arrangement of the fingers. The gripping means detection unit 111 may also detect the finger positions by using the technique described in Non-Patent Document 2. Furthermore, the gripping means detection unit 111 may detect finger positions only for the portions of the fingers that appear in the image, or may also estimate positions for portions that do not appear, for example by shape estimation based on the detection results for the visible portions. A finger position may be specified by two-dimensional coordinates in the image or by three-dimensional coordinates in real space. The coordinate values may be absolute coordinates with a specific point as the origin, or relative coordinates from an arbitrary point. The gripping means detection unit 111 may also detect a plurality of positions for each finger.

The gripping means detection unit 111 determines whether two or more fingers have been detected in the image (step S103).
If two or more fingers have been detected in step S103 (step S103/Y), the gripping feature generation unit 112 generates a gripping feature amount based on the detected position of each finger, or the position and direction of each finger (step S105). Here, the gripping feature generation unit 112 generates the gripping feature amount according to, for example, one of the following gripping feature amount generation methods 1 to 3.

(Gripping feature amount generation method 1)
FIG. 6 is a diagram showing an example of a method for generating a gripping feature amount in the first embodiment of the present invention. Assume that the positions of n fingers have been detected by the gripping means detection unit 111. In this case, line segments connecting the n finger positions to one another are obtained, as shown in FIG. 6. The number N_l of line segments is calculated by Equation 1.

With the position of the j-th detected finger (j = 1, …, n) denoted P_j = (x_j, y_j, z_j), the length l_i (i = 1, …, N_l) of each of the N_l line segments is calculated by Equation 2.

Arranging the line segment lengths l_i in descending order defines a gripping feature amount V_A in vector form, as in Equation 3.

Further, letting mx be the maximum value of the elements of the gripping feature amount V_A, a gripping feature amount V'_A as in Equation 4 can be defined.

The gripping feature generation unit 112 generates a gripping feature amount as in Equation 3 or Equation 4 based on the lengths of the line segments in FIG. 6.
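Equations 1 to 4 are not reproduced in this text, so the following Python sketch of generation method 1 rests on assumptions drawn from the surrounding description: that Equation 1 counts the pairs of fingers, N_l = n(n−1)/2, that Equation 2 is the Euclidean distance between two finger positions, and that Equation 4 divides every element of V_A by mx. The function name `gripping_feature_a` and its `normalize` flag are hypothetical and do not come from the patent.

```python
import math
from itertools import combinations


def gripping_feature_a(points, normalize=False):
    """Sketch of generation method 1 (assumed form of Equations 1 to 4).

    points: list of finger positions, each a tuple of coordinates.
    Returns the pairwise distances sorted in descending order (V_A),
    or the same vector divided by its maximum element (V'_A).
    """
    if len(points) < 2:
        raise ValueError("at least two finger positions are required")
    # One line segment per pair of fingers: n * (n - 1) / 2 segments.
    lengths = sorted(
        (math.dist(p, q) for p, q in combinations(points, 2)),
        reverse=True,
    )
    if normalize:
        mx = lengths[0]  # largest element after the descending sort
        if mx > 0:
            lengths = [length / mx for length in lengths]
    return lengths
```

Dividing by mx makes the vector independent of the overall scale of the hand in the image, which is presumably why the normalized variant is offered alongside V_A.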

For example, the object likelihood calculation unit 113 generates a gripping feature amount V_A = (1.5, 1.0, 0.3, …) from the image of FIG. 5.

(Gripping feature amount generation method 2)
FIG. 7 is a diagram showing another example of a method for generating a gripping feature amount in the first embodiment of the present invention. Assume again that the positions of n fingers have been detected by the gripping means detection unit 111. In this case, as shown in FIG. 7, n−1 line segments connecting each detected finger position to the positions of the other fingers are obtained. Let the set of lengths of these line segments be Gr_i = (l_{i,1}, l_{i,2}, …, l_{i,n−1}). Let Gr'_i = (l'_{i,1}, l'_{i,2}, …, l'_{i,n−1}) be Gr_i with its elements rearranged in descending order of length. Further, let (Gr''_0, Gr''_1, …, Gr''_n) be the sets Gr'_i rearranged in descending order of their first elements l'_{i,1}, and write the elements of Gr''_i as Gr''_i = (l''_{i,1}, l''_{i,2}, …, l''_{i,n−1}). Arranging the elements of each Gr''_i in sequence defines a gripping feature amount V_B as in Equation 5.

Further, using the maximum value mx of the elements of the gripping feature amount V_B, a gripping feature amount V'_B as in Equation 6 can be defined.

The gripping feature generation unit 112 generates a gripping feature amount as in Equation 5 or Equation 6 based on the lengths of the line segments in FIG. 7.
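Generation method 2 can be sketched in Python as follows. Equations 5 and 6 are not reproduced in this text, so the sketch assumes that V_B is the concatenation of the sorted per-finger groups and that V'_B divides every element by mx. The text does not say how to order groups whose first elements are equal; comparing whole groups lexicographically is an assumption made here for determinism. The function name `gripping_feature_b` is hypothetical.

```python
import math


def gripping_feature_b(points, normalize=False):
    """Sketch of generation method 2 (assumed form of Equations 5 and 6).

    For each finger, collect the distances to every other finger and sort
    them in descending order. The groups are then ordered in descending
    order of their first (largest) element and concatenated.
    """
    if len(points) < 2:
        raise ValueError("at least two finger positions are required")
    groups = []
    for i, p in enumerate(points):
        group = sorted(
            (math.dist(p, q) for j, q in enumerate(points) if j != i),
            reverse=True,
        )
        groups.append(group)
    # Lexicographic comparison orders by the first element and breaks
    # ties with later elements (the tie-break rule is an assumption).
    groups.sort(reverse=True)
    feature = [length for group in groups for length in group]
    if normalize:
        mx = max(feature)
        if mx > 0:
            feature = [length / mx for length in feature]
    return feature
```

Compared with method 1, each distance appears twice (once in the group of each endpoint), so the vector has n(n−1) elements and keeps track of which distances share a finger.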

(Gripping feature amount generation method 3)
Assume that the gripping means detection unit 111 has been able to identify, in addition to the position and direction of each finger, which finger each one is among the thumb, index finger, middle finger, ring finger, and little finger. In this case, for example, the coordinate values P_j = (x_j, y_j, z_j) (j = 1, …, n) and the directions D_j = (a_j, b_j, c_j) of the fingers are obtained in the order of thumb, index finger, middle finger, ring finger, and little finger. Here, the direction of each finger may be, for example, the direction of the normal at the fingertip position obtained by a distance sensor, or the direction from the first joint toward the fingertip. A different finger order (thumb, index finger, …) may also be used.

Expressing these coordinate values and directions as coordinates and directions in a predetermined coordinate system Z defines a gripping feature amount V_C as in Equation 7, or a gripping feature amount V_D as in Equation 8.

As the coordinate system Z, for example, a coordinate system having a coordinate axis parallel to the direction of one predetermined finger may be used.

The gripping feature generation unit 112 generates the gripping feature amount of Equation 7 or Equation 8 based on the position and direction of each finger.

Next, the object likelihood calculation unit 113 calculates, for each object category, an object likelihood based on the gripping feature amount (step S106). Here, the object likelihood calculation unit 113 calculates the object likelihood according to, for example, one of the following object likelihood calculation methods 1 to 3.

(Object likelihood calculation method 1)
FIG. 8 is a diagram showing an example of the gripping feature information 115 in the first embodiment of the present invention. In the gripping feature information 115 of FIG. 8, instances indicating the gripping feature amount observed when an object is gripped are registered for each object category. Here, a pair consisting of a category and a gripping feature amount registered for that category is called an instance. A plurality of instances may be registered for one category. With the object categories denoted C_i (i = 1, …, M, where M is the number of categories) and the gripping feature amounts corresponding to category C_i denoted V_ij (j = 1, …, R(i), where R(i) is the number of gripping feature amounts for category C_i), an instance is expressed as (C_i, V_ij).

The object likelihood calculation unit 113 calculates the distance between the gripping feature amount generated by the gripping feature generation unit 112 and the gripping feature amount V_ij of each instance registered in the gripping feature information 115. The distance may be the Euclidean distance, the Manhattan distance, or another distance measure. From the instances registered in the gripping feature information 115, the object likelihood calculation unit 113 extracts those whose calculated distance is equal to or less than a predetermined threshold. The object likelihood calculation unit 113 then uses the extracted instances to calculate the object likelihood L_i of each category C_i by, for example, Equation 9.

Here, k is the number of extracted instances, and Nc_i is the number of instances (C_i, V_ij) among those k instances that correspond to category C_i.

FIG. 9 is a diagram showing an example of instance extraction in the first embodiment of the present invention. For example, as shown in FIG. 9, the object likelihood calculation unit 113 extracts, from the instances registered in the gripping feature information 115 of FIG. 8, the ten instances whose distance from the gripping feature amount V = (1.5, 1.0, 0.3, …) generated by the gripping feature generation unit 112 is equal to or less than the threshold.

FIG. 10 is a diagram showing an example of object likelihood calculation results in the first embodiment of the present invention. For example, the object likelihood calculation unit 113 calculates the object likelihoods shown in FIG. 10 based on the numbers of extracted instances.

Note that, instead of extracting the instances whose distance is equal to or less than a predetermined threshold, the object likelihood calculation unit 113 may extract a predetermined number of instances in ascending order of distance.
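Object likelihood calculation method 1 can be sketched in Python as follows. Equation 9 is not reproduced in this text; from the definitions of k and Nc_i, the sketch assumes L_i = Nc_i / k. The function name `object_likelihoods`, its parameters, and the category labels used in the example are hypothetical, and the Euclidean distance is only one of the distance measures the text allows.

```python
import math
from collections import Counter


def object_likelihoods(query, instances, threshold=None, k=None):
    """Sketch of likelihood method 1, assuming Equation 9 is L_i = Nc_i / k.

    query: gripping feature amount of the input image.
    instances: list of (category, feature_vector) pairs.
    Exactly one of `threshold` (distance cut-off) or `k` (number of
    nearest instances to keep) must be given.
    """
    if (threshold is None) == (k is None):
        raise ValueError("give exactly one of threshold or k")
    scored = sorted(
        ((math.dist(query, vec), cat) for cat, vec in instances),
        key=lambda pair: pair[0],
    )
    if k is not None:
        selected = scored[:k]
    else:
        selected = [pair for pair in scored if pair[0] <= threshold]
    counts = Counter(cat for _, cat in selected)
    total = len(selected)
    categories = {cat for cat, _ in instances}
    # With no extracted instance, every likelihood is reported as 0.0.
    return {c: (counts[c] / total if total else 0.0) for c in categories}


# Example with made-up instances for two hypothetical categories.
example = [
    ("cup", (1.5, 1.0, 0.3)),
    ("cup", (1.4, 1.0, 0.3)),
    ("pen", (1.5, 1.1, 0.3)),
    ("pen", (5.0, 5.0, 5.0)),
]
print(object_likelihoods((1.5, 1.0, 0.3), example, threshold=0.2))
```

Passing `k` instead of `threshold` corresponds to the variant that keeps a fixed number of nearest instances.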

(Object likelihood calculation method 2)
FIG. 11 is a diagram showing another example of the gripping feature information 115 in the first embodiment of the present invention. In the gripping feature information 115 of FIG. 11, the object likelihood at each point in the gripping feature amount space is registered for each object category. In this case, the object likelihood at each point is calculated in advance by a nearest neighbor density estimation method, a kernel density estimation method, or the like.

The object likelihood calculation unit 113 calculates the object likelihood L_i of each category C_i by referring to the gripping feature information 115 and obtaining the object likelihood corresponding to the gripping feature amount generated by the gripping feature generation unit 112.
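As an illustration of the kernel density estimation mentioned for method 2, the following Python sketch evaluates a Gaussian kernel density per category at the query feature. It is a simplification under stated assumptions: the patent describes likelihoods registered in advance for points of the feature space, whereas this sketch evaluates the density on demand from stored samples. The Gaussian kernel, the `bandwidth` parameter, and both function names are hypothetical choices.

```python
import math


def kernel_density(query, samples, bandwidth=0.1):
    """Gaussian kernel density estimate at `query` (illustrative only)."""
    if not samples:
        raise ValueError("at least one sample is required")
    dim = len(query)
    # Normalizing constant of an isotropic Gaussian kernel in `dim` dimensions.
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = sum(
        math.exp(-math.dist(query, s) ** 2 / (2.0 * bandwidth ** 2))
        for s in samples
    )
    return total / (len(samples) * norm)


def likelihoods_by_density(query, samples_by_category, bandwidth=0.1):
    """Density of each category's stored samples, evaluated at `query`."""
    return {
        category: kernel_density(query, samples, bandwidth)
        for category, samples in samples_by_category.items()
    }
```

A precomputed table, as in FIG. 11, could be filled by calling `kernel_density` once for each grid point of the feature space and storing the results.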

(Object likelihood calculation method 3)
A learning result obtained by machine learning, such as a Support Vector Machine or a Random Forest, may be registered in the gripping feature information 115.

For example, when a Support Vector Machine is used, the decision boundary surface in the gripping feature space is registered as the learning result. In this case, the object likelihood calculation unit 113 calculates the object likelihood L_i of each category C_i as a function of the distance between the gripping feature amount generated by the gripping feature generation unit 112 and the decision boundary surface.

また、Random Forestを用いた場合、学習結果として、識別器での識別境界とその識別器の重みが登録される。この場合、物体尤度算出部１１３は、重み付投票結果の関数により、各カテゴリＣ_ｉの物体尤度Ｌ_ｉを算出する。 When a Random Forest is used, the decision boundary of each classifier and the weight of that classifier are registered as the learning result. In this case, the object likelihood calculating unit 113 calculates the object likelihood L_i of each category C_i as a function of the weighted voting result.
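The text leaves both mapping functions unspecified, so the following is only one possible reading. It assumes a linear decision boundary per category, a logistic function of the signed distance, and a likelihood equal to each category's share of the total vote weight; `svm_likelihood` and `weighted_vote_likelihood` are hypothetical names.

```python
import math


def svm_likelihood(feature, weights, bias, scale=1.0):
    """Map the signed distance to a linear boundary w.x + b = 0 into (0, 1).

    Assumption: a logistic function of the distance; the source only says
    "a function of the distance".
    """
    margin = sum(w * x for w, x in zip(weights, feature)) + bias
    distance = margin / math.sqrt(sum(w * w for w in weights))
    return 1.0 / (1.0 + math.exp(-scale * distance))


def weighted_vote_likelihood(votes, categories):
    """Turn weighted classifier votes into per-category likelihoods.

    votes: list of (predicted_category, classifier_weight) pairs.
    Assumption: likelihood = share of the total vote weight.
    """
    total = sum(weight for _, weight in votes)
    return {
        cat: sum(weight for voted, weight in votes if voted == cat) / total
        for cat in categories
    }


# Invented example values, for illustration only.
on_boundary = svm_likelihood((0.0, 0.0), (1.0, 0.0), 0.0)
far_inside = svm_likelihood((3.0, 0.0), (1.0, 0.0), 0.0)
forest = weighted_vote_likelihood(
    [("C1", 0.5), ("C2", 0.3), ("C1", 0.2)], ["C1", "C2", "C3"]
)
```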

次に、把持対象認識部１１６は、算出した物体尤度を用いて、把持対象５０２のカテゴリを認識する（ステップＳ１０７）。ここで、把持対象認識部１１６は、例えば、数１０式に従って、把持対象５０２のカテゴリｉｄｘを特定する。 Next, the gripping target recognition unit 116 recognizes the category of the gripping target 502 using the calculated object likelihood (step S107). Here, the gripping target recognizing unit 116 identifies the category idx of the gripping target 502 according to, for example, Formula 10.

なお、Ｌ_ｔｈは、予め設定された、物体尤度Ｌ_ｉの最大値に対する閾値である。 Note that L _th is a preset threshold value for the maximum value of the object likelihood L _i .
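Formula 10 itself is not reproduced in this text. From the surrounding description (the category with the maximum object likelihood is selected, and L_th is a threshold on that maximum), the selection rule can be sketched as below. The use of `>=` rather than `>` and the use of `None` to stand for "not applicable" are assumptions.

```python
def recognize_category(likelihoods, threshold):
    """Return the category with the highest likelihood, or None.

    None stands for "not applicable": the best likelihood did not reach
    the preset threshold (L_th in the text).
    """
    best = max(likelihoods, key=likelihoods.get)
    return best if likelihoods[best] >= threshold else None


# Invented likelihood values, for illustration only.
likelihoods = {"C1": 0.6, "C2": 0.3, "C3": 0.1}
accepted = recognize_category(likelihoods, threshold=0.5)
rejected = recognize_category(likelihoods, threshold=0.7)
```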

例えば、把持対象認識部１１６は、図１０の物体尤度算出結果をもとに、把持対象５０２のカテゴリを、物体尤度が最大であるカテゴリＣ_１と特定する。 For example, based on the object likelihood calculation result of FIG. 10, the gripping target recognition unit 116 identifies the category of the gripping target 502 as category C_1, which has the maximum object likelihood.

把持対象認識部１１６は、ステップＳ１０６の結果に応じて、認識結果に、把持対象５０２のカテゴリのインデックス、または、「該当なし」を設定する（ステップＳ１０８）。 The gripping target recognition unit 116 sets the category index of the gripping target 502 or “not applicable” in the recognition result according to the result of step S106 (step S108).

例えば、把持対象認識部１１６は、認識結果に「カテゴリＣ_１」を設定する。 For example, the gripping target recognition unit 116 sets “category C ₁ ” as the recognition result.

なお、ステップＳ１０３において、検出された指の本数が２本未満の場合（ステップＳ１０３／Ｎ）、認識結果には、「該当なし」が設定される（ステップＳ１０４）。 In step S103, when the number of detected fingers is less than two (step S103 / N), “not applicable” is set as the recognition result (step S104).

最後に、把持対象認識部１１６は、利用者等へ、認識結果を出力する（ステップＳ１０９）。 Finally, the gripping target recognition unit 116 outputs the recognition result to the user or the like (step S109).

以上により、本発明の第１の実施の形態の動作が完了する。 Thus, the operation of the first exemplary embodiment of the present invention is completed.

なお、上述の説明では、把持対象認識部１１６は、物体尤度算出部１１３により算出された各カテゴリに対する物体尤度をもとに、物体のカテゴリを特定した。しかしながら、これに限らず、把持対象認識部１１６は、物体形状に対する物体尤度をもとに物体形状を特定し、さらに、各物体形状に関連づけられた物体のカテゴリを取得することにより、物体のカテゴリを特定してもよい。 In the above description, the gripping target recognizing unit 116 identifies the category of the object based on the object likelihood for each category calculated by the object likelihood calculating unit 113. However, the present invention is not limited to this, and the grasping target recognition unit 116 specifies the object shape based on the object likelihood with respect to the object shape, and further acquires the category of the object associated with each object shape. A category may be specified.

また、本発明の実施の形態では、把持手段５０１が人の手である場合を例に説明したが、これに限らず、人の手と同様に物体を把持できれば、把持手段５０１は、動物やロボット等の手でもよい。 Further, in the embodiment of the present invention, the case where the gripping means 501 is a human hand has been described as an example. However, the present invention is not limited to this; the gripping means 501 may be the hand of an animal, a robot, or the like, as long as it can grip an object in the same way as a human hand.

また、複数の可動部を動かすことにより物体を把持できれば、把持手段５０１は、手以外の形状であってもよい。この場合、所定部位として、例えば、把持手段５０１の各可動部が用いられ、所定部位の位置（可動部の位置）として、各可動部の先端や中心、関節等、指定された位置が検出される。また、複数の所定部位の位置間の位置関係（可動部間の位置関係）として、各可動部の位置の座標値や、可動部の位置間の距離等が用いられる。 Further, as long as an object can be gripped by moving a plurality of movable parts, the gripping means 501 may have a shape other than a hand. In this case, for example, each movable part of the gripping means 501 is used as a predetermined part, and a specified position such as the tip, center, or joint of each movable part is detected as the position of the predetermined part (the position of the movable part). As the positional relationship between the positions of the plurality of predetermined parts (the positional relationship between the movable parts), the coordinate values of the position of each movable part, the distances between the positions of the movable parts, and the like are used.
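As one illustration of a positional relationship built from distances between movable-part positions, the sketch below collects all pairwise Euclidean distances into a feature vector. The function name `pairwise_distance_feature` and the fixed pair ordering are assumptions made here; they are not taken from the source.

```python
import math
from itertools import combinations


def pairwise_distance_feature(positions):
    """Build a feature vector from the distances between every pair of parts.

    positions: list of coordinate tuples, one per movable part, in a fixed
    order so that each vector element always refers to the same pair.
    """
    return [math.dist(p, q) for p, q in combinations(positions, 2)]


# Invented 2-D positions of three movable parts, for illustration only.
feature = pairwise_distance_feature([(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)])
```

A distance-based feature does not change when the gripping means is translated or rotated in the image, which a feature built from raw coordinate values would not offer.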

次に、本発明の実施の形態の基本的な構成を説明する。図１は、本発明の実施の形態の基本的な構成を示すブロック図である。 Next, the basic configuration of the embodiment of the present invention will be described. FIG. 1 is a block diagram showing a basic configuration of an embodiment of the present invention.

図１を参照すると、認識装置１００は、画像取得部１１０、把持特徴生成部１１２、及び、把持対象認識部１１６を含む。画像取得部１１０は、把持対象５０２を把持している把持手段５０１の画像を取得する。把持特徴生成部１１２は、画像における把持手段５０１の複数の所定部位間の位置関係を示す把持特徴を生成する。把持対象認識部１１６は、把持特徴をもとに、把持対象５０２を認識する。 Referring to FIG. 1, the recognition apparatus 100 includes an image acquisition unit 110, a gripping feature generation unit 112, and a gripping target recognition unit 116. The image acquisition unit 110 acquires an image of the gripping unit 501 that is gripping the gripping target 502. The grip feature generation unit 112 generates a grip feature indicating the positional relationship between a plurality of predetermined parts of the grip means 501 in the image. The grip target recognition unit 116 recognizes the grip target 502 based on the grip feature.

本発明の実施の形態によれば、把持対象５０２を把持している把持手段５０１の画像中に把持対象５０２が存在しない場合であっても、把持対象５０２を認識できる。その理由は、把持対象認識部１１６が、画像における把持手段５０１の複数の所定部位間の位置関係を示す把持特徴をもとに、把持対象５０２を認識するためである。これにより、把持対象５０２を把持している把持手段５０１の画像において、把持対象５０２が遮蔽されている場合であっても、把持対象５０２を認識できる。 According to the embodiment of the present invention, the gripping target 502 can be recognized even when the gripping target 502 does not appear in the image of the gripping means 501 that is gripping it. The reason is that the gripping target recognition unit 116 recognizes the gripping target 502 based on a gripping feature indicating the positional relationship between a plurality of predetermined parts of the gripping means 501 in the image. As a result, the gripping target 502 can be recognized even when it is occluded in the image of the gripping means 501 that is gripping it.

＜第２の実施の形態＞ <Second Embodiment>
次に、本発明の第２の実施の形態について説明する。 Next, a second embodiment of the present invention will be described.

本発明の第２の実施の形態では、把持特徴量に加えて、画像上の把持対象５０２の物体領域から物体特徴量を生成し、把持特徴量と物体特徴量とを用いて、把持物体を認識する。なお、本発明の第２の実施の形態では、把持特徴量基づく物体尤度を第１物体尤度、物体特徴量に基づく物体尤度を第２物体尤度と呼ぶ。 In the second embodiment of the present invention, in addition to the gripping feature amount, an object feature amount is generated from the object region of the gripping target 502 in the image, and the gripped object is recognized using both the gripping feature amount and the object feature amount. In the second embodiment of the present invention, the object likelihood based on the gripping feature amount is referred to as the first object likelihood, and the object likelihood based on the object feature amount is referred to as the second object likelihood.

はじめに、本発明の第２の実施の形態の構成を説明する。 First, the configuration of the second exemplary embodiment of the present invention will be described.

図１２は、本発明の第２の実施の形態における、認識装置２００の構成を示すブロック図である。 FIG. 12 is a block diagram showing the configuration of the recognition apparatus 200 in the second embodiment of the present invention.

図１２を参照すると、本発明の第２の実施の形態の認識装置２００は、画像取得部２１０、把持手段検出部２１１、把持特徴生成部２１２、第１物体尤度算出部２１３、把持特徴記憶部２１４を含む。認識装置２００は、さらに、把持対象検出部２２１、物体特徴生成部２２２、第２物体尤度算出部２２３、物体特徴記憶部２２４、統合尤度算出部２３０、及び、把持対象認識部２４０を含む。 Referring to FIG. 12, the recognition apparatus 200 according to the second embodiment of the present invention includes an image acquisition unit 210, a gripping means detection unit 211, a gripping feature generation unit 212, a first object likelihood calculation unit 213, and a gripping feature storage unit 214. The recognition apparatus 200 further includes a gripping target detection unit 221, an object feature generation unit 222, a second object likelihood calculation unit 223, an object feature storage unit 224, an integrated likelihood calculation unit 230, and a gripping target recognition unit 240.

画像取得部２１０、把持手段検出部２１１、及び、把持特徴生成部２１２は、それぞれ、本発明の第１の実施の形態における、画像取得部１１０、把持手段検出部１１１、及び、把持特徴生成部１１２と同様である。把持特徴記憶部２１４は、把持特徴情報１１５と同様の把持特徴情報２１５（「把持特徴量に基づく物体尤度（第１物体尤度）」を算出するための情報）を記憶する。第１物体尤度算出部２１３は、物体尤度算出部１１３と同様に、物体のカテゴリ毎に、第１物体尤度を算出する。 The image acquisition unit 210, the gripping means detection unit 211, and the gripping feature generation unit 212 are the same as the image acquisition unit 110, the gripping means detection unit 111, and the gripping feature generation unit 112, respectively, in the first exemplary embodiment of the present invention. The gripping feature storage unit 214 stores gripping feature information 215, which is similar to the gripping feature information 115 (information for calculating the "object likelihood based on the gripping feature amount (first object likelihood)"). Like the object likelihood calculating unit 113, the first object likelihood calculating unit 213 calculates a first object likelihood for each object category.

把持対象検出部２２１は、画像取得部１１０により取得された画像における、把持対象５０２の物体領域を検出する。ここで、把持対象検出部２２１は、例えば、背景が固定である場合に移動物体を検出する背景差分法を用いて、物体領域を検出してもよい。また、把持対象検出部２２１は、距離が所定の閾値よりも小さい（近い）画素を、物体領域として検出してもよい。また，把持対象検出部２２１は、ある把持対象候補領域（例えば画像の中心）に類似する周辺画素を把持対象に属する領域とみなすことにより、物体領域を検出してもよい。 The grip target detection unit 221 detects the object area of the grip target 502 in the image acquired by the image acquisition unit 110. Here, the gripping target detection unit 221 may detect the object region using, for example, a background subtraction method that detects a moving object when the background is fixed. Further, the gripping target detection unit 221 may detect a pixel whose distance is smaller (closer) than a predetermined threshold as an object region. Further, the gripping target detection unit 221 may detect the object region by regarding peripheral pixels similar to a certain gripping target candidate region (for example, the center of the image) as a region belonging to the gripping target.
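Two of the detection options above can be sketched on plain nested lists. This is a simplified illustration only: it assumes single-channel images, a per-pixel absolute difference for background subtraction, and a depth map in the same units as the threshold. The names `background_subtraction_mask` and `depth_threshold_mask` are hypothetical, and neither sketch separates the gripping means from the gripping target.

```python
def background_subtraction_mask(frame, background, min_diff):
    """Mark pixels that differ from a fixed background by more than min_diff."""
    return [
        [abs(f - b) > min_diff for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def depth_threshold_mask(depth_map, max_distance):
    """Mark pixels whose distance from the camera is below max_distance."""
    return [[d < max_distance for d in row] for row in depth_map]


# Invented 2x3 grayscale and depth images, for illustration only.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 12], [11, 180, 10]]
moving = background_subtraction_mask(frame, background, min_diff=20)

depth = [[2.0, 0.4, 2.1], [1.9, 0.5, 2.0]]
near = depth_threshold_mask(depth, max_distance=1.0)
```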

物体特徴生成部２２２は、把持対象５０２の物体の特徴を表す物体特徴量として、把持対象５０２の色や模様等に係る特徴を示す物体特徴量を生成する。ここで、物体特徴として、例えば、色の出現頻度や色の配置を用いてもよい。また、物体特徴として、画像の輝度値の部分的な明暗パターンや、輝度値の変化方向、フィルタへの応答強度を用いてもよい。 The object feature generation unit 222 generates an object feature amount indicating a feature related to a color, a pattern, or the like of the gripping target 502 as an object feature amount representing the feature of the object of the gripping target 502. Here, for example, color appearance frequency or color arrangement may be used as the object feature. Further, as the object feature, a partial light / dark pattern of the luminance value of the image, a change direction of the luminance value, and a response intensity to the filter may be used.
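A color appearance frequency of the kind mentioned above can be represented as a normalized color histogram. The sketch below assumes 8-bit RGB pixels taken from the detected object region and a uniform quantization of each channel; the name `color_histogram` and the bin count are illustrative choices, not taken from the source.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Compute a normalized RGB histogram over the given pixels.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns a list of length bins_per_channel ** 3 that sums to 1.0
    (or all zeros if no pixels are given).
    """
    n = bins_per_channel
    hist = [0] * (n ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantize each channel into n bins; 255 * n // 256 is at most n - 1.
        idx = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[idx] += 1
        count += 1
    return [h / count for h in hist] if count else [0.0] * (n ** 3)


# Invented object-region pixels, for illustration only.
feature = color_histogram([(255, 0, 0), (250, 10, 5), (0, 0, 255)], bins_per_channel=2)
```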

物体特徴記憶部２２４は、物体特徴情報２２５を記憶する。物体特徴情報２２５は、認識すべき物体のカテゴリに対する、「物体特徴量に基づく物体尤度（第２物体尤度）」を算出するための情報である。 The object feature storage unit 224 stores object feature information 225. The object feature information 225 is information for calculating the “object likelihood based on the object feature amount (second object likelihood)” for the category of the object to be recognized.

第２物体尤度算出部２２３は、物体特徴生成部２２２により生成された物体特徴量と物体特徴記憶部２２４に記憶されている物体特徴情報２２５とを用いて、物体のカテゴリ毎に、第２物体尤度を算出する。 The second object likelihood calculation unit 223 calculates the second object likelihood for each object category, using the object feature amount generated by the object feature generation unit 222 and the object feature information 225 stored in the object feature storage unit 224.

統合尤度算出部２３０は、第１物体尤度と第２物体尤度とを用いて、統合尤度を算出する。 The integrated likelihood calculating unit 230 calculates the integrated likelihood using the first object likelihood and the second object likelihood.

把持対象認識部２４０は、統合尤度算出部２３０により算出された統合尤度を用いて、把持対象５０２のカテゴリを認識する。 The gripping target recognition unit 240 recognizes the category of the gripping target 502 using the integrated likelihood calculated by the integrated likelihood calculation unit 230.

次に、本発明の第２の実施の形態の動作を説明する。 Next, the operation of the second exemplary embodiment of the present invention will be described.

図１３は、本発明の第２の実施の形態における、認識装置２００の動作を示すフローチャートである。 FIG. 13 is a flowchart showing the operation of the recognition apparatus 200 in the second embodiment of the present invention.

はじめに、画像取得部２１０は、把持対象５０２を把持している把持手段５０１の画像を取得する（ステップＳ２０１）。この場合、画像には、把持対象５０２の物体領域の少なくとも一部が含まれると仮定する。 First, the image acquisition unit 210 acquires an image of the gripping means 501 that is gripping the gripping target 502 (step S201). In this case, it is assumed that the image includes at least a part of the object region of the gripping target 502.

把持手段検出部２１１は、画像取得部２１０により取得された画像における、把持手段５０１の各指の位置、または、各指の位置と方向を検出する（ステップＳ２０２）。 The gripping means detection unit 211 detects the position of each finger of the gripping means 501 or the position and direction of each finger in the image acquired by the image acquisition unit 210 (step S202).

把持対象検出部２２１は、画像取得部２１０により取得された画像における、把持対象５０２の物体領域を検出する（ステップＳ２０３）。 The grip target detection unit 221 detects the object area of the grip target 502 in the image acquired by the image acquisition unit 210 (step S203).

把持手段検出部２１１は、画像において検出された指の本数が、２本以上かどうかを判定する（ステップＳ２０４）。 The gripping means detection unit 211 determines whether the number of fingers detected in the image is two or more (step S204).

ステップＳ２０４において、検出された指の本数が２本以上の場合（ステップＳ２０４／Ｙ）、把持特徴生成部２１２は、検出された各指の位置、または、各指の位置と方向をもとに、把持特徴量を生成する（ステップＳ２０６）。 In step S204, when the number of detected fingers is two or more (step S204 / Y), the gripping feature generation unit 212 determines the position of each finger or the position and direction of each finger. Then, a gripping feature amount is generated (step S206).

第１物体尤度算出部２１３は、生成された把持特徴量をもとに、物体のカテゴリ毎に第１物体尤度を算出する（ステップＳ２０７）。 The first object likelihood calculating unit 213 calculates the first object likelihood for each object category based on the generated gripping feature amount (step S207).

図１５は、本発明の第２の実施の形態における、統合尤度の算出結果の例を示す図である。 FIG. 15 is a diagram illustrating an example of the calculation result of the integrated likelihood in the second exemplary embodiment of the present invention.

例えば、第１物体尤度算出部２１３は、図８の把持特徴情報１１５に登録されたインスタンスの内、把持特徴生成部１１２により生成された把持特徴量との距離が閾値以下であるインスタンスを抽出する。そして、第１物体尤度算出部２１３は、抽出されたインスタンスの数をもとに、図１５のように第１物体尤度を算出する。 For example, the first object likelihood calculation unit 213 extracts an instance whose distance from the gripping feature amount generated by the gripping feature generation unit 112 is equal to or less than a threshold from the instances registered in the gripping feature information 115 in FIG. To do. Then, the first object likelihood calculating unit 213 calculates the first object likelihood as shown in FIG. 15 based on the number of extracted instances.

なお、ステップＳ２０４において、検出された指の本数が２本未満の場合（ステップＳ２０４／Ｎ）、全カテゴリに対する第１物体尤度に１が設定される（ステップＳ２０５）。 In step S204, when the number of detected fingers is less than two (step S204 / N), 1 is set as the first object likelihood for all categories (step S205).

次に、物体特徴生成部２２２は、検出された把持対象５０２の物体領域から、物体特徴量を生成する（ステップＳ２０８）。 Next, the object feature generation unit 222 generates an object feature amount from the detected object region of the gripping target 502 (step S208).

第２物体尤度算出部２２３は、生成された物体特徴量をもとに、物体のカテゴリ毎に、第２物体尤度を算出する（ステップＳ２０９）。ここで、第２物体尤度算出部２２３は、例えば、把持特徴量に基づく物体尤度（第１物体尤度）の算出方法と同様の方法で、第２物体尤度を算出する。 The second object likelihood calculating unit 223 calculates the second object likelihood for each category of the object based on the generated object feature amount (step S209). Here, the second object likelihood calculating unit 223 calculates the second object likelihood by, for example, the same method as the object likelihood (first object likelihood) calculation method based on the gripping feature amount.

図１４は、本発明の第２の実施の形態における、物体特徴情報２２５の例を示す図である。図１４の物体特徴情報２２５では、物体のカテゴリ毎に、当該物体の物体特徴量を示すインスタンスが登録されている。ここで、物体特徴量は、例えば、物体の色や模様等の物体特徴を表す。 FIG. 14 is a diagram illustrating an example of the object feature information 225 according to the second embodiment of the present invention. In the object feature information 225 in FIG. 14, an instance indicating the object feature amount of the object is registered for each object category. Here, the object feature amount represents, for example, an object feature such as an object color or a pattern.

例えば、第２物体尤度算出部２２３は、図１４の物体特徴情報２２５に登録されたインスタンスの内、物体特徴生成部２２２により生成された物体特徴量との距離が閾値以下であるインスタンスを抽出する。そして、第２物体尤度算出部２２３は、抽出されたインスタンスの数をもとに、図１５のように第２物体尤度を算出する。 For example, the second object likelihood calculating unit 223 extracts an instance whose distance from the object feature amount generated by the object feature generating unit 222 is equal to or less than a threshold from the instances registered in the object feature information 225 of FIG. To do. Then, the second object likelihood calculating unit 223 calculates the second object likelihood as shown in FIG. 15 based on the number of extracted instances.

次に、統合尤度算出部２３０は、第１物体尤度と第２物体尤度とを用いて、統合尤度を算出する（ステップＳ２１０）。統合尤度算出部２３０は、各カテゴリＣ_ｉの統合尤度Ｌ_ｃｏｍｂ（ｉ）を、例えば、数１１式、または、数１２式により算出する。 Next, the integrated likelihood calculating unit 230 calculates an integrated likelihood using the first object likelihood and the second object likelihood (step S210). The integrated likelihood calculating unit 230 calculates the integrated likelihood L _comb (i) of each category C _i using, for example, Equation 11 or Equation 12.

ここで、Ｌ_Ａ（ｉ）、Ｌ_Ｂ（ｉ）は、それぞれ、カテゴリＣ_ｉの第１物体尤度、第２物体尤度である。 Here, L _A (i) and L _B (i) are the first object likelihood and the second object likelihood of category C _i , respectively.
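Equations 11 and 12 are not reproduced in this text, so the two combination rules below are assumptions chosen for illustration: a per-category product and a weighted sum of L_A(i) and L_B(i). The function names and the `weight` parameter are hypothetical.

```python
def integrate_product(first, second):
    """Assumed rule: L_comb(i) = L_A(i) * L_B(i)."""
    return {cat: first[cat] * second[cat] for cat in first}


def integrate_weighted_sum(first, second, weight=0.5):
    """Assumed rule: L_comb(i) = w * L_A(i) + (1 - w) * L_B(i)."""
    return {cat: weight * first[cat] + (1.0 - weight) * second[cat] for cat in first}


# Invented likelihood values, for illustration only.
first = {"C1": 0.5, "C2": 0.25}
second = {"C1": 0.2, "C2": 0.8}
by_product = integrate_product(first, second)
by_sum = integrate_weighted_sum(first, second, weight=0.5)

# With every first object likelihood set to 1 (as in step S205),
# the product rule reduces to the second object likelihood alone.
fallback = integrate_product({"C1": 1.0, "C2": 1.0}, second)
```

Under the product rule, setting the first object likelihood to 1 for all categories leaves the ranking to the object feature amount, which is consistent with how step S205 handles fewer than two detected fingers.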

例えば、統合尤度算出部２３０は、図１５のように統合尤度を算出する。 For example, the integrated likelihood calculating unit 230 calculates the integrated likelihood as shown in FIG.

次に、把持対象認識部２４０は、算出した統合尤度を用いて、把持対象５０２のカテゴリを認識する（ステップＳ２１１）。ここで、把持対象認識部２４０は、例えば、数１３式に従って、把持対象５０２のカテゴリｉｄｘを特定する。 Next, the gripping target recognition unit 240 recognizes the category of the gripping target 502 using the calculated integrated likelihood (step S211). Here, the gripping target recognition unit 240 identifies the category idx of the gripping target 502 according to, for example, Equation 13.

ここで、Ｌ_{ｔｈ＿ｃｏｍｂ}は、予め設定された、統合尤度Ｌ_ｃｏｍｂ（ｉ）の最大値に対する閾値である。 Here, L _{th_comb} is a threshold for a maximum value of the integrated likelihood L _comb (i) set in advance.

例えば、把持対象認識部２４０は、図１５の統合尤度算出結果をもとに、把持対象５０２のカテゴリを、統合尤度が最大であるカテゴリＣ_３と特定する。 For example, based on the integrated likelihood calculation result of FIG. 15, the gripping target recognition unit 240 identifies the category of the gripping target 502 as category C_3, which has the maximum integrated likelihood.

把持対象認識部１１６は、ステップＳ２１１の結果に応じて、認識結果に、把持対象５０２のカテゴリのインデックス、または、「該当なし」を設定し（ステップＳ２１２）、出力する（ステップＳ２１３）。 The gripping target recognition unit 116 sets the category index of the gripping target 502 or “not applicable” in the recognition result according to the result of step S211 (step S212) and outputs the result (step S213).

以上により、本発明の第２の実施の形態の動作が完了する。 Thus, the operation of the second exemplary embodiment of the present invention is completed.

本発明の第２の実施の形態によれば、本発明の第１の実施の形態に比べて、把持対象５０２の認識精度を向上できる。その理由は、把持対象認識部２４０が、把持特徴量に基づく第１物体尤度と物体特徴量に基づく第２物体尤度を用いて算出された統合尤度をもとに、把持対象５０２を認識するためである。これにより、例えば、把持対象５０２を把持している把持手段５０１の画像において、把持対象５０２のほとんどが遮蔽されているが、一部が存在するような場合に、把持対象５０２の認識精度を向上できる。 According to the second embodiment of the present invention, the recognition accuracy of the gripping target 502 can be improved compared with the first embodiment of the present invention. The reason is that the gripping target recognition unit 240 recognizes the gripping target 502 based on the integrated likelihood calculated using the first object likelihood, which is based on the gripping feature amount, and the second object likelihood, which is based on the object feature amount. As a result, the recognition accuracy of the gripping target 502 can be improved when, for example, most of the gripping target 502 is occluded in the image of the gripping means 501 gripping it but a part of it is still visible.

＜第３の実施の形態＞ <Third Embodiment>
次に、本発明の第３の実施の形態について説明する。 Next, a third embodiment of the present invention will be described.

本発明の第３の実施の形態では、把持手段５０１の複数の所定部位の内、把持対象５０２と接触している部位（把持対象５０２と接触している指）について、把持特徴量を生成する。 In the third embodiment of the present invention, a gripping feature amount is generated for a part in contact with the gripping target 502 (finger in contact with the gripping target 502) among a plurality of predetermined parts of the gripping means 501. .

はじめに、本発明の第３の実施の形態の構成を説明する。 First, the configuration of the third exemplary embodiment of the present invention will be described.

図１６は、本発明の第３の実施の形態における、認識装置２００の構成を示すブロック図である。 FIG. 16 is a block diagram showing the configuration of the recognition apparatus 200 in the third embodiment of the present invention.

図１６を参照すると、本発明の第３の実施の形態の認識装置２００は、本発明の第２の実施の形態の認識装置２００の構成要素に加えて、接触検出部２５０を含む。 Referring to FIG. 16, a recognition device 200 according to the third embodiment of the present invention includes a contact detection unit 250 in addition to the components of the recognition device 200 according to the second embodiment of the present invention.

接触検出部２５０は、把持対象検出部２２１により検出された把持対象５０２の物体領域と、把持手段検出部１１１により検出された各指の位置とをもとに、検出された指の内の把持対象５０２と接触している指（接触指）を特定する。接触検出部２５０は、例えば、注目する指の位置を示す座標値と、その座標値に最も近い物体領域との距離が所定の閾値未満の場合、当該指が接触指であると判定する。 The contact detection unit 250 identifies, among the detected fingers, the fingers that are in contact with the gripping target 502 (contact fingers), based on the object region of the gripping target 502 detected by the gripping target detection unit 221 and the position of each finger detected by the gripping means detection unit 111. For example, when the distance between the coordinate value indicating the position of the finger of interest and the part of the object region closest to that coordinate value is less than a predetermined threshold, the contact detection unit 250 determines that the finger is a contact finger.
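The contact test described above can be sketched as a nearest-pixel distance check. The representation of the object region as a list of pixel coordinates, the Euclidean distance, and the name `find_contact_fingers` are assumptions made for this illustration.

```python
import math


def find_contact_fingers(finger_positions, object_pixels, threshold):
    """Return the fingers whose nearest object-region pixel is closer than threshold.

    finger_positions: dict mapping a finger name to its (x, y) position.
    object_pixels: list of (x, y) coordinates belonging to the object region.
    """
    if not object_pixels:
        return []
    contacts = []
    for name, position in finger_positions.items():
        nearest = min(math.dist(position, pixel) for pixel in object_pixels)
        if nearest < threshold:
            contacts.append(name)
    return contacts


# Invented positions, for illustration only.
fingers = {"thumb": (0.0, 0.0), "index": (10.0, 0.0), "little": (30.0, 30.0)}
region = [(2.0, 0.0), (8.0, 0.0)]
contact = find_contact_fingers(fingers, region, threshold=3.0)
```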

把持特徴生成部２１２は、接触指に係る所定部位間の位置関係を表す把持特徴量を生成する。 The gripping feature generating unit 212 generates a gripping feature amount that represents the positional relationship between the predetermined parts related to the contact finger.

次に、本発明の第３の実施の形態の動作を説明する。 Next, the operation of the third exemplary embodiment of the present invention will be described.

図１７は、本発明の第３の実施の形態における、認識装置２００の動作を示すフローチャートである。 FIG. 17 is a flowchart showing the operation of the recognition apparatus 200 in the third embodiment of the present invention.

はじめに、画像取得部２１０は、把持対象５０２を把持している把持手段５０１の画像を取得する（ステップＳ３０１）。 First, the image acquisition unit 210 acquires an image of the gripping means 501 that is gripping the gripping target 502 (step S301).

把持手段検出部２１１は、画像取得部２１０により取得された画像における、各指の位置、または、各指の位置と方向を検出する（ステップＳ３０２）。 The grip means detection unit 211 detects the position of each finger or the position and direction of each finger in the image acquired by the image acquisition unit 210 (step S302).

把持対象検出部２２１は、画像取得部２１０により取得された画像における、把持対象５０２の物体領域を検出する（ステップＳ３０３）。 The grip target detection unit 221 detects the object region of the grip target 502 in the image acquired by the image acquisition unit 210 (step S303).

把持手段検出部２１１は、画像において検出された指の本数が、２本以上かどうかを判定する（ステップＳ３０４）。 The gripping means detection unit 211 determines whether the number of fingers detected in the image is two or more (step S304).

ステップＳ３０４において、検出された指の本数が２本以上の場合（ステップＳ３０４／Ｙ）、接触検出部２５０は、検出された指の内の接触指を特定する（ステップＳ３０６）。 In step S304, when the number of detected fingers is two or more (step S304 / Y), the contact detection unit 250 identifies a contact finger among the detected fingers (step S306).

接触検出部２５０は、接触指の本数が、２本以上かどうかを判定する（ステップＳ３０７）。 The contact detection unit 250 determines whether the number of contact fingers is two or more (step S307).

ステップＳ３０７において、接触指の本数が２本以上の場合（ステップＳ３０７／Ｙ）、把持特徴生成部２１２は、検出された各接触指の位置、または、各接触指の位置と方向をもとに、接触指間の位置関係を表す把持特徴量を生成する（ステップＳ３０８）。 In step S307, when the number of contact fingers is two or more (step S307 / Y), the gripping feature generation unit 212 determines the position of each contact finger or the position and direction of each contact finger. Then, a gripping feature amount representing the positional relationship between the contact fingers is generated (step S308).

第１物体尤度算出部２１３は、生成された把持特徴量をもとに、物体のカテゴリ毎に第１物体尤度を算出する（ステップ３０９）。 The first object likelihood calculating unit 213 calculates the first object likelihood for each object category based on the generated gripping feature amount (step 309).

なお、ステップＳ３０４において、検出された指の本数が２本未満の場合（ステップＳ３０４／Ｎ）、または、ステップＳ３０７において、接触指の本数が２本未満の場合（ステップＳ３０７／Ｎ）、全カテゴリに対する第１物体尤度が１に設定される。 If the number of fingers detected in step S304 is less than two (step S304 / N), or if the number of contact fingers in step S307 is less than two (step S307 / N), the first object likelihood for all categories is set to 1.

以降、物体特徴量の生成、第２物体尤度の算出、統合尤度の算出、及び、把持対象５０２のカテゴリの認識（ステップＳ３１０〜Ｓ３１５）が、本発明の第２の実施の形態（ステップＳ２０８〜Ｓ２１３）と同様に行われる。 Thereafter, the generation of the object feature amount, the calculation of the second object likelihood, the calculation of the integrated likelihood, and the recognition of the category of the gripping target 502 (steps S310 to S315) are performed in the same manner as in the second embodiment of the present invention (steps S208 to S213).

以上により、本発明の第３の実施の形態の動作が完了する。 Thus, the operation of the third embodiment of the present invention is completed.

本発明の第３の実施の形態によれば、本発明の第１の実施の形態に比べて、把持対象５０２の認識精度を向上できる。その理由は、把持特徴生成部２１２が、把持手段５０１の複数の所定部位の内、把持対象５０２と接触している部位（把持対象５０２と接触している指）間の位置関係を表す把持特徴量を生成するためである。これにより、把持特徴量から、把持対象５０２と接触していない部位（接触していない指）に係る情報を除外することができ、把持手段５０１による把持に寄与していない部位の位置の影響を受けずに、把持対象５０２のカテゴリを特定できる。 According to the third embodiment of the present invention, the recognition accuracy of the gripping target 502 can be improved compared with the first embodiment of the present invention. The reason is that the gripping feature generation unit 212 generates a gripping feature amount representing the positional relationship between those of the plurality of predetermined parts of the gripping means 501 that are in contact with the gripping target 502 (the fingers in contact with the gripping target 502). As a result, information on parts that are not in contact with the gripping target 502 (fingers that are not in contact) can be excluded from the gripping feature amount, and the category of the gripping target 502 can be identified without being affected by the positions of parts that do not contribute to the gripping by the gripping means 501.

＜第４の実施の形態＞ <Fourth Embodiment>
次に、本発明の第４の実施の形態について説明する。 Next, a fourth embodiment of the present invention will be described.

本発明の第４の実施の形態では、把持手段５０１の複数の所定部位の間の位置関係の時間的な変化を示す把持特徴を生成する。 In the fourth embodiment of the present invention, a gripping feature indicating a temporal change in the positional relationship between a plurality of predetermined parts of the gripping means 501 is generated.

はじめに、本発明の第４の実施の形態の構成を説明する。 First, the configuration of the fourth embodiment of the present invention will be described.

図１８は、本発明の第４の実施の形態における、認識装置２００の構成を示すブロック図である。 FIG. 18 is a block diagram showing the configuration of the recognition device 200 in the fourth embodiment of the present invention.

図１８を参照すると、本発明の第４の実施の形態の構成は、本発明の第２の実施の形態において、把持特徴生成部２１２が把持特徴生成部２６０に置き換えられている。 Referring to FIG. 18, in the configuration of the fourth embodiment of the present invention, the gripping feature generation unit 212 is replaced with a gripping feature generation unit 260 in the second embodiment of the present invention.

把持特徴生成部２６０は、把持手段５０１の複数の所定部位の間の位置関係の時間的な変化を示す、動き特徴を含む把持特徴量を生成する。把持特徴生成部２６０は、フレーム特徴生成部２６１、フレーム特徴記憶部２６２、及び、動き特徴抽出部２６３を含む。 The gripping feature generation unit 260 generates a gripping feature amount including a motion feature that indicates a temporal change in the positional relationship between a plurality of predetermined parts of the gripping unit 501. The gripping feature generation unit 260 includes a frame feature generation unit 261, a frame feature storage unit 262, and a motion feature extraction unit 263.

フレーム特徴生成部２６１は、把持特徴生成部２１２と同様の方法により、画像のフレーム毎の把持特徴量を生成する。 The frame feature generation unit 261 generates a grip feature amount for each frame of the image by the same method as the grip feature generation unit 212.

フレーム特徴記憶部２６２は、フレーム特徴生成部２６１により生成された、フレーム毎の把持特徴量を、所定のフレーム数分記憶する。 The frame feature storage unit 262 stores the gripping feature amount for each frame generated by the frame feature generation unit 261 for a predetermined number of frames.

動き特徴抽出部２６３は、フレーム毎の把持特徴量の差分をもとに、動き特徴を抽出し、動き特徴を含む把持特徴量を生成する。 The motion feature extraction unit 263 extracts a motion feature based on the difference between grip feature amounts for each frame, and generates a grip feature amount including the motion feature.

次に、本発明の第４の実施の形態の動作を説明する。 Next, the operation of the fourth exemplary embodiment of the present invention will be described.

図１９は、本発明の第４の実施の形態における、認識装置２００の動作を示すフローチャートである。 FIG. 19 is a flowchart showing the operation of the recognition apparatus 200 in the fourth embodiment of the present invention.

はじめに、把持特徴生成部２６０のフレーム特徴生成部２６１は、フレームを示す変数ｔ（ｔ＝１,…，Ｎｆ。Ｎｆは、動き特徴を含む把持特徴量を生成するためのフレーム数）に１を設定する（ステップＳ４０１）。 First, the frame feature generation unit 261 of the gripping feature generation unit 260 sets a variable t indicating the frame to 1 (step S401). Here, t = 1, ..., Nf, where Nf is the number of frames used to generate a gripping feature amount including a motion feature.

画像取得部２１０は、把持対象５０２を把持している把持手段５０１の画像を１フレーム取得する（ステップＳ４０２）。 The image acquisition unit 210 acquires one frame of the image of the gripping means 501 that is gripping the gripping target 502 (step S402).

把持手段検出部２１１は、画像取得部２１０により取得されたフレーム（対象フレーム）における、各指の位置、または、各指の位置と方向を検出する（ステップＳ４０３）。 The grip means detection unit 211 detects the position of each finger or the position and direction of each finger in the frame (target frame) acquired by the image acquisition unit 210 (step S403).

把持対象検出部２２１は、画像取得部２１０により取得された対象フレームにおける、把持対象５０２の物体領域を検出する（ステップＳ４０４）。 The grip target detection unit 221 detects the object region of the grip target 502 in the target frame acquired by the image acquisition unit 210 (step S404).

把持手段検出部２１１は、対象フレームにおいて検出された指の本数が、２本以上かどうかを判定する（ステップＳ４０５）。 The grip means detection unit 211 determines whether the number of fingers detected in the target frame is two or more (step S405).

ステップＳ４０５において、検出された指の本数が２本以上の場合（ステップＳ４０５／Ｙ）、フレーム特徴生成部２６１は、検出された各指の位置、または、各指の位置と方向をもとに、対象フレームｔでの把持特徴量Ｖ（ｔ）を生成する（ステップＳ４０７）。ここで、フレーム特徴生成部２６１は、把持特徴量Ｖ（ｔ）として、例えば、本発明の第１の実施の形態の把持特徴量生成方法で示した、把持特徴量Ｖ_Ａ、Ｖ_Ｂ、Ｖ_Ｃ、Ｖ_Ｄの内のいずれかを生成する。 In step S405, when the number of detected fingers is two or more (step S405 / Y), the frame feature generation unit 261 generates a gripping feature amount V(t) for the target frame t based on the detected position of each finger, or the position and direction of each finger (step S407). Here, the frame feature generation unit 261 generates, as the gripping feature amount V(t), for example, one of the gripping feature amounts V_A, V_B, V_C, and V_D described in the gripping feature amount generation method of the first embodiment of the present invention.

フレーム特徴生成部２６１は、変数ｔがＮｆ以上かどうかを判定する（ステップＳ４０８）。 The frame feature generation unit 261 determines whether the variable t is greater than or equal to Nf (step S408).

ステップＳ４０８で、変数ｔがＮｆ未満の場合（ステップＳ４０８／Ｎ）、フレーム特徴生成部２６１は、生成した把持特徴量Ｖ（ｔ）をフレーム特徴記憶部２６２に保存し、変数ｔに１を加算する（ステップＳ４０９）。そして、ステップＳ４０２からの処理が繰り返される。 If the variable t is less than Nf in step S408 (step S408 / N), the frame feature generation unit 261 stores the generated gripping feature amount V (t) in the frame feature storage unit 262, and adds 1 to the variable t. (Step S409). Then, the processing from step S402 is repeated.

一方、ステップＳ４０８で、変数ｔがＮｆ以上の場合（ステップＳ４０８／Ｙ）、動き特徴抽出部２６３は、フレーム特徴記憶部２６２に記憶されている、フレーム毎の把持特徴量の差分を算出する（ステップＳ４１０）。 On the other hand, if the variable t is greater than or equal to Nf in step S408 (step S408 / Y), the motion feature extraction unit 263 calculates the difference between the gripping feature amounts for each frame stored in the frame feature storage unit 262 ( Step S410).

動き特徴抽出部２６３は、算出した差分をもとに、動き特徴を含む把持特徴量を生成する（ステップＳ４１１）。 The motion feature extraction unit 263 generates a gripping feature amount including a motion feature based on the calculated difference (step S411).

ここで、動き特徴抽出部２６３は、例えば、数１４式により、動き特徴を含む把持特徴量Ｖ_movを生成する。 Here, the motion feature extraction unit 263 generates the gripping feature amount V _mov including the motion feature using, for example, Equation 14.

FIG. 20 is a diagram illustrating an example of calculating the gripping feature amount including the motion feature in the fourth embodiment of the present invention. FIG. 20 shows the case where Nf is 3. For example, based on the gripping feature amounts V(t) at t = 1, 2, and 3, the motion feature extraction unit 263 generates the gripping feature amount V_mov including the motion feature as shown in FIG. 20.
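The difference calculation of steps S410 and S411 can be sketched as follows. This is only a minimal illustration: Equation 14 and FIG. 20 are not reproduced here, so the exact layout of V_mov is an assumption. The sketch simply concatenates the differences between consecutive per-frame feature vectors, and the function name `motion_gripping_feature` is hypothetical.

```python
from typing import List, Sequence


def motion_gripping_feature(frames: Sequence[Sequence[float]]) -> List[float]:
    """Build a motion feature from per-frame gripping features V(1)..V(Nf).

    Assumption: the motion feature is the concatenation of the differences
    V(t+1) - V(t) between consecutive frames. The actual Equation 14 may
    arrange or augment these values differently.
    """
    if len(frames) < 2:
        raise ValueError("at least two frames are needed to take a difference")
    dim = len(frames[0])
    if any(len(v) != dim for v in frames):
        raise ValueError("all per-frame features must have the same length")

    v_mov: List[float] = []
    # Pair each frame with its successor and append the element-wise difference.
    for prev, cur in zip(frames, frames[1:]):
        v_mov.extend(c - p for p, c in zip(prev, cur))
    return v_mov


# Example with Nf = 3 and two-dimensional per-frame features.
example = motion_gripping_feature([[10, 20], [12, 21], [15, 19]])
```

With Nf frames of dimension d, this layout yields a vector of length (Nf - 1) × d.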

Next, the first object likelihood calculation unit 213 calculates a first object likelihood for each object category based on the generated gripping feature amount including the motion feature (step S412).

Note that, in step S405, when fewer than two fingers have been detected (step S405/N), the first object likelihood is set to 1 for all categories (step S406).
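The branch between steps S406 and S412 can be sketched as below. The scoring function `score` is a hypothetical placeholder for whatever per-category likelihood model is used (it is described with the first embodiment, not here), and the function and parameter names are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Sequence


def first_object_likelihoods(
    num_fingers: int,
    v_mov: Optional[Sequence[float]],
    categories: Sequence[str],
    score: Callable[[Sequence[float], str], float],
) -> Dict[str, float]:
    """Return a first object likelihood for each category.

    With fewer than two detected fingers no gripping feature can be formed,
    so every category receives the likelihood 1 (step S406). Otherwise the
    hypothetical `score` callable is evaluated per category (step S412).
    """
    if num_fingers < 2 or v_mov is None:
        return {category: 1.0 for category in categories}
    return {category: score(v_mov, category) for category in categories}
```

Assigning the same value to every category means the gripping feature expresses no preference among categories in that case.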

Next, the object feature generation unit 222 generates an object feature amount from the detected object region of the gripping target 502 (step S413). Here, the object feature generation unit 222 generates the object feature amount from the object region of the gripping target 502 detected in, for example, the first frame or the Nf-th frame among the Nf frames. Alternatively, the object feature generation unit 222 may calculate the average of the object feature amounts generated from the object regions of the gripping target 502 detected in each of the Nf frames.
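The averaging option in step S413 can be sketched as follows, assuming each frame's object feature amount is a fixed-length numeric vector. The function name `average_object_feature` is hypothetical.

```python
from typing import List, Sequence


def average_object_feature(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the object feature amounts from Nf frames."""
    if not per_frame:
        raise ValueError("at least one frame is required")
    dim = len(per_frame[0])
    if any(len(f) != dim for f in per_frame):
        raise ValueError("all object features must have the same length")
    n = len(per_frame)
    # Average each feature dimension across the frames.
    return [sum(f[i] for f in per_frame) / n for i in range(dim)]
```

Using only the first or the Nf-th frame instead corresponds to taking `per_frame[0]` or `per_frame[-1]` directly.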

Thereafter, the calculation of the second object likelihood, the calculation of the integrated likelihood, and the recognition of the category of the gripping target 502 (steps S414 to S418) are performed in the same manner as in the second embodiment of the present invention (steps S209 to S213).

This completes the operation of the fourth embodiment of the present invention.

According to the fourth embodiment of the present invention, the recognition accuracy of the gripping target 502 can be improved when the grip by the gripping means 501 changes over time. This is because the gripping feature generation unit 260 generates a gripping feature amount including a motion feature, which indicates the temporal change in the positional relationship between the plurality of predetermined parts of the gripping means 501. As a result, a gripping target 502 whose shape changes over time while it is gripped by the gripping means 501, such as a soft object, can be distinguished from a gripping target 502 whose shape does not change over time, such as a hard object. The recognition accuracy of the gripping target 502 can also be improved when a predetermined part (a finger) of the gripping means 501 moves over the gripping target 502, as in the operation of a smartphone.

Although the present invention has been described above with reference to the embodiments, the present invention is not limited to those embodiments. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within its scope.

For example, the generation of the gripping feature amount for those parts, among the plurality of predetermined parts of the gripping means 501, that are in contact with the gripping target 502, as described in the third embodiment of the present invention, may also be applied to the first embodiment of the present invention, provided that contact with the gripping target 502 can be detected.

Likewise, the generation of the gripping feature amount including the motion feature, as described in the fourth embodiment of the present invention, may be applied to the first or second embodiment of the present invention.

Description of Symbols
100 Recognition device
101 CPU
102 Storage device
103 Communication device
104 Input device
105 Output device
110 Image acquisition unit
111 Gripping means detection unit
112 Gripping feature generation unit
113 Object likelihood calculation unit
114 Gripping feature storage unit
115 Gripping feature information
116 Gripping target recognition unit
200 Recognition device
210 Image acquisition unit
211 Gripping means detection unit
212 Gripping feature generation unit
213 First object likelihood calculation unit
214 Gripping feature storage unit
215 Gripping feature information
221 Gripping target detection unit
222 Object feature generation unit
223 Second object likelihood calculation unit
224 Object feature storage unit
225 Object feature information
230 Integrated likelihood calculation unit
240 Gripping target recognition unit
250 Contact detection unit
260 Gripping feature generation unit
261 Frame feature generation unit
262 Frame feature storage unit
263 Motion feature extraction unit
501 Gripping means
502 Gripping target

Claims

1. An information processing apparatus comprising:
image acquisition means for acquiring an image of gripping means gripping a gripping target;
gripping feature generation means for generating a gripping feature indicating a positional relationship between a plurality of predetermined parts of the gripping means in the image; and
gripping target recognition means for recognizing the gripping target based on the gripping feature.

2. The information processing apparatus according to claim 1, further comprising object feature generation means for generating an object feature indicating at least one of a color and a pattern of the gripping target in the image,
wherein the gripping target recognition means recognizes the gripping target based on the gripping feature and the object feature.

3. The information processing apparatus according to claim 1 or 2,
wherein the gripping feature generation means generates the gripping feature indicating a positional relationship between those parts, among the plurality of predetermined parts, that are in contact with the gripping target.

4. The information processing apparatus according to claim 1,
wherein the gripping feature generation means generates a gripping feature indicating a temporal change in the positional relationship between the plurality of predetermined parts.

5. The information processing apparatus according to claim 1,
wherein the gripping feature indicates the positional relationship between the plurality of predetermined parts by coordinate values indicating the position of each of the plurality of predetermined parts.

6. The information processing apparatus according to claim 5,
wherein the gripping feature indicates the positional relationship between the plurality of predetermined parts by coordinate values indicating the position of each of the plurality of predetermined parts and by the direction of each predetermined part.

7. The information processing apparatus according to claim 1,
wherein the gripping feature indicates the positional relationship between the plurality of predetermined parts by distances between the plurality of predetermined parts.

8. The information processing apparatus according to claim 1,
wherein the predetermined parts are fingers of the gripping means.

9. An information processing method comprising:
acquiring an image of gripping means gripping a gripping target;
generating a gripping feature indicating a positional relationship between a plurality of predetermined parts of the gripping means in the image; and
recognizing the gripping target based on the gripping feature.

10. A program causing a computer to execute processing comprising:
acquiring an image of gripping means gripping a gripping target;
generating a gripping feature indicating a positional relationship between a plurality of predetermined parts of the gripping means in the image; and
recognizing the gripping target based on the gripping feature.