JP6739896B2

JP6739896B2 - Information processing device, information processing method, and program

Info

Publication number: JP6739896B2
Application number: JP2014254080A
Authority: JP
Inventors: 壮馬白石; 哲夫井下
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-12-16
Filing date: 2014-12-16
Publication date: 2020-08-12
Anticipated expiration: 2034-12-16
Also published as: JP2016115179A

Description

The present invention relates to an information processing device, an information processing method, and a program, and in particular to an information processing device, an information processing method, and a program for recognizing an object held by a gripping means.

A technique is known for recognizing a grip target from an image that captures a person or the like (hereinafter also referred to as a gripper) holding an object (hereinafter also referred to as a grip target) with the fingers of a hand or the like (hereinafter also referred to as a gripping means).

For example, in the technique disclosed in Patent Document 1, when a partial region of an object in an image is matched against article shapes in a database, the articles to be matched are limited according to the size of the space calculated from the gripping posture of the hand.

In the technique disclosed in Non-Patent Document 1, features of paired hand-region and object-region shapes in an image are stored for each gripping pattern, and the gripping pattern is recognized using those features.

As related techniques, Patent Document 2 discloses a technique for identifying the virtual key being operated by using finger positions detected from an image. Patent Document 3 discloses a technique for recognizing an object contained in an image by using a color histogram. Patent Document 4 discloses a technique for identifying an object based on features of changes in the area and perimeter of the object in a moving image. Non-Patent Document 2 discloses a technique for detecting finger regions by searching for designated colors in an image captured while the subject wears a glove with a designated color on each finger.

Patent Document 1: JP 2010-244413 A
Patent Document 2: JP 2013-143082 A
Patent Document 3: JP 2012-150552 A
Patent Document 4: JP 2010-244440 A

Non-Patent Document 1: Hiromasa Kasahara and 3 others, "Reconstruction of missing pixels and object recognition based on learning of grip pattern images", Meeting on Image Recognition and Understanding (MIRU2008), July 2008, p. 623-628
Non-Patent Document 2: Ken Watanabe and 3 others, "Recognition of finger characters using color gloves", IEICE Transactions D-II, 1997, vol. J80-D-2, no. 10, p. 2713-2722

As described above, when a grip target is recognized from an image that captures the gripping means holding the grip target, the grip target may be covered by, for example, the fingers or the palm and may not appear in the image at all. However, the techniques described in Patent Document 1 and the Non-Patent Document recognize the grip target from a partial image of it, so they cannot recognize the grip target when it does not appear in the image.

An object of the present invention is to solve the above problem and to provide an information processing device, an information processing method, and a program that can recognize a grip target even when the grip target does not appear in the image of the gripping means holding it.

An information processing device of the present invention includes: an image acquisition means that acquires an image of a gripping means gripping a grip target; a gripping feature generation means that generates a gripping feature indicating the positional relationship among a plurality of predetermined parts of the gripping means in the image; and a grip target recognition means that recognizes the grip target based on the gripping feature.

An information processing method of the present invention acquires an image of a gripping means gripping a grip target, generates a gripping feature indicating the positional relationship among a plurality of predetermined parts of the gripping means in the image, and recognizes the grip target based on the gripping feature.

A program of the present invention causes a computer to execute processing that acquires an image of a gripping means gripping a grip target, generates a gripping feature indicating the positional relationship among a plurality of predetermined parts of the gripping means in the image, and recognizes the grip target based on the gripping feature.

The effect of the present invention is that the grip target can be recognized even when it does not appear in the image of the gripping means holding it.

A block diagram showing the basic configuration of an embodiment of the present invention.
A block diagram showing the configuration of the recognition device 100 in the first embodiment of the present invention.
A block diagram showing the configuration of the recognition device 100 implemented by a computer in the first embodiment of the present invention.
A flowchart showing the operation of the recognition device 100 in the first embodiment of the present invention.
A diagram showing an example of an acquired image in the first embodiment of the present invention.
A diagram showing an example of a method of generating a gripping feature quantity in the first embodiment of the present invention.
A diagram showing another example of a method of generating a gripping feature quantity in the first embodiment of the present invention.
A diagram showing an example of the gripping feature information 115 in the first embodiment of the present invention.
A diagram showing an example of instance extraction in the first embodiment of the present invention.
A diagram showing an example of object likelihood calculation results in the first embodiment of the present invention.
A diagram showing another example of the gripping feature information 115 in the first embodiment of the present invention.
A block diagram showing the configuration of the recognition device 200 in the second embodiment of the present invention.
A flowchart showing the operation of the recognition device 200 in the second embodiment of the present invention.
A diagram showing an example of the object feature information 225 in the second embodiment of the present invention.
A diagram showing an example of integrated likelihood calculation results in the second embodiment of the present invention.
A block diagram showing the configuration of the recognition device 200 in the third embodiment of the present invention.
A flowchart showing the operation of the recognition device 200 in the third embodiment of the present invention.
A block diagram showing the configuration of the recognition device 200 in the fourth embodiment of the present invention.
A flowchart showing the operation of the recognition device 200 in the fourth embodiment of the present invention.
A diagram showing an example of calculating a gripping feature quantity that includes motion features in the fourth embodiment of the present invention.

<First Embodiment>
First, the first embodiment of the present invention will be described.

In the first embodiment of the present invention, the grip target 502 is recognized based on the positional relationship among a plurality of predetermined parts of the gripping means 501. The embodiments of the present invention are described using, as an example, the case where the gripping means 501 is a human hand.

First, the configuration of the first embodiment of the present invention will be described.

FIG. 2 is a block diagram showing the configuration of the recognition device 100 in the first embodiment of the present invention. The recognition device 100 is one embodiment of the information processing device of the present invention.

Referring to FIG. 2, the recognition device 100 of the first embodiment of the present invention includes an image acquisition unit 110, a gripping means detection unit 111, a gripping feature generation unit 112, an object likelihood calculation unit 113, a gripping feature storage unit 114, and a grip target recognition unit 116.

The image acquisition unit 110 acquires an image of the gripping means 501 gripping the grip target 502. The image acquisition unit 110 may be an RGB camera capable of acquiring three-color information of red, blue, and green. The image acquisition unit 110 may also be a camera capable of acquiring signal information at other wavelengths, such as a far-infrared camera or a multispectral camera. The image acquisition unit 110 may also be a range camera (range sensor) that stores, in each pixel of the image, the distance from the camera to the object. Furthermore, the image acquisition unit 110 may be a camera capable of acquiring one, or several at the same time, of the three-color information, the other wavelength signal information, and the distance information described above.

The gripping means detection unit 111 detects the position, or the position and direction, of each of a plurality of predetermined parts of the gripping means 501 in the image acquired by the image acquisition unit 110. In the embodiments of the present invention, the fingers of the gripping means 501 are used as the predetermined parts. As the position of a predetermined part (the position of a finger), a designated position on the finger, such as the fingertip or a joint, is used.

The gripping feature generation unit 112 generates, as a gripping feature quantity representing the gripping feature of the gripping means 501, a gripping feature quantity indicating the positional relationship among the plurality of predetermined parts (the positional relationship among the fingers). In the embodiments of the present invention, the positional relationship among the positions of the predetermined parts is expressed using, for example, the coordinate values of each finger position, the coordinate values and direction of each finger position, or the distances between finger positions.

The gripping feature storage unit 114 stores gripping feature information 115. The gripping feature information 115 is information used to calculate, for each category of object to be recognized, an "object likelihood based on the gripping feature quantity". As described later, the content of the gripping feature information 115 depends on the method used to calculate the object likelihood.

The object likelihood calculation unit 113 calculates, for each object category, the object likelihood based on the gripping feature quantity, using the gripping feature quantity generated by the gripping feature generation unit 112 and the gripping feature information 115 stored in the gripping feature storage unit 114.

The grip target recognition unit 116 recognizes the category of the grip target 502 using the object likelihoods calculated by the object likelihood calculation unit 113.

The recognition device 100 may be a computer that includes a CPU (Central Processing Unit) and a storage medium storing a program, and that operates under control based on the program.

FIG. 3 is a block diagram showing the configuration of the recognition device 100 implemented by a computer in the first embodiment of the present invention.

The recognition device 100 includes a CPU 101, a storage device (storage medium) 102 such as a hard disk or memory, a communication device 103 that communicates with other devices and the like, an input device 104 such as a mouse or keyboard, and an output device 105 such as a display.

The CPU 101 executes a computer program for implementing the functions of the image acquisition unit 110, the gripping means detection unit 111, the gripping feature generation unit 112, the object likelihood calculation unit 113, and the grip target recognition unit 116. The storage device 102 stores the data of the gripping feature storage unit 114. The input device 104 acquires, from a user or the like, an image of the gripping means 501 gripping the grip target 502. The output device 105 outputs the recognition result (the object category of the grip target 502) to the user or the like. Alternatively, the communication device 103 may acquire the image from another device or the like and output the recognition result to another device or the like.

Next, the operation of the first embodiment of the present invention will be described.

FIG. 4 is a flowchart showing the operation of the recognition device 100 in the first embodiment of the present invention.

First, the image acquisition unit 110 acquires, from a user or the like, an image of the gripping means 501 gripping the grip target 502 (step S101).

FIG. 5 is a diagram showing an example of an acquired image in the first embodiment of the present invention.
For example, the image acquisition unit 110 acquires an image such as the one shown in FIG. 5.

The gripping means detection unit 111 detects the position of each finger of the gripping means 501, or the position and direction of each finger, in the image acquired by the image acquisition unit 110 (step S102). Here, the gripping means detection unit 111 detects the position and direction of a finger based on, for example, the color, shape, and arrangement of the fingers. The gripping means detection unit 111 may also detect finger positions using the technique described in Non-Patent Document 2. Furthermore, the gripping means detection unit 111 may detect finger positions only for the portions of the fingers that appear in the image, or may also estimate positions for portions that do not appear in the image, for example by shape estimation based on the detection results for the visible portions. A finger position may be specified by two-dimensional coordinates in the image or by three-dimensional coordinates in real space. The coordinate values may be absolute coordinates with a specific point as the origin, or coordinates relative to an arbitrary point. The gripping means detection unit 111 may also detect a plurality of positions for each finger.

The gripping means detection unit 111 determines whether the number of fingers detected in the image is two or more (step S103).
If the number of detected fingers is two or more in step S103 (step S103/Y), the gripping feature generation unit 112 generates a gripping feature quantity based on the detected position of each finger, or the detected position and direction of each finger (step S105). Here, the gripping feature generation unit 112 generates the gripping feature quantity according to, for example, one of the following gripping feature quantity generation methods 1 to 3.

(Gripping feature quantity generation method 1)
FIG. 6 is a diagram showing an example of a method of generating a gripping feature quantity in the first embodiment of the present invention. Assume that the gripping means detection unit 111 has detected the positions of n fingers. In this case, line segments connecting the positions of the n fingers to one another are obtained, as shown in FIG. 6. The number of line segments N_l is calculated by Equation 1.

The length l_i (i = 1, …, N_l) of each of the N_l line segments is calculated by Equation 2, where the position of the j-th detected finger (j = 1, …, n) is P_j = (x_j, y_j, z_j).

Arranging the line segment lengths l_i in descending order defines a gripping feature quantity V_A in vector form, as in Equation 3.

Furthermore, letting mx be the maximum value among the elements of the gripping feature quantity V_A, a gripping feature quantity V'_A can be defined as in Equation 4.

The gripping feature generation unit 112 generates a gripping feature quantity such as that of Equation 3 or Equation 4 based on the lengths of the line segments in FIG. 6.

For example, the object likelihood calculation unit 113 generates the gripping feature quantity V_A = (1.5, 1.0, 0.3, …) from the image in FIG. 5.
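As a non-limiting illustration, gripping feature quantity generation method 1 might be sketched in Python as follows. Equations 1 to 4 are not reproduced in this text, so the sketch assumes that Equation 1 is the pair count N_l = n(n−1)/2, that Equation 2 is the Euclidean distance between two finger positions, and that Equation 4 divides each element of V_A by mx. These forms, and the function name grip_feature_a, are assumptions made for illustration only.

```python
import itertools
import math


def grip_feature_a(points, normalize=False):
    """Return pairwise finger distances sorted in descending order.

    points: list of (x, y, z) finger positions (two or more).
    normalize: if True, divide every element by the largest one
               (an assumed reading of Equation 4).
    """
    if len(points) < 2:
        raise ValueError("at least two finger positions are required")
    # One segment per unordered pair of fingers: n * (n - 1) / 2 in total.
    lengths = sorted(
        (math.dist(p, q) for p, q in itertools.combinations(points, 2)),
        reverse=True,
    )
    if normalize:
        mx = lengths[0]  # largest element, because the list is sorted
        if mx == 0.0:
            raise ValueError("all finger positions coincide")
        lengths = [length / mx for length in lengths]
    return lengths


if __name__ == "__main__":
    fingers = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 0.0, 0.0)]
    print(grip_feature_a(fingers))
    print(grip_feature_a(fingers, normalize=True))
```

Because the lengths are sorted, the resulting vector does not depend on the order in which the fingers were detected, and the normalized variant is additionally insensitive to the overall scale of the hand in the image.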

(Gripping feature quantity generation method 2)
FIG. 7 is a diagram showing another example of a method of generating a gripping feature quantity in the first embodiment of the present invention. Assume that the gripping means detection unit 111 has detected the positions of n fingers. In this case, as shown in FIG. 7, n−1 line segments connecting the position of each detected finger to the positions of the other fingers are obtained. Let the set of lengths of these line segments be Gr_i = (l_{i,1}, l_{i,2}, …, l_{i,n−1}). Let Gr'_i = (l'_{i,1}, l'_{i,2}, …, l'_{i,n−1}) be Gr_i with its elements rearranged in descending order of size (length). Further, let (Gr''_0, Gr''_1, …, Gr''_n) be the Gr'_i rearranged in descending order of their first elements l'_{i,1}, and write the elements of Gr''_i as Gr''_i = (l''_{i,1}, l''_{i,2}, …, l''_{i,n−1}). Arranging the elements of each Gr''_i in sequence defines a gripping feature quantity V_B, as in Equation 5.

Furthermore, using the maximum value mx among the elements of the gripping feature quantity V_B, a gripping feature quantity V'_B can be defined as in Equation 6.

The gripping feature generation unit 112 generates a gripping feature quantity such as that of Equation 5 or Equation 6 based on the lengths of the line segments in FIG. 7.
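As a non-limiting illustration, gripping feature quantity generation method 2 might be sketched in Python as follows. Equations 5 and 6 are not reproduced in this text, so the sketch assumes that V_B is the concatenation of the ordered groups and that Equation 6 divides each element by mx. The text orders the groups by their first element only; because the longest segment is shared by two fingers, ties occur, and breaking them by the remaining elements is an additional assumption. The function name grip_feature_b is hypothetical.

```python
import math


def grip_feature_b(points, normalize=False):
    """Return per-finger distance groups, ordered and concatenated.

    points: list of (x, y, z) finger positions (two or more).
    normalize: if True, divide every element by the largest one
               (an assumed reading of Equation 6).
    """
    if len(points) < 2:
        raise ValueError("at least two finger positions are required")
    # Gr'_i: distances from finger i to the other n - 1 fingers, longest first.
    groups = [
        sorted(
            (math.dist(p, q) for j, q in enumerate(points) if j != i),
            reverse=True,
        )
        for i, p in enumerate(points)
    ]
    # Gr''_i: groups ordered by their first element, longest first.
    # Lists compare lexicographically, so ties on the first element are
    # broken by the following elements (an assumption, see lead-in).
    groups.sort(reverse=True)
    feature = [length for group in groups for length in group]
    if normalize:
        mx = max(feature)
        if mx == 0.0:
            raise ValueError("all finger positions coincide")
        feature = [length / mx for length in feature]
    return feature


if __name__ == "__main__":
    fingers = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 0.0, 0.0)]
    print(grip_feature_b(fingers))
```

The resulting vector has n(n−1) elements, since each of the n fingers contributes n−1 lengths.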

(Gripping feature quantity generation method 3)
Assume that, in addition to the position and direction of each finger, the gripping means detection unit 111 has been able to identify which of the thumb, index finger, middle finger, ring finger, and little finger each finger is. In this case, for example, the coordinate values P_j = (x_j, y_j, z_j) (j = 1, …, n) and the direction D_j = (a_j, b_j, c_j) of each finger are obtained in the order of thumb, index finger, middle finger, ring finger, and little finger. Here, the direction of each finger may be, for example, the direction of the normal at the fingertip position obtained by a range sensor, or the direction from the first joint toward the fingertip. A different finger order (thumb, index finger, …) may also be used.

Expressing these coordinate values and directions as coordinates and directions in a predetermined coordinate system Z defines a gripping feature quantity V_C as in Equation 7, or a gripping feature quantity V_D as in Equation 8.

As the coordinate system Z, for example, a coordinate system having a coordinate axis parallel to the direction of one predetermined finger may be used.

The gripping feature generation unit 112 generates the gripping feature quantity of Equation 7 or Equation 8 based on the position and direction of each finger.

Next, the object likelihood calculation unit 113 calculates, for each object category, the object likelihood based on the gripping feature quantity (step S106). Here, the object likelihood calculation unit 113 calculates the object likelihood according to, for example, one of the following object likelihood calculation methods 1 to 3.

(Object likelihood calculation method 1)
FIG. 8 is a diagram showing an example of the gripping feature information 115 in the first embodiment of the present invention. In the gripping feature information 115 of FIG. 8, instances indicating the gripping feature quantity observed when an object is gripped are registered for each object category. Here, a pair consisting of a category and a gripping feature quantity registered for that category is called an instance. A plurality of instances may be registered for one category. Let the object categories be C_i (i = 1, …, M, where M is the number of categories) and the gripping feature quantities corresponding to category C_i be V_ij (j = 1, …, R(i), where R(i) is the number of gripping feature quantities for category C_i). An instance is then expressed as (C_i, V_ij).

The object likelihood calculation unit 113 calculates the distance between the gripping feature quantity generated by the gripping feature generation unit 112 and the gripping feature quantity V_ij of each instance registered in the gripping feature information 115. The distance may be the Euclidean distance, the Manhattan distance, or any other distance measure. From the instances registered in the gripping feature information 115, the object likelihood calculation unit 113 extracts those whose calculated distance is equal to or less than a predetermined threshold. The object likelihood calculation unit 113 then uses the extracted instances to calculate the object likelihood L_i of each category C_i by, for example, Equation 9.

Here, k is the number of extracted instances, and Nc_i is the number of instances (C_i, V_ij) among the k instances that correspond to category C_i.

FIG. 9 is a diagram showing an example of instance extraction in the first embodiment of the present invention. For example, as shown in FIG. 9, the object likelihood calculation unit 113 extracts, from the instances registered in the gripping feature information 115 of FIG. 8, the ten instances whose distance from the gripping feature quantity V = (1.5, 1.0, 0.3, …) generated by the gripping feature generation unit 112 is equal to or less than the threshold.

FIG. 10 is a diagram showing an example of object likelihood calculation results in the first embodiment of the present invention. For example, the object likelihood calculation unit 113 calculates the object likelihoods as shown in FIG. 10 based on the number of extracted instances.

Instead of extracting the instances whose distance is equal to or less than the predetermined threshold, the object likelihood calculation unit 113 may extract a predetermined number of instances in ascending order of distance.
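As a non-limiting illustration, object likelihood calculation method 1 might be sketched in Python as follows. Equation 9 is not reproduced in this text; given the definitions of k and Nc_i, the sketch assumes L_i = Nc_i / k. The function name object_likelihoods, the parameter names, and the choice to return zero likelihoods when no instance is extracted are assumptions made for illustration. The Euclidean distance is used here, although the text permits other distance measures.

```python
import math
from collections import Counter


def object_likelihoods(query, instances, threshold=None, top_k=None):
    """Estimate per-category likelihoods from nearby registered instances.

    query: gripping feature vector generated from the image.
    instances: list of (category, feature_vector) pairs.
    threshold: extract instances whose distance is <= threshold, or
    top_k: extract the top_k closest instances instead.
    """
    if (threshold is None) == (top_k is None):
        raise ValueError("specify exactly one of threshold or top_k")
    # Distance from the query to every registered instance, closest first.
    scored = sorted(
        ((math.dist(query, vec), cat) for cat, vec in instances),
        key=lambda item: item[0],
    )
    if threshold is not None:
        selected = [cat for dist, cat in scored if dist <= threshold]
    else:
        selected = [cat for _, cat in scored[:top_k]]
    categories = sorted({cat for cat, _ in instances})
    k = len(selected)
    if k == 0:
        # No instance was extracted; every likelihood is taken as zero.
        return {cat: 0.0 for cat in categories}
    counts = Counter(selected)
    # Assumed form of Equation 9: L_i = Nc_i / k.
    return {cat: counts[cat] / k for cat in categories}


if __name__ == "__main__":
    registered = [
        ("C1", (1.5, 1.0, 0.3)),
        ("C1", (1.4, 1.0, 0.3)),
        ("C2", (1.5, 1.15, 0.3)),
        ("C2", (3.0, 3.0, 3.0)),
    ]
    print(object_likelihoods((1.5, 1.0, 0.3), registered, threshold=0.2))
```

Passing top_k instead of threshold corresponds to the variant that extracts a predetermined number of instances in ascending order of distance.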

(Object likelihood calculation method 2)
FIG. 11 is a diagram showing another example of the gripping feature information 115 in the first embodiment of the present invention. In the gripping feature information 115 of FIG. 11, the object likelihood at each point in the gripping feature quantity space is registered for each object category. In this case, the object likelihood at each point is calculated in advance by a method such as nearest-neighbor density estimation or kernel density estimation.

The object likelihood calculation unit 113 calculates the object likelihood L_i of each category C_i by referring to the gripping feature information 115 and obtaining the object likelihood corresponding to the gripping feature quantity generated by the gripping feature generation unit 112.
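As a non-limiting illustration, object likelihood calculation method 2 might be sketched in Python as follows. The sketch assumes an isotropic Gaussian kernel for the kernel density estimation, a finite set of grid points at which the likelihood is registered, and a lookup that uses the registered point nearest to the query. None of these details is specified in the text, and the function names build_likelihood_table and lookup_likelihoods are hypothetical.

```python
import math


def gaussian_kde(samples, point, bandwidth):
    """Gaussian kernel density estimate at `point` from `samples`."""
    dim = len(point)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = sum(
        math.exp(-math.dist(s, point) ** 2 / (2.0 * bandwidth ** 2))
        for s in samples
    )
    return total / (len(samples) * norm)


def build_likelihood_table(samples_by_category, grid_points, bandwidth):
    """Precompute, per category, the density at every grid point.

    grid_points must be tuples so that they can serve as dictionary keys.
    """
    return {
        cat: {g: gaussian_kde(samples, g, bandwidth) for g in grid_points}
        for cat, samples in samples_by_category.items()
    }


def lookup_likelihoods(table, feature):
    """Read each category's likelihood at the grid point nearest to `feature`."""
    result = {}
    for cat, densities in table.items():
        nearest = min(densities, key=lambda g: math.dist(g, feature))
        result[cat] = densities[nearest]
    return result


if __name__ == "__main__":
    samples = {"C1": [(1.0,), (1.2,)], "C2": [(3.0,)]}
    grid = [(1.0,), (2.0,), (3.0,)]
    table = build_likelihood_table(samples, grid, bandwidth=0.5)
    print(lookup_likelihoods(table, (1.1,)))
```

Precomputing the table moves the density estimation out of the recognition path, so that recognition only requires a lookup.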

(Object likelihood calculation method 3)
Learning results obtained by machine learning, such as a Support Vector Machine or a Random Forest, may be registered in the gripping feature information 115.

For example, when a Support Vector Machine is used, the decision boundary surface in the gripping feature space is registered as the learning result. In this case, the object likelihood calculation unit 113 calculates the object likelihood L_i of each category C_i as a function of the distance between the gripping feature quantity generated by the gripping feature generation unit 112 and the decision boundary surface.

When a Random Forest is used, the decision boundary of each classifier and the weight of that classifier are registered as the learning result. In this case, the object likelihood calculation unit 113 calculates the object likelihood L_i of each category C_i as a function of the weighted voting result.
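As a non-limiting illustration, the weighted voting described above might be sketched in Python as follows. The text states only that the likelihood is a function of the weighted voting result; taking each category's share of the total vote weight is an assumption, as are the representation of a classifier as a (predict, weight) pair and the function name weighted_vote_likelihoods. The sketch does not depend on any particular machine learning library.

```python
def weighted_vote_likelihoods(feature, classifiers, categories):
    """Turn weighted classifier votes into per-category likelihoods.

    feature: gripping feature vector generated from the image.
    classifiers: list of (predict, weight) pairs, where predict(feature)
                 returns one of `categories` (hypothetical representation).
    categories: iterable of all category labels.
    """
    totals = {cat: 0.0 for cat in categories}
    for predict, weight in classifiers:
        # Each classifier casts one vote, scaled by its registered weight.
        totals[predict(feature)] += weight
    total_weight = sum(totals.values())
    if total_weight == 0.0:
        return totals
    # Assumed likelihood function: share of the total vote weight.
    return {cat: w / total_weight for cat, w in totals.items()}


if __name__ == "__main__":
    stumps = [
        (lambda v: "C1" if v[0] > 1.0 else "C2", 2.0),
        (lambda v: "C1" if v[1] > 0.5 else "C2", 1.0),
        (lambda v: "C2", 1.0),
    ]
    print(weighted_vote_likelihoods((1.5, 1.0), stumps, ["C1", "C2"]))
```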

次に、把持対象認識部１１６は、算出した物体尤度を用いて、把持対象５０２のカテゴリを認識する（ステップＳ１０７）。ここで、把持対象認識部１１６は、例えば、数１０式に従って、把持対象５０２のカテゴリｉｄｘを特定する。 Next, the gripping target recognition unit 116 recognizes the category of the gripping target 502 using the calculated object likelihood (step S107). Here, the gripping target recognition unit 116 specifies the category idx of the gripping target 502, for example, according to Expression 10.

なお、Ｌ_ｔｈは、予め設定された、物体尤度Ｌ_ｉの最大値に対する閾値である。 Note that L _th is a preset threshold value for the maximum value of the object likelihood L _i .
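Expression 10 itself is not reproduced in this text. One plausible reading, given that L_th is a threshold on the maximum likelihood, is that the category with the largest likelihood is selected only when that maximum reaches the threshold, and "not applicable" is returned otherwise. The sketch below follows that assumption; the function name, the use of `None` for "not applicable", and the choice of `>=` rather than `>` are illustrative.

```python
def recognize_category(likelihoods, threshold):
    """Return the best category, or None ("not applicable") below the threshold."""
    if not likelihoods:
        return None
    best = max(likelihoods, key=likelihoods.get)
    return best if likelihoods[best] >= threshold else None

# Hypothetical likelihoods for three categories.
example = {"C1": 0.7, "C2": 0.2, "C3": 0.1}
```

With a threshold of 0.5 the example yields "C1"; raising the threshold to 0.9 yields no category.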

例えば、把持対象認識部１１６は、図１０の物体尤度算出結果をもとに、把持対象５０２のカテゴリを、物体尤度が最大であるカテゴリＣ_１と特定する。 For example, the gripping target recognition unit 116 identifies the category of the gripping target 502 as the category C ₁ having the maximum object likelihood based on the object likelihood calculation result of FIG. 10.

把持対象認識部１１６は、ステップＳ１０６の結果に応じて、認識結果に、把持対象５０２のカテゴリのインデックス、または、「該当なし」を設定する（ステップＳ１０８）。 The gripping target recognition unit 116 sets an index of the category of the gripping target 502 or "not applicable" in the recognition result according to the result of step S106 (step S108).

例えば、把持対象認識部１１６は、認識結果に「カテゴリＣ_１」を設定する。 For example, the gripping target recognition unit 116 sets "category C₁" in the recognition result.

なお、ステップＳ１０３において、検出された指の本数が２本未満の場合（ステップＳ１０３／Ｎ）、認識結果には、「該当なし」が設定される（ステップＳ１０４）。 When the number of detected fingers is less than two in step S103 (step S103/N), "not applicable" is set in the recognition result (step S104).

最後に、把持対象認識部１１６は、利用者等へ、認識結果を出力する（ステップＳ１０９）。 Finally, the gripping target recognition unit 116 outputs the recognition result to the user or the like (step S109).

以上により、本発明の第１の実施の形態の動作が完了する。 With the above, the operation of the first exemplary embodiment of the present invention is completed.

なお、上述の説明では、把持対象認識部１１６は、物体尤度算出部１１３により算出された各カテゴリに対する物体尤度をもとに、物体のカテゴリを特定した。しかしながら、これに限らず、把持対象認識部１１６は、物体形状に対する物体尤度をもとに物体形状を特定し、さらに、各物体形状に関連づけられた物体のカテゴリを取得することにより、物体のカテゴリを特定してもよい。 In the above description, the gripping target recognition unit 116 specifies the object category based on the object likelihood for each category calculated by the object likelihood calculation unit 113. However, the present invention is not limited to this, and the gripping target recognition unit 116 identifies the object shape based on the object likelihood with respect to the object shape, and further acquires the category of the object associated with each object shape. You may specify the category.

また、本発明の実施の形態では、把持手段５０１が人の手である場合を例に説明したが、これに限らず、人の手と同様に物体を把持できれば、把持手段５０１は、動物やロボット等の手でもよい。 In the embodiments of the present invention, the case where the gripping means 501 is a human hand has been described as an example. However, the gripping means 501 is not limited to this and may be the hand of an animal, a robot, or the like, as long as it can grip an object in the same way as a human hand.

また、複数の可動部を動かすことにより物体を把持できれば、把持手段５０１は、手以外の形状であってもよい。この場合、所定部位として、例えば、把持手段５０１の各可動部が用いられ、所定部位の位置（可動部の位置）として、各可動部の先端や中心、関節等、指定された位置が検出される。また、複数の所定部位の位置間の位置関係（可動部間の位置関係）として、各可動部の位置の座標値や、可動部の位置間の距離等が用いられる。 The gripping means 501 may also have a shape other than a hand, as long as it can grip an object by moving a plurality of movable parts. In this case, for example, each movable part of the gripping means 501 is used as a predetermined part, and a designated position such as the tip, center, or joint of each movable part is detected as the position of the predetermined part (the position of the movable part). As the positional relationship between the positions of the plurality of predetermined parts (the positional relationship between the movable parts), the coordinate values of the positions of the movable parts, the distances between those positions, or the like are used.
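A minimal sketch of one such positional relationship, assuming the feature is simply the list of Euclidean distances between every pair of detected positions (fingertips or other movable parts). The function name and the pair ordering are illustrative choices, not taken from the description.

```python
import math
from itertools import combinations

def pairwise_distance_feature(positions):
    """Distances between all pairs of part positions, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(positions, 2)]

# Hypothetical 2D positions of three movable parts in image coordinates.
feature = pairwise_distance_feature([(0, 0), (3, 4), (0, 4)])
```

A distance-based feature of this kind does not change when the gripping means is translated or rotated within the image, which is one reason it might be preferred over raw coordinate values.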

次に、本発明の実施の形態の基本的な構成を説明する。図１は、本発明の実施の形態の基本的な構成を示すブロック図である。 Next, a basic configuration of the embodiment of the present invention will be described. FIG. 1 is a block diagram showing the basic configuration of the embodiment of the present invention.

図１を参照すると、認識装置１００は、画像取得部１１０、把持特徴生成部１１２、及び、把持対象認識部１１６を含む。画像取得部１１０は、把持対象５０２を把持している把持手段５０１の画像を取得する。把持特徴生成部１１２は、画像における把持手段５０１の複数の所定部位間の位置関係を示す把持特徴を生成する。把持対象認識部１１６は、把持特徴をもとに、把持対象５０２を認識する。 Referring to FIG. 1, the recognition device 100 includes an image acquisition unit 110, a gripping feature generation unit 112, and a gripping target recognition unit 116. The image acquisition unit 110 acquires an image of the gripping means 501 gripping the gripping target 502. The gripping feature generation unit 112 generates a gripping feature indicating the positional relationship between a plurality of predetermined parts of the gripping means 501 in the image. The gripping target recognition unit 116 recognizes the gripping target 502 based on the gripping feature.

本発明の実施の形態によれば、把持対象５０２を把持している把持手段５０１の画像中に把持対象５０２が存在しない場合であっても、把持対象５０２を認識できる。その理由は、把持対象認識部１１６が、画像における把持手段５０１の複数の所定部位間の位置関係を示す把持特徴をもとに、把持対象５０２を認識するためである。これにより、把持対象５０２を把持している把持手段５０１の画像において、把持対象５０２が遮蔽されている場合であっても、把持対象５０２を認識できる。 According to the embodiment of the present invention, the gripping target 502 can be recognized even when it does not appear in the image of the gripping means 501 gripping it. The reason is that the gripping target recognition unit 116 recognizes the gripping target 502 based on the gripping feature indicating the positional relationship between a plurality of predetermined parts of the gripping means 501 in the image. As a result, the gripping target 502 can be recognized even when it is occluded in the image of the gripping means 501 gripping it.

＜第２の実施の形態＞ <Second Embodiment>
次に、本発明の第２の実施の形態について説明する。 Next, a second embodiment of the present invention will be described.

本発明の第２の実施の形態では、把持特徴量に加えて、画像上の把持対象５０２の物体領域から物体特徴量を生成し、把持特徴量と物体特徴量とを用いて、把持物体を認識する。なお、本発明の第２の実施の形態では、把持特徴量基づく物体尤度を第１物体尤度、物体特徴量に基づく物体尤度を第２物体尤度と呼ぶ。 In the second embodiment of the present invention, in addition to the gripping feature amount, an object feature amount is generated from the object region of the gripping target 502 in the image, and the gripped object is recognized using both the gripping feature amount and the object feature amount. In the second embodiment, the object likelihood based on the gripping feature amount is referred to as the first object likelihood, and the object likelihood based on the object feature amount is referred to as the second object likelihood.

はじめに、本発明の第２の実施の形態の構成を説明する。 First, the configuration of the second embodiment of the present invention will be described.

図１２は、本発明の第２の実施の形態における、認識装置２００の構成を示すブロック図である。 FIG. 12 is a block diagram showing the configuration of the recognition device 200 according to the second embodiment of the present invention.

図１２を参照すると、本発明の第２の実施の形態の認識装置２００は、画像取得部２１０、把持手段検出部２１１、把持特徴生成部２１２、第１物体尤度算出部２１３、把持特徴記憶部２１４を含む。認識装置２００は、さらに、把持対象検出部２２１、物体特徴生成部２２２、第２物体尤度算出部２２３、物体特徴記憶部２２４、統合尤度算出部２３０、及び、把持対象認識部２４０を含む。 Referring to FIG. 12, the recognition device 200 according to the second embodiment of the present invention includes an image acquisition unit 210, a gripping unit detection unit 211, a gripping feature generation unit 212, a first object likelihood calculation unit 213, and a gripping feature storage unit 214. The recognition device 200 further includes a grip target detection unit 221, an object feature generation unit 222, a second object likelihood calculation unit 223, an object feature storage unit 224, an integrated likelihood calculation unit 230, and a gripping target recognition unit 240.

画像取得部２１０、把持手段検出部２１１、及び、把持特徴生成部２１２は、それぞれ、本発明の第１の実施の形態における、画像取得部１１０、把持手段検出部１１１、及び、把持特徴生成部１１２と同様である。把持特徴記憶部２１４は、把持特徴情報１１５と同様の把持特徴情報２１５（「把持特徴量に基づく物体尤度（第１物体尤度）」を算出するための情報）を記憶する。第１物体尤度算出部２１３は、物体尤度算出部１１３と同様に、物体のカテゴリ毎に、第１物体尤度を算出する。 The image acquisition unit 210, the gripping unit detection unit 211, and the gripping feature generation unit 212 are the same as the image acquisition unit 110, the gripping unit detection unit 111, and the gripping feature generation unit 112 of the first embodiment of the present invention, respectively. The gripping feature storage unit 214 stores gripping feature information 215 (information for calculating the "object likelihood based on the gripping feature amount (first object likelihood)"), which is similar to the gripping feature information 115. Like the object likelihood calculation unit 113, the first object likelihood calculation unit 213 calculates the first object likelihood for each object category.

把持対象検出部２２１は、画像取得部１１０により取得された画像における、把持対象５０２の物体領域を検出する。ここで、把持対象検出部２２１は、例えば、背景が固定である場合に移動物体を検出する背景差分法を用いて、物体領域を検出してもよい。また、把持対象検出部２２１は、距離が所定の閾値よりも小さい（近い）画素を、物体領域として検出してもよい。また，把持対象検出部２２１は、ある把持対象候補領域（例えば画像の中心）に類似する周辺画素を把持対象に属する領域とみなすことにより、物体領域を検出してもよい。 The grip target detection unit 221 detects the object region of the grip target 502 in the image acquired by the image acquisition unit 110. Here, the gripping target detection unit 221 may detect the object region using, for example, a background subtraction method that detects a moving object when the background is fixed. In addition, the gripping target detection unit 221 may detect pixels whose distance is smaller (closer) than a predetermined threshold as the object region. The gripping target detection unit 221 may detect the object region by regarding the peripheral pixels similar to a certain gripping target candidate region (for example, the center of the image) as regions that belong to the gripping target.
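Two of the detection strategies mentioned above can be sketched on plain nested lists. This is an illustration only: the images are assumed to be single-channel 2D arrays (a depth map and a grayscale frame), and the function names and thresholds are hypothetical.

```python
def depth_threshold_mask(depth, max_distance):
    """Mark pixels closer to the camera than max_distance as object region."""
    return [[max_distance > d for d in row] for row in depth]

def background_subtraction_mask(frame, background, min_diff):
    """Mark pixels that differ from a fixed background by more than min_diff."""
    return [[abs(f - b) > min_diff for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

# Hypothetical 2x2 depth map (meters) and 1x2 grayscale frame.
depth_mask = depth_threshold_mask([[0.4, 2.0], [0.5, 0.3]], 0.5)
motion_mask = background_subtraction_mask([[10, 200]], [[12, 50]], 20)
```

In practice the resulting mask would also contain the hand itself, so the pixels already attributed to the gripping means would presumably need to be excluded before feature extraction.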

物体特徴生成部２２２は、把持対象５０２の物体の特徴を表す物体特徴量として、把持対象５０２の色や模様等に係る特徴を示す物体特徴量を生成する。ここで、物体特徴として、例えば、色の出現頻度や色の配置を用いてもよい。また、物体特徴として、画像の輝度値の部分的な明暗パターンや、輝度値の変化方向、フィルタへの応答強度を用いてもよい。 The object feature generation unit 222 generates, as the object feature amount representing the feature of the object of the grip target 502, the object feature amount representing the feature of the color or pattern of the grip target 502. Here, for example, the appearance frequency of colors or the arrangement of colors may be used as the object feature. Further, as the object feature, a partial bright/dark pattern of the brightness value of the image, the changing direction of the brightness value, or the response strength to the filter may be used.
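As one concrete example of a feature based on the appearance frequency of colors, a normalized RGB histogram over the object region could be used. The sketch below assumes 8-bit RGB pixels and a uniform quantization into `bins_per_channel` bins per channel; both the quantization and the function name are illustrative assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of the pixels in the object region."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist

# Hypothetical object region: two red pixels and one blue pixel.
hist = color_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 255)], bins_per_channel=2)
```

Normalizing by the pixel count keeps the feature comparable when only a small part of the object is visible.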

物体特徴記憶部２２４は、物体特徴情報２２５を記憶する。物体特徴情報２２５は、認識すべき物体のカテゴリに対する、「物体特徴量に基づく物体尤度（第２物体尤度）」を算出するための情報である。 The object feature storage unit 224 stores the object feature information 225. The object feature information 225 is information for calculating the “object likelihood based on the object feature amount (second object likelihood)” for the category of the object to be recognized.

第２物体尤度算出部２２３は、物体特徴生成部２２２により生成された物体特徴量と物体特徴記憶部２２４に記憶されている物体特徴情報２２５とを用いて、物体のカテゴリ毎に、第２物体尤度を算出する。 The second object likelihood calculation unit 223 uses the object feature amount generated by the object feature generation unit 222 and the object feature information 225 stored in the object feature storage unit 224 to determine the second object likelihood for each object category. Calculate the object likelihood.

統合尤度算出部２３０は、第１物体尤度と第２物体尤度とを用いて、統合尤度を算出する。 The integrated likelihood calculating unit 230 calculates the integrated likelihood using the first object likelihood and the second object likelihood.

把持対象認識部２４０は、統合尤度算出部２３０により算出された統合尤度を用いて、把持対象５０２のカテゴリを認識する。 The grip target recognition unit 240 recognizes the category of the grip target 502 using the integrated likelihood calculated by the integrated likelihood calculation unit 230.

次に、本発明の第２の実施の形態の動作を説明する。 Next, the operation of the second exemplary embodiment of the present invention will be described.

図１３は、本発明の第２の実施の形態における、認識装置２００の動作を示すフローチャートである。 FIG. 13 is a flowchart showing the operation of the recognition device 200 according to the second embodiment of the present invention.

はじめに、画像取得部２１０は、把持対象５０２を把持している把持手段５０１の画像を取得する（ステップＳ２０１）。この場合、画像には、把持対象５０２の物体領域の少なくとも一部が含まれると仮定する。 First, the image acquisition unit 210 acquires an image of the grip means 501 that grips the grip target 502 (step S201). In this case, it is assumed that the image includes at least a part of the object region of the grip target 502.

把持手段検出部２１１は、画像取得部２１０により取得された画像における、把持手段５０１の各指の位置、または、各指の位置と方向を検出する（ステップＳ２０２）。 The gripping unit detection unit 211 detects the position of each finger of the gripping unit 501 or the position and direction of each finger in the image acquired by the image acquisition unit 210 (step S202).

把持対象検出部２２１は、画像取得部２１０により取得された画像における、把持対象５０２の物体領域を検出する（ステップＳ２０３）。 The grip target detection unit 221 detects the object region of the grip target 502 in the image acquired by the image acquisition unit 210 (step S203).

把持手段検出部２１１は、画像において検出された指の本数が、２本以上かどうかを判定する（ステップＳ２０４）。 The gripping unit detection unit 211 determines whether the number of fingers detected in the image is two or more (step S204).

ステップＳ２０４において、検出された指の本数が２本以上の場合（ステップＳ２０４／Ｙ）、把持特徴生成部２１２は、検出された各指の位置、または、各指の位置と方向をもとに、把持特徴量を生成する（ステップＳ２０６）。 When the number of detected fingers is two or more in step S204 (step S204/Y), the gripping feature generation unit 212 generates a gripping feature amount based on the detected position of each finger, or on the position and direction of each finger (step S206).

第１物体尤度算出部２１３は、生成された把持特徴量をもとに、物体のカテゴリ毎に第１物体尤度を算出する（ステップＳ２０７）。 The first object likelihood calculation unit 213 calculates the first object likelihood for each object category based on the generated gripping feature amount (step S207).

図１５は、本発明の第２の実施の形態における、統合尤度の算出結果の例を示す図である。 FIG. 15 is a diagram showing an example of the integrated likelihood calculation result in the second exemplary embodiment of the present invention.

例えば、第１物体尤度算出部２１３は、図８の把持特徴情報１１５に登録されたインスタンスの内、把持特徴生成部１１２により生成された把持特徴量との距離が閾値以下であるインスタンスを抽出する。そして、第１物体尤度算出部２１３は、抽出されたインスタンスの数をもとに、図１５のように第１物体尤度を算出する。 For example, the first object likelihood calculation unit 213 extracts, from the instances registered in the gripping feature information 115 of FIG. 8, those whose distance from the gripping feature amount generated by the gripping feature generation unit 112 is equal to or less than a threshold. The first object likelihood calculation unit 213 then calculates the first object likelihood from the number of extracted instances, as shown in FIG. 15.

なお、ステップＳ２０４において、検出された指の本数が２本未満の場合（ステップＳ２０４／Ｎ）、全カテゴリに対する第１物体尤度に１が設定される（ステップＳ２０５）。 When the number of detected fingers is less than two in step S204 (step S204/N), 1 is set to the first object likelihood for all categories (step S205).

次に、物体特徴生成部２２２は、検出された把持対象５０２の物体領域から、物体特徴量を生成する（ステップＳ２０８）。 Next, the object feature generation unit 222 generates an object feature amount from the detected object region of the grip target 502 (step S208).

第２物体尤度算出部２２３は、生成された物体特徴量をもとに、物体のカテゴリ毎に、第２物体尤度を算出する（ステップＳ２０９）。ここで、第２物体尤度算出部２２３は、例えば、把持特徴量に基づく物体尤度（第１物体尤度）の算出方法と同様の方法で、第２物体尤度を算出する。 The second object likelihood calculation unit 223 calculates the second object likelihood for each object category based on the generated object feature amount (step S209). Here, the second object likelihood calculating unit 223 calculates the second object likelihood, for example, by the same method as the method of calculating the object likelihood (first object likelihood) based on the gripping feature amount.

図１４は、本発明の第２の実施の形態における、物体特徴情報２２５の例を示す図である。図１４の物体特徴情報２２５では、物体のカテゴリ毎に、当該物体の物体特徴量を示すインスタンスが登録されている。ここで、物体特徴量は、例えば、物体の色や模様等の物体特徴を表す。 FIG. 14 is a diagram showing an example of the object feature information 225 according to the second embodiment of the present invention. In the object feature information 225 of FIG. 14, an instance indicating the object feature amount of the object is registered for each category of the object. Here, the object feature amount represents, for example, an object feature such as a color or a pattern of the object.

例えば、第２物体尤度算出部２２３は、図１４の物体特徴情報２２５に登録されたインスタンスの内、物体特徴生成部２２２により生成された物体特徴量との距離が閾値以下であるインスタンスを抽出する。そして、第２物体尤度算出部２２３は、抽出されたインスタンスの数をもとに、図１５のように第２物体尤度を算出する。 For example, the second object likelihood calculation unit 223 extracts, from the instances registered in the object feature information 225 of FIG. 14, those whose distance from the object feature amount generated by the object feature generation unit 222 is equal to or less than a threshold. The second object likelihood calculation unit 223 then calculates the second object likelihood from the number of extracted instances, as shown in FIG. 15.
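The instance-counting approach used for both likelihoods can be sketched as follows. The description only says the likelihood is calculated from the number of extracted instances; this sketch assumes it is each category's share of all extracted instances, with Euclidean distance as the metric. The function name and the sample data are hypothetical.

```python
import math

def count_based_likelihoods(query, feature_info, max_distance):
    """Share of near instances per category (assumed normalization)."""
    counts = {
        cat: sum(1 for inst in insts if math.dist(query, inst) <= max_distance)
        for cat, insts in feature_info.items()
    }
    total = sum(counts.values())
    if total == 0:
        return {cat: 0.0 for cat in counts}
    return {cat: n / total for cat, n in counts.items()}

# Hypothetical registered instances: category -> feature vectors.
info = {"C1": [(0, 0), (1, 0)], "C2": [(0, 1), (9, 9)]}
result = count_based_likelihoods((0, 0), info, 1.0)
```

Here two instances of C1 and one of C2 fall within the threshold, so C1 receives two thirds of the likelihood mass.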

次に、統合尤度算出部２３０は、第１物体尤度と第２物体尤度とを用いて、統合尤度を算出する（ステップＳ２１０）。統合尤度算出部２３０は、各カテゴリＣ_ｉの統合尤度Ｌ_ｃｏｍｂ（ｉ）を、例えば、数１１式、または、数１２式により算出する。 Next, the integrated likelihood calculating unit 230 calculates the integrated likelihood using the first object likelihood and the second object likelihood (step S210). The integrated likelihood calculating unit 230 calculates the integrated likelihood L _comb (i) of each category C _i by, for example, Expression 11 or Expression 12.

ここで、Ｌ_Ａ（ｉ）、Ｌ_Ｂ（ｉ）は、それぞれ、カテゴリＣ_ｉの第１物体尤度、第２物体尤度である。 Here, L _A (i) and L _B (i) are the first object likelihood and the second object likelihood of category C _i , respectively.
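Expressions 11 and 12 are not reproduced in this text, so the exact combination rules are unknown. The sketch below shows two common ways of combining L_A(i) and L_B(i), a per-category product and a weighted sum, purely as assumptions. The product form would be consistent with setting the first object likelihood to 1 for all categories when too few fingers are detected, since 1 leaves the second likelihood unchanged.

```python
def integrate_product(first, second):
    """Per-category product of the two likelihoods (assumed form)."""
    return {cat: first[cat] * second[cat] for cat in first}

def integrate_weighted_sum(first, second, weight=0.5):
    """Per-category weighted sum; `weight` is a hypothetical tuning parameter."""
    return {cat: weight * first[cat] + (1.0 - weight) * second[cat]
            for cat in first}

# Hypothetical likelihoods for two categories.
l_a = {"C1": 0.5, "C2": 0.5}
l_b = {"C1": 0.4, "C2": 0.8}
combined = integrate_product(l_a, l_b)
```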

例えば、統合尤度算出部２３０は、図１５のように統合尤度を算出する。 For example, the integrated likelihood calculation unit 230 calculates the integrated likelihood as shown in FIG. 15.

次に、把持対象認識部２４０は、算出した統合尤度を用いて、把持対象５０２のカテゴリを認識する（ステップＳ２１１）。ここで、把持対象認識部２４０は、例えば、数１３式に従って、把持対象５０２のカテゴリｉｄｘを特定する。 Next, the gripping target recognition unit 240 recognizes the category of the gripping target 502 using the calculated integrated likelihood (step S211). Here, the gripping target recognition unit 240 specifies the category idx of the gripping target 502, for example, according to Expression 13.

ここで、Ｌ_{ｔｈ＿ｃｏｍｂ}は、予め設定された、統合尤度Ｌ_ｃｏｍｂ（ｉ）の最大値に対する閾値である。 Here, L _{th_comb} is a preset threshold value for the maximum value of the integrated likelihood L _comb (i).

例えば、把持対象認識部２４０は、図１５の統合尤度算出結果をもとに、把持対象５０２のカテゴリを、統合尤度が最大であるカテゴリＣ_３と特定する。 For example, the gripping target recognition unit 240 identifies the category of the gripping target 502 as the category C ₃ having the largest integrated likelihood based on the integrated likelihood calculation result of FIG. 15.

把持対象認識部１１６は、ステップＳ２１１の結果に応じて、認識結果に、把持対象５０２のカテゴリのインデックス、または、「該当なし」を設定し（ステップＳ２１２）、出力する（ステップＳ２１３）。 The gripping target recognition unit 116 sets the category index of the gripping target 502 or “not applicable” in the recognition result according to the result of step S211 (step S212) and outputs it (step S213).

以上により、本発明の第２の実施の形態の動作が完了する。 With the above, the operation of the second exemplary embodiment of the present invention is completed.

本発明の第２の実施の形態によれば、本発明の第１の実施の形態に比べて、把持対象５０２の認識精度を向上できる。その理由は、把持対象認識部２４０が、把持特徴量に基づく第１物体尤度と物体特徴量に基づく第２物体尤度を用いて算出された統合尤度をもとに、把持対象５０２を認識するためである。これにより、例えば、把持対象５０２を把持している把持手段５０１の画像において、把持対象５０２のほとんどが遮蔽されているが、一部が存在するような場合に、把持対象５０２の認識精度を向上できる。 According to the second embodiment of the present invention, the recognition accuracy of the gripping target 502 can be improved compared with the first embodiment. The reason is that the gripping target recognition unit 240 recognizes the gripping target 502 based on the integrated likelihood calculated from the first object likelihood, which is based on the gripping feature amount, and the second object likelihood, which is based on the object feature amount. This improves the recognition accuracy, for example, when most of the gripping target 502 is occluded in the image of the gripping means 501 gripping it but a part of it is still visible.

＜第３の実施の形態＞ <Third Embodiment>
次に、本発明の第３の実施の形態について説明する。 Next, a third embodiment of the present invention will be described.

本発明の第３の実施の形態では、把持手段５０１の複数の所定部位の内、把持対象５０２と接触している部位（把持対象５０２と接触している指）について、把持特徴量を生成する。 In the third embodiment of the present invention, a gripping feature amount is generated for those of the predetermined parts of the gripping means 501 that are in contact with the gripping target 502 (the fingers in contact with the gripping target 502).

はじめに、本発明の第３の実施の形態の構成を説明する。 First, the configuration of the third embodiment of the present invention will be described.

図１６は、本発明の第３の実施の形態における、認識装置２００の構成を示すブロック図である。 FIG. 16 is a block diagram showing the configuration of the recognition device 200 according to the third embodiment of the present invention.

図１６を参照すると、本発明の第３の実施の形態の認識装置２００は、本発明の第２の実施の形態の認識装置２００の構成要素に加えて、接触検出部２５０を含む。 Referring to FIG. 16, the recognition device 200 according to the third exemplary embodiment of the present invention includes a contact detection unit 250 in addition to the components of the recognition device 200 according to the second exemplary embodiment of the present invention.

接触検出部２５０は、把持対象検出部２２１により検出された把持対象５０２の物体領域と、把持手段検出部１１１により検出された各指の位置とをもとに、検出された指の内の把持対象５０２と接触している指（接触指）を特定する。接触検出部２５０は、例えば、注目する指の位置を示す座標値と、その座標値に最も近い物体領域との距離が所定の閾値未満の場合、当該指が接触指であると判定する。 The contact detection unit 250 identifies, among the detected fingers, the fingers in contact with the gripping target 502 (contact fingers), based on the object region of the gripping target 502 detected by the grip target detection unit 221 and the position of each finger detected by the gripping unit detection unit 111. For example, the contact detection unit 250 determines that a finger is a contact finger when the distance between the coordinate value indicating the position of that finger and the part of the object region closest to that coordinate value is less than a predetermined threshold.
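The contact test described above can be sketched as a nearest-pixel distance check. The representation of the object region as a list of pixel coordinates, the function name, and the parameter `max_gap` are assumptions made for illustration.

```python
import math

def contact_fingers(finger_positions, object_pixels, max_gap):
    """Indices of fingers whose nearest object-region pixel is closer than max_gap."""
    if not object_pixels:
        return []
    contacts = []
    for index, finger in enumerate(finger_positions):
        nearest = min(math.dist(finger, pixel) for pixel in object_pixels)
        if max_gap > nearest:
            contacts.append(index)
    return contacts

# Hypothetical finger positions and object-region pixels in image coordinates.
fingers = [(0, 0), (10, 10), (5, 1)]
region = [(0, 1), (5, 0), (5, 2)]
touching = contact_fingers(fingers, region, 1.5)
```

The brute-force search over all object pixels is quadratic; for full-resolution images a distance transform of the object mask would likely be a more practical choice.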

把持特徴生成部２１２は、接触指に係る所定部位間の位置関係を表す把持特徴量を生成する。 The gripping-feature generating unit 212 generates a gripping-feature amount that represents a positional relationship between predetermined parts related to the contact finger.

次に、本発明の第３の実施の形態の動作を説明する。 Next, the operation of the third exemplary embodiment of the present invention will be described.

図１７は、本発明の第３の実施の形態における、認識装置２００の動作を示すフローチャートである。 FIG. 17 is a flowchart showing the operation of the recognition device 200 according to the third embodiment of the present invention.

はじめに、画像取得部２１０は、把持対象５０２を把持している把持手段５０１の画像を取得する（ステップＳ３０１）。 First, the image acquisition unit 210 acquires an image of the grip means 501 that holds the grip target 502 (step S301).

把持手段検出部２１１は、画像取得部２１０により取得された画像における、各指の位置、または、各指の位置と方向を検出する（ステップＳ３０２）。 The gripping unit detection unit 211 detects the position of each finger or the position and direction of each finger in the image acquired by the image acquisition unit 210 (step S302).

把持対象検出部２２１は、画像取得部２１０により取得された画像における、把持対象５０２の物体領域を検出する（ステップＳ３０３）。 The grip target detection unit 221 detects the object area of the grip target 502 in the image acquired by the image acquisition unit 210 (step S303).

把持手段検出部２１１は、画像において検出された指の本数が、２本以上かどうかを判定する（ステップＳ３０４）。 The gripping unit detection unit 211 determines whether the number of fingers detected in the image is two or more (step S304).

ステップＳ３０４において、検出された指の本数が２本以上の場合（ステップＳ３０４／Ｙ）、接触検出部２５０は、検出された指の内の接触指を特定する（ステップＳ３０６）。 In step S304, when the number of detected fingers is two or more (step S304/Y), the contact detection unit 250 identifies the contact finger among the detected fingers (step S306).

接触検出部２５０は、接触指の本数が、２本以上かどうかを判定する（ステップＳ３０７）。 The contact detection unit 250 determines whether the number of contact fingers is two or more (step S307).

ステップＳ３０７において、接触指の本数が２本以上の場合（ステップＳ３０７／Ｙ）、把持特徴生成部２１２は、検出された各接触指の位置、または、各接触指の位置と方向をもとに、接触指間の位置関係を表す把持特徴量を生成する（ステップＳ３０８）。 When the number of contact fingers is two or more in step S307 (step S307/Y), the gripping feature generation unit 212 generates a gripping feature amount representing the positional relationship between the contact fingers, based on the detected position of each contact finger, or on the position and direction of each contact finger (step S308).

第１物体尤度算出部２１３は、生成された把持特徴量をもとに、物体のカテゴリ毎に第１物体尤度を算出する（ステップ３０９）。 The first object likelihood calculation unit 213 calculates the first object likelihood for each object category based on the generated gripping feature amount (step 309).

なお、ステップＳ３０４において、検出された指の本数が２本未満の場合（ステップＳ３０４／Ｎ）、または、ステップＳ３０７において、接触指の本数が２本未満の場合（ステップＳ３０７／Ｎ）、全カテゴリに対する第１物体尤度が１に設定される。 When the number of detected fingers is less than two in step S304 (step S304/N), or when the number of contact fingers is less than two in step S307 (step S307/N), the first object likelihood for all categories is set to 1.
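The branching in steps S304 to S309 can be summarized in a short sketch. The function and parameter names are hypothetical, and `likelihood_fn` stands in for whichever first object likelihood calculation is used; only the two "fewer than two" checks and the fallback value of 1 come from the description.

```python
def first_likelihoods_with_fallback(fingers, contact_indices, categories, likelihood_fn):
    """Compute first object likelihoods from contact fingers, or fall back to 1."""
    if len(fingers) >= 2 and len(contact_indices) >= 2:
        contacts = [fingers[i] for i in contact_indices]
        return likelihood_fn(contacts)
    # Too few fingers or contact fingers: every category gets likelihood 1.
    return {cat: 1.0 for cat in categories}

# Hypothetical stand-in for the real likelihood calculation.
def dummy_likelihood(contacts):
    return {"C1": 0.9, "C2": 0.1}

cats = ["C1", "C2"]
```

A uniform value of 1 carries no preference between categories, so in the fallback case the final decision would rest on the second object likelihood alone.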

以降、物体特徴量の生成、第２物体尤度の算出、統合尤度の算出、及び、把持対象５０２のカテゴリの認識（ステップＳ３１０〜Ｓ３１５）が、本発明の第２の実施の形態（ステップＳ２０８〜Ｓ２１３）と同様に行われる。 Thereafter, the generation of the object feature amount, the calculation of the second object likelihood, the calculation of the integrated likelihood, and the recognition of the category of the gripping target 502 (steps S310 to S315) are performed in the same manner as in the second embodiment of the present invention (steps S208 to S213).

以上により、本発明の第３の実施の形態の動作が完了する。 With the above, the operation of the third exemplary embodiment of the present invention is completed.

本発明の第３の実施の形態によれば、本発明の第１の実施の形態に比べて、把持対象５０２の認識精度を向上できる。その理由は、把持特徴生成部２１２が、把持手段５０１の複数の所定部位の内、把持対象５０２と接触している部位（把持対象５０２と接触している指）間の位置関係を表す把持特徴量を生成するためである。これにより、把持特徴量から、把持対象５０２と接触していない部位（接触していない指）に係る情報を除外することができ、把持手段５０１による把持に寄与していない部位の位置の影響を受けずに、把持対象５０２のカテゴリを特定できる。 According to the third embodiment of the present invention, the recognition accuracy of the gripping target 502 can be improved compared with the first embodiment. The reason is that the gripping feature generation unit 212 generates a gripping feature amount representing the positional relationship between those of the predetermined parts of the gripping means 501 that are in contact with the gripping target 502 (the fingers in contact with the gripping target 502). As a result, information on parts not in contact with the gripping target 502 (non-contact fingers) can be excluded from the gripping feature amount, and the category of the gripping target 502 can be identified without being affected by the positions of parts that do not contribute to the grip.

＜第４の実施の形態＞ <Fourth Embodiment>
次に、本発明の第４の実施の形態について説明する。 Next, a fourth embodiment of the present invention will be described.

本発明の第４の実施の形態では、把持手段５０１の複数の所定部位の間の位置関係の時間的な変化を示す把持特徴を生成する。 In the fourth embodiment of the present invention, a gripping feature indicating a temporal change in the positional relationship between a plurality of predetermined parts of the gripping means 501 is generated.

はじめに、本発明の第４の実施の形態の構成を説明する。 First, the configuration of the fourth embodiment of the present invention will be described.

図１８は、本発明の第４の実施の形態における、認識装置２００の構成を示すブロック図である。 FIG. 18 is a block diagram showing the configuration of the recognition device 200 according to the fourth embodiment of the present invention.

図１８を参照すると、本発明の第４の実施の形態の構成は、本発明の第２の実施の形態において、把持特徴生成部２１２が把持特徴生成部２６０に置き換えられている。 Referring to FIG. 18, the configuration of the fourth embodiment of the present invention is the same as that of the second embodiment except that the gripping feature generation unit 212 is replaced with a gripping feature generation unit 260.

把持特徴生成部２６０は、把持手段５０１の複数の所定部位の間の位置関係の時間的な変化を示す、動き特徴を含む把持特徴量を生成する。把持特徴生成部２６０は、フレーム特徴生成部２６１、フレーム特徴記憶部２６２、及び、動き特徴抽出部２６３を含む。 The gripping feature generation unit 260 generates a gripping feature amount including a motion feature, which indicates a temporal change in the positional relationship between a plurality of predetermined parts of the gripping unit 501. The grip feature generation unit 260 includes a frame feature generation unit 261, a frame feature storage unit 262, and a motion feature extraction unit 263.

フレーム特徴生成部２６１は、把持特徴生成部２１２と同様の方法により、画像のフレーム毎の把持特徴量を生成する。 The frame feature generation unit 261 generates the gripping feature amount for each frame of the image by the same method as the gripping feature generation unit 212.

フレーム特徴記憶部２６２は、フレーム特徴生成部２６１により生成された、フレーム毎の把持特徴量を、所定のフレーム数分記憶する。 The frame feature storage unit 262 stores the gripping feature amount for each frame generated by the frame feature generation unit 261 for a predetermined number of frames.

動き特徴抽出部２６３は、フレーム毎の把持特徴量の差分をもとに、動き特徴を抽出し、動き特徴を含む把持特徴量を生成する。 The motion feature extraction unit 263 extracts the motion feature based on the difference in the grip feature amount for each frame, and generates the grip feature amount including the motion feature.

次に、本発明の第４の実施の形態の動作を説明する。 Next, the operation of the fourth exemplary embodiment of the present invention will be described.

図１９は、本発明の第４の実施の形態における、認識装置２００の動作を示すフローチャートである。 FIG. 19 is a flowchart showing the operation of the recognition device 200 according to the fourth embodiment of the present invention.

はじめに、把持特徴生成部２６０のフレーム特徴生成部２６１は、フレームを示す変数ｔ（ｔ＝１,…，Ｎｆ。Ｎｆは、動き特徴を含む把持特徴量を生成するためのフレーム数）に１を設定する（ステップＳ４０１）。 First, the frame feature generation unit 261 of the gripping feature generation unit 260 sets a variable t indicating the frame (t = 1, ..., Nf, where Nf is the number of frames used to generate a gripping feature amount including a motion feature) to 1 (step S401).

画像取得部２１０は、把持対象５０２を把持している把持手段５０１の画像を１フレーム取得する（ステップＳ４０２）。 The image acquisition unit 210 acquires one frame of the image of the grip unit 501 that holds the grip target 502 (step S402).

把持手段検出部２１１は、画像取得部２１０により取得されたフレーム（対象フレーム）における、各指の位置、または、各指の位置と方向を検出する（ステップＳ４０３）。 The gripping unit detection unit 211 detects the position of each finger or the position and direction of each finger in the frame (target frame) acquired by the image acquisition unit 210 (step S403).

把持対象検出部２２１は、画像取得部２１０により取得された対象フレームにおける、把持対象５０２の物体領域を検出する（ステップＳ４０４）。 The grip target detection unit 221 detects the object area of the grip target 502 in the target frame acquired by the image acquisition unit 210 (step S404).

把持手段検出部２１１は、対象フレームにおいて検出された指の本数が、２本以上かどうかを判定する（ステップＳ４０５）。 The gripping unit detection unit 211 determines whether the number of fingers detected in the target frame is two or more (step S405).

ステップＳ４０５において、検出された指の本数が２本以上の場合（ステップＳ４０５／Ｙ）、フレーム特徴生成部２６１は、検出された各指の位置、または、各指の位置と方向をもとに、対象フレームｔでの把持特徴量Ｖ（ｔ）を生成する（ステップＳ４０７）。ここで、フレーム特徴生成部２６１は、把持特徴量Ｖ（ｔ）として、例えば、本発明の第１の実施の形態の把持特徴量生成方法で示した、把持特徴量Ｖ_Ａ、Ｖ_Ｂ、Ｖ_Ｃ、Ｖ_Ｄの内のいずれかを生成する。 When the number of detected fingers is two or more in step S405 (step S405/Y), the frame feature generation unit 261 generates the gripping feature amount V(t) for the target frame t, based on the detected position of each finger, or on the position and direction of each finger (step S407). Here, as the gripping feature amount V(t), the frame feature generation unit 261 generates, for example, one of the gripping feature amounts V_A, V_B, V_C, and V_D described in the gripping feature amount generation methods of the first embodiment of the present invention.

フレーム特徴生成部２６１は、変数ｔがＮｆ以上かどうかを判定する（ステップＳ４０８）。 The frame feature generation unit 261 determines whether the variable t is Nf or more (step S408).

ステップＳ４０８で、変数ｔがＮｆ未満の場合（ステップＳ４０８／Ｎ）、フレーム特徴生成部２６１は、生成した把持特徴量Ｖ（ｔ）をフレーム特徴記憶部２６２に保存し、変数ｔに１を加算する（ステップＳ４０９）。そして、ステップＳ４０２からの処理が繰り返される。 When the variable t is less than Nf in step S408 (step S408/N), the frame feature generation unit 261 saves the generated gripping feature amount V(t) in the frame feature storage unit 262 and adds 1 to the variable t (step S409). The processing from step S402 is then repeated.

一方、ステップＳ４０８で、変数ｔがＮｆ以上の場合（ステップＳ４０８／Ｙ）、動き特徴抽出部２６３は、フレーム特徴記憶部２６２に記憶されている、フレーム毎の把持特徴量の差分を算出する（ステップＳ４１０）。 On the other hand, when the variable t is Nf or more in step S408 (step S408/Y), the motion feature extraction unit 263 calculates the differences between the per-frame gripping feature amounts stored in the frame feature storage unit 262 (step S410).

動き特徴抽出部２６３は、算出した差分をもとに、動き特徴を含む把持特徴量を生成する（ステップＳ４１１）。 The motion feature extraction unit 263 generates a gripping feature amount including a motion feature based on the calculated difference (step S411).

ここで、動き特徴抽出部２６３は、例えば、数１４式により、動き特徴を含む把持特徴量Ｖ_movを生成する。 Here, the motion feature extraction unit 263 generates the gripping feature amount V _mov including the motion feature, for example, by using Expression 14.

FIG. 20 is a diagram showing an example of calculating a gripping feature amount including a motion feature in the fourth embodiment of the present invention. FIG. 20 shows an example in which Nf is 3. For example, the motion feature extraction unit 263 generates the gripping feature amount V_mov including the motion feature from the gripping feature amounts V(t) at t = 1, 2, 3, as shown in FIG. 20.
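The difference calculation of steps S410 and S411 can be illustrated with the sketch below. Expression 14 and FIG. 20 are not reproduced in this text, so the exact composition of V_mov is not confirmed here. The sketch assumes only that the motion part is obtained by concatenating the element-wise differences V(t+1) - V(t) of consecutive frames. The function name `motion_feature` is hypothetical.

```python
def motion_feature(frame_features):
    """Concatenate the differences V(t+1) - V(t) over consecutive frames.

    `frame_features` is a list of equal-length lists, one per frame.
    Assumption: this is only the difference part; Expression 14 may define
    V_mov differently (for example, by also including V(t) itself).
    """
    v_mov = []
    for prev, curr in zip(frame_features, frame_features[1:]):
        v_mov.extend(c - p for p, c in zip(prev, curr))
    return v_mov
```

With Nf = 3 and two-element frame features, the result has four elements: the two differences between frames 1 and 2 followed by the two differences between frames 2 and 3.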

Next, the first object likelihood calculation unit 213 calculates a first object likelihood for each object category based on the generated gripping feature amount including the motion feature (step S412).

If fewer than two fingers are detected in step S405 (step S405/N), the first object likelihood for every category is set to 1 (step S406).
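The fallback of step S406 can be sketched as follows. Because every category receives the same value, the first object likelihood carries no information in that case. The function name `first_object_likelihoods` and the `score` callback are hypothetical. How the likelihood is computed from the gripping feature amount is not specified in this passage, so it is left to the caller.

```python
def first_object_likelihoods(categories, v_mov, score):
    """Return a first object likelihood per category.

    `v_mov` is None when fewer than two fingers were detected (step S405/N).
    `score(category, v_mov)` is a hypothetical likelihood function.
    """
    if v_mov is None:
        # Step S406: the same likelihood, 1, for every category.
        return {c: 1.0 for c in categories}
    # Step S412: likelihood from the gripping feature amount.
    return {c: score(c, v_mov) for c in categories}
```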

Next, the object feature generation unit 222 generates an object feature amount from the detected object region of the gripping target 502 (step S413). Here, the object feature generation unit 222 generates the object feature amount from the object region of the gripping target 502 detected in, for example, the first frame or the Nf-th frame of the Nf frames. Alternatively, the object feature generation unit 222 may calculate the average of the object feature amounts generated from the object regions of the gripping target 502 detected in each of the Nf frames.
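The three options for step S413 described above (first frame, Nf-th frame, or average over all Nf frames) can be sketched as follows. The function name `object_feature` and the `mode` parameter are hypothetical, and each per-frame object feature amount is assumed to be a list of numbers of equal length, such as a color histogram.

```python
def object_feature(per_frame_features, mode="mean"):
    """Select or average per-frame object feature amounts.

    mode: "first" (first frame), "last" (Nf-th frame), or "mean" (average).
    """
    if mode == "first":
        return list(per_frame_features[0])
    if mode == "last":
        return list(per_frame_features[-1])
    if mode == "mean":
        n = len(per_frame_features)
        # Element-wise average across the Nf frames.
        return [sum(vals) / n for vals in zip(*per_frame_features)]
    raise ValueError(f"unknown mode: {mode}")
```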

Thereafter, the calculation of the second object likelihood, the calculation of the integrated likelihood, and the recognition of the category of the gripping target 502 (steps S414 to S418) are performed in the same manner as in the second embodiment of the present invention (steps S209 to S213).

This completes the operation of the fourth embodiment of the present invention.

According to the fourth embodiment of the present invention, the recognition accuracy of the gripping target 502 can be improved when the grip by the gripping means 501 changes over time. This is because the gripping feature generation unit 260 generates a gripping feature amount including a motion feature that indicates the temporal change in the positional relationship between the plurality of predetermined parts of the gripping means 501. As a result, for example, a gripping target 502 whose shape changes over time while being gripped by the gripping means 501, such as a soft object, can be distinguished from a gripping target 502 whose shape does not change over time, such as a hard object. The recognition accuracy of the gripping target 502 can also be improved when a predetermined part (finger) of the gripping means 501 moves over the gripping target 502, as in the operation of a smartphone.

The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

For example, the generation of a gripping feature amount for those parts, among the plurality of predetermined parts of the gripping means 501, that are in contact with the gripping target 502, as described in the third embodiment of the present invention, may also be applied to the first embodiment of the present invention, provided that contact with the gripping target 502 can be detected.

Likewise, the generation of a gripping feature amount including a motion feature, as described in the fourth embodiment of the present invention, may be applied to the first or second embodiment of the present invention.

100 Recognition device
101 CPU
102 Storage device
103 Communication device
104 Input device
105 Output device
110 Image acquisition unit
111 Gripping means detection unit
112 Gripping feature generation unit
113 Object likelihood calculation unit
114 Gripping feature storage unit
115 Gripping feature information
116 Gripping target recognition unit
200 Recognition device
210 Image acquisition unit
211 Gripping means detection unit
212 Gripping feature generation unit
213 First object likelihood calculation unit
214 Gripping feature storage unit
215 Gripping feature information
221 Gripping target detection unit
222 Object feature generation unit
223 Second object likelihood calculation unit
224 Object feature storage unit
225 Object feature information
230 Integrated likelihood calculation unit
240 Gripping target recognition unit
250 Contact detection unit
260 Gripping feature generation unit
261 Frame feature generation unit
262 Frame feature storage unit
263 Motion feature extraction unit
501 Gripping means
502 Gripping target

Claims

An information processing apparatus comprising:
storage means for storing an object in association with a gripping feature indicating a positional relationship between a plurality of predetermined parts of a gripping means when the gripping means grips the object;
image acquisition means for acquiring an image of the gripping means gripping a gripping target;
gripping feature generation means for generating the gripping feature of the gripping means in the image; and
gripping target recognition means for recognizing that the gripping target is the object, based on the gripping feature acquired from the storage means and the gripping feature generated by the gripping feature generation means,
wherein the gripping feature indicates the positional relationship between the plurality of predetermined parts based on a distance between the plurality of predetermined parts.

The information processing apparatus according to claim 1, further comprising object feature generation means for generating an object feature indicating at least one of a color and a pattern of the gripping target in the image,
wherein the gripping target recognition means recognizes the gripping target based on the gripping feature acquired from the storage means, the gripping feature generated by the gripping feature generation means, and the object feature.

The information processing apparatus according to claim 1,
wherein the gripping feature generation means generates the gripping feature indicating a positional relationship between those of the plurality of predetermined parts that are in contact with the gripping target.

The information processing apparatus according to claim 1,
wherein the gripping feature generation means generates the gripping feature indicating a temporal change in the positional relationship between the plurality of predetermined parts.

The information processing apparatus according to claim 1,
wherein each of the predetermined parts is a finger included in the gripping means.

An information processing method comprising:
storing an object in association with a gripping feature indicating a positional relationship between a plurality of predetermined parts of a gripping means when the gripping means grips the object;
acquiring an image of the gripping means gripping a gripping target;
generating the gripping feature of the gripping means in the image; and
recognizing that the gripping target is the object, based on the stored gripping feature and the generated gripping feature,
wherein the gripping feature indicates the positional relationship between the plurality of predetermined parts based on a distance between the plurality of predetermined parts.

A program causing a computer to execute processing comprising:
storing an object in association with a gripping feature indicating a positional relationship between a plurality of predetermined parts of a gripping means when the gripping means grips the object;
acquiring an image of the gripping means gripping a gripping target;
generating the gripping feature of the gripping means in the image; and
recognizing that the gripping target is the object, based on the stored gripping feature and the generated gripping feature,
wherein the gripping feature indicates the positional relationship between the plurality of predetermined parts based on a distance between the plurality of predetermined parts.