JP6066093B2

JP6066093B2 - Finger shape estimation device, finger shape estimation method, and finger shape estimation program

Info

Publication number: JP6066093B2
Application number: JP2013537565A
Authority: JP
Inventors: 聖星野; 遥平豊原; 教彰藤嶋
Original assignee: University of Tsukuba NUC
Current assignee: University of Tsukuba NUC
Priority date: 2011-10-07
Filing date: 2012-10-05
Publication date: 2017-01-25
Anticipated expiration: 2032-10-05
Also published as: WO2013051681A1; JPWO2013051681A1

Description

The present invention relates to a finger shape estimation device, a finger shape estimation method, and a finger shape estimation program, and in particular to a device, method, and program for estimating the shape and movement of fingers in an image at high speed and with high accuracy.

Conventionally, a robot or the like can be made to grip an object under automatic control if, for example, the object has a fixed shape. However, it is difficult to make a robot freely grip objects of various shapes in the way a human intends.

For this reason, there are conventional mechanisms in which sensors, markers, or the like are attached to a person's fingers, the finger movements are detected from the attached sensors or markers, and a robot or the like is driven according to the detected movements. If a robot could instead be driven by gestures without such sensors or markers, the operator could move the robot freely by remote operation and have it perform the desired work in a way that is intuitive, offers a high degree of freedom, and imposes few constraints during input. In addition, because information communication terminals and the like could be operated simply by behaving as in everyday actions, the operator would not need to learn an operation method in advance.

Conventionally, a technique known as finger motion capture has been used, in which, for example, image feature amounts obtained from the contour lines of fingers read from an image are used to run a similarity search against a database and thereby determine the finger shape (see, for example, Patent Documents 1 and 2).

Patent Document 1: International Publication No. 2005/046942 pamphlet
Patent Document 2: International Publication No. 2009/147904 pamphlet

However, in the technique disclosed in Patent Document 1, an image containing only the finger contour lines extracted from each of the input image and the database collation images is divided into 64 regions, and the finger shape is represented by image feature amounts corresponding to vertical lines, horizontal lines, diagonal lines, broken lines, dots, and the like. Consequently, when the system is used by a person whose fingers are markedly thicker or longer than those registered in the database, the feature amounts computed for the same finger shape take different values. Information that should fall into a single class is then stored in the database under two different classes, which can cause erroneous estimation.

In the technique disclosed in Patent Document 2, each captured image and each collation image is divided into 64 regions, and each region is represented by a 25-dimensional image feature amount, so that each image has a feature amount of 1600 dimensions in total. This limits the number of finger shape types that can be loaded into computer memory: to achieve real-time processing at the video rate or at twice that rate, only about 30,000 shapes at most can be estimated. As a result, with the technique of Patent Document 2 it is difficult to estimate finger shapes that vary between individuals at high speed and with high accuracy.

In other words, the only way for the conventional methods to handle individual differences is to add new data (data sets) covering those differences to the database. This takes considerable effort and also increases the time required for database collation. Here, individual differences include, for example, differences in hand shape (bone length, thickness, and the ratio of palm to fingers) and in how the hand is moved (joint range of motion and the postures adopted within that range).

The present invention has been made in view of the above problems, and its object is to provide a finger shape estimation device, a finger shape estimation method, and a finger shape estimation program for estimating the shape and movement of fingers in an image at high speed and with high accuracy.

To solve the above problems, the present invention has the following features.

The finger shape estimation device of the present invention comprises: an image acquisition unit that acquires an image including a finger shape; an image analysis unit that analyzes the image acquired by the image acquisition unit and acquires a first feature amount corresponding to the ridge line shape of the fingers included in the image; and a finger shape estimation unit that, based on the first feature amount obtained by the image analysis unit, refers to a collation database in which second feature amounts corresponding to predetermined finger shapes set in advance are stored, and estimates the finger shape corresponding to the first feature amount. The image analysis unit treats the finger image included in the image as a foreground and the rest of the image as a background, regards the distance from the background image within the foreground image as a height so that the image is regarded as a single mountain-shaped image, and acquires the ridge line shape information from the mountain-shaped image.

The finger shape estimation method of the present invention includes: acquiring an image including a finger shape; analyzing the acquired image to acquire a first feature amount corresponding to the ridge line shape of the fingers included in the image; and, based on the first feature amount, referring to a collation database in which second feature amounts corresponding to predetermined finger shapes set in advance are stored, and estimating the finger shape corresponding to the first feature amount. Acquiring the first feature amount includes treating the finger image included in the image as a foreground and the rest of the image as a background, regarding the distance from the background image within the foreground image as a height so that the image is regarded as a single mountain-shaped image, and acquiring the ridge line shape information from the mountain-shaped image.

The finger shape estimation program of the present invention is a program that, when installed on an information processing apparatus, causes the apparatus to execute each process of the finger shape estimation method of the present invention described above.

According to the present invention, the shape and movement of fingers in an image can be estimated at high speed and with high accuracy.

FIG. 1 shows an example of the functional configuration of the finger shape estimation device of the first embodiment.
FIG. 2 shows an example of a hardware configuration capable of implementing the finger shape estimation process of the first embodiment.
FIG. 3 is a flowchart showing an example of the finger shape estimation procedure of the first embodiment.
FIG. 4 illustrates an overview of the collation process of the first embodiment.
FIG. 5 is a flowchart showing an example of the collation database construction procedure of the first embodiment.
FIGS. 6A to 6D show examples of finger shape images obtained using a data glove.
FIGS. 7A and 7B show an example of the database structure.
FIG. 8 shows an example of the parameters needed to calculate the image shape ratio.
FIGS. 9A to 9C show an example of data serving as a reference for acquiring image feature amounts.
FIGS. 10A and 10B show an example of ridge line information extraction results.
FIG. 11 illustrates the higher-order local autocorrelation processing applied to the ridge line information of each region of a ridge line image divided into 8 × 8 regions.
FIGS. 12A to 12C show an example of a pixel movement method.
FIG. 13 is a flowchart showing an example of the procedure for acquiring ridge line vectors from contour line scanning in the first embodiment.
FIG. 14 is a flowchart showing an example of the procedure of the second stage of the collation process.
FIGS. 15A and 15B show an example of the difference between the contour line information and the ridge line information obtained when the same finger shape is made with a bare hand and with a work glove on.
FIGS. 16A to 16C show joint angle comparison results.
FIG. 17 shows an example of the mean error and the standard deviation of the error.
FIGS. 18A and 18B show application examples of the finger shape estimation device of the present invention.
FIG. 19 shows an example of the functional configuration of the nail region extraction device of the second embodiment.
FIG. 20 shows an example of a hardware configuration capable of implementing the nail region extraction process of the second embodiment.
FIG. 21 is a flowchart showing an example of the nail region extraction procedure of the second embodiment.
FIG. 22 shows an example of a model of the pixel distribution of a finger image in RGB color space.
FIG. 23 is a flowchart showing a specific example of the nail region extraction process of the second embodiment.
FIGS. 24A to 24C show an example of coordinate transformation using the principal component axes as a basis.
FIGS. 25A and 25B show an example of a method for determining the position of the separating plane.
FIGS. 26A and 26B show the nail regions extracted in the second embodiment.
FIG. 27 shows an example of distribution positions of skin with a high probability of erroneous extraction.
FIGS. 28A and 28B show examples of results when the nail region is re-extracted and when the erroneously extracted skin region is re-extracted, respectively.
FIGS. 29A to 29C illustrate the evaluation results of the second embodiment.
FIG. 30 shows an application example of the nail region extraction device of the second embodiment.

1. First Embodiment
<Finger shape estimation technique of the present invention>
In the present invention, so that the individual differences in fingers described above can be handled without necessarily adding individual-specific information to the database, feature amounts are not computed from contour line information. Instead, attention is paid to, for example, information that passes through the center of each finger, and feature amounts corresponding to that information are computed. This allows the same feature amounts to be obtained even when finger thickness differs greatly or finger length differs to some extent. Specifically, the hand image (finger image) included in an image is first taken as the foreground image and the rest of the image as the background image. Next, for each pixel of the foreground image (hand image), the distance (in pixels) between that pixel and the nearest background pixel is taken as a height, and the hand image is viewed as a mountain-shaped image (hereinafter simply called a mountain). Ridge line information that can be drawn on this mountain (information passing through the centers of the fingers) is then acquired, and the acquired ridge line information (ridge line shape) is used to estimate the finger shape. Because no database dedicated to individual differences needs to be added, the present invention can estimate the finger shape at high speed and with high accuracy.
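
The height map and ridge extraction described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names `height_map` and `ridge_pixels`, the city-block (4-neighbour) distance, and the rule that a ridge pixel is a foreground pixel with no higher 8-neighbour are all assumptions made for the example.

```python
from collections import deque


def height_map(mask):
    """Return, for each pixel, the distance in pixels to the nearest background pixel.

    `mask` is a list of rows; truthy entries are hand (foreground) pixels.
    The city-block metric is an assumption made for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    height = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                height[r][c] = 0          # background has height 0
                queue.append((r, c))
    if not queue:
        raise ValueError("mask must contain at least one background pixel")
    # Multi-source breadth-first search spreading inward from the background.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and height[nr][nc] is None:
                height[nr][nc] = height[r][c] + 1
                queue.append((nr, nc))
    return height


def ridge_pixels(height):
    """Return foreground pixels that have no strictly higher 8-neighbour.

    This local-maximum rule is a simplified, hypothetical stand-in for
    ridge line extraction on the "mountain".
    """
    rows, cols = len(height), len(height[0])
    ridge = set()
    for r in range(rows):
        for c in range(cols):
            h = height[r][c]
            if h == 0:
                continue
            neighbours = [
                height[nr][nc]
                for nr in range(max(r - 1, 0), min(r + 2, rows))
                for nc in range(max(c - 1, 0), min(c + 2, cols))
                if (nr, nc) != (r, c)
            ]
            if all(n <= h for n in neighbours):
                ridge.add((r, c))
    return ridge
```

For a horizontal bar three pixels thick, the ridge found this way runs along the bar's center row, and thickening the bar leaves the ridge on the center line, which is the property the text relies on for robustness to finger thickness.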

Because the present invention estimates using information that passes through the centers of the fingers, feature amounts corresponding to that information can be obtained by vectorizing it. Specifically, for example, the feature amounts for the ridge line information described above are computed by vectorization. The number of vectors obtained varies with the finger shape, so the number of feature dimensions may not be fixed, but it is about 100 at most. This greatly reduces the feature dimensionality of each data set in the collation database, so that even when the number of data sets is greatly increased, the processing speed can be kept as high as in the conventional techniques. Furthermore, because ridge line information is not easily affected by individual differences, there is no need to add a large-scale collation database for individual differences later.

In other words, compared with the conventional methods, the present invention has, for example, the features described below. The following description is an example, and the features of the present invention are not limited to it.

<Information used for finger shape estimation>
In the prior art disclosed in, for example, Patent Document 2, the input image is reduced to an image 64 [pixels] high × 64 [pixels] wide, and the image features are represented by feature amounts corresponding to its contour line information. In the present invention, by contrast, the hand image is taken as the foreground and the rest of the image as the background, as described above, and the distance of the foreground image from the background image is regarded as a height so that the hand image is regarded as a single mountain. The ridge lines that can be drawn on this mountain are used to estimate the finger shape. Because ridge line information passes through the centers of the fingers, the feature amounts computed from it do not vary greatly from one individual to another. Finger shape estimation that accommodates individual differences can therefore be performed without registering a new database to deal with them.

<Number of dimensions of image feature amounts>
In the prior art disclosed in, for example, Patent Document 2, one finger data set or input image is divided into 8 vertical × 8 horizontal sections, 64 in total, and the image features of each section are represented by low-order feature amounts of 25 patterns, such as points, line segments, broken lines, and edges, equivalent to higher-order local autocorrelation functions. As a result, one finger image has 8 × 8 sections × 25 patterns = 1600 dimensions in total.

In the present invention, by contrast, the feature amounts of the ridge line information are obtained by vectorizing the extracted ridge line information. The number of feature dimensions obtained by vectorization varies with the input image but is about 100 at most. The number of feature dimensions is therefore about 1/16 or less of the 1600 dimensions of the prior art, a large reduction in image feature amounts.

<Database scale>
The database of the prior art disclosed in, for example, Patent Document 2 contains about 30,000 data sets. This number resulted from building the database through many preliminary experiments so that, in particular, full flexion, full extension, and intermediate postures of each finger could be estimated with high accuracy. About 30,000 sets is not necessarily the upper limit of what can be loaded into computer memory (a storage unit), but making the estimable resolution any finer increases the required number of data sets by orders of magnitude, so in practice the number was close to the upper limit.

In the present invention, however, the number of feature dimensions per data set is greatly reduced, to about 1/16, so the amount of information (number of data sets) that can be loaded into computer memory increases greatly. Since the database can be made 16 times larger, a database of about 480,000 data sets can be built.
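
The dimension and database-size figures quoted here follow from simple arithmetic, reproduced in the sketch below. It assumes, as a simplification not stated in the source, that memory use and collation time scale linearly with the total number of stored feature values; the names are illustrative.

```python
# Prior art (Patent Document 2): 8 x 8 sections, 25 patterns per section.
PRIOR_DIMS = 8 * 8 * 25      # 1600 dimensions per data set
RIDGE_DIMS = 100             # approximate upper bound stated for ridge line vectors
PRIOR_DATASETS = 30_000      # approximate size of the prior-art database


def scaled_dataset_count(prior_dims, new_dims, prior_datasets):
    """Data sets that fit in the same budget of stored feature values.

    Assumes cost is proportional to (dimensions x number of data sets).
    """
    budget = prior_dims * prior_datasets
    return budget // new_dims


reduction_factor = PRIOR_DIMS // RIDGE_DIMS
capacity = scaled_dataset_count(PRIOR_DIMS, RIDGE_DIMS, PRIOR_DATASETS)
print(reduction_factor, capacity)
```

Under that assumption the reduction factor is 16 and the capacity is 480,000 data sets, matching the figures in the text.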

<Estimation resolution>
As described above, the prior-art database contains about 30,000 data sets. Suppose the three joints of each of the four fingers together have one degree of freedom (that is, they move in conjunction at a fixed ratio), the thumb has three degrees of freedom, and the opening and closing of the four fingers has one degree of freedom. Then combining four posture levels for each of the four fingers and the thumb with two posture levels for four-finger opening and closing already gives more than 30,000 finger shape types. In practice, four-finger opening and closing (that is, abduction and adduction of the four fingers) strongly affects estimation quality, so the number of posture levels for it needs to be increased. In other words, the resolution of finger shape estimation in the prior art is coarse, covering full flexion, full extension, and about one to three intermediate postures. If data sets covering individual differences were also prepared, the 30,000 data sets of the prior-art database would be far from sufficient.

In the present invention, by contrast, the database can be enlarged to about 16 times its size at the same processing speed, as described above. With five posture levels for each of the four fingers and the three thumb joints, and three posture levels for four-finger opening and closing, there are about 250,000 combinations in total. Moreover, since no data sets for individual differences are needed, the entire enlarged portion can hold data for improving the resolution. By giving a slightly finer resolution to the index finger, middle finger, and thumb, a database of about 400,000 data sets can be built.
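
The posture counts quoted for the prior art and for the present invention can be checked with a short calculation. The sketch assumes one reading of the text: seven independently quantized joint degrees of freedom (one for each of the four fingers plus three for the thumb), multiplied by the number of levels for four-finger opening and closing. The function name and parameters are illustrative.

```python
def posture_combinations(levels_per_dof, open_close_levels,
                         finger_dofs=4, thumb_dofs=3):
    """Count posture combinations on a uniform grid.

    Assumed model: each of the (finger_dofs + thumb_dofs) joint degrees of
    freedom takes `levels_per_dof` levels, and four-finger opening and
    closing takes `open_close_levels` levels.
    """
    return levels_per_dof ** (finger_dofs + thumb_dofs) * open_close_levels


coarse = posture_combinations(levels_per_dof=4, open_close_levels=2)
fine = posture_combinations(levels_per_dof=5, open_close_levels=3)
print(coarse, fine)
```

Under this reading the coarse grid gives 32,768 combinations, which already exceeds the roughly 30,000 data sets of the prior art, and the finer grid gives 234,375, consistent with the stated figure of about 250,000.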

Preferred embodiments of the finger shape estimation device, finger shape estimation method, and finger shape estimation program of the present invention are described below with reference to the drawings.

<Finger shape estimation device: functional configuration example>
Next, an example of the functional configuration of the finger shape estimation device of the first embodiment is described with reference to the drawings. FIG. 1 shows an example of the functional configuration of the finger shape estimation device of this embodiment. The finger shape estimation device 10 shown in FIG. 1 includes an input unit 11, an output unit 12, a storage unit 13, an image acquisition unit 14, a database construction unit 15, an image analysis unit 16, a collation unit 17, a finger shape estimation unit 18, a transmission/reception unit 19, and a control unit 20.

The input unit 11 accepts input from a user or the like, such as the start and end of various instructions, including image acquisition, database construction, image analysis, collation, finger shape estimation, and transmission/reception instructions. The input unit 11 consists of, for example, a keyboard and a pointing device such as a mouse in the case of a general-purpose computer such as a PC (Personal Computer), or of a group of operation buttons in the case of an information terminal such as a smartphone or mobile phone or a game device. The input unit 11 may also have a voice input function for entering the above instructions by voice.

The output unit 12 outputs information such as the content entered through the input unit 11 and the content executed on the basis of that input. Specifically, the output unit 12 displays on screen, or outputs as audio, the acquired images and the processing results of each component, such as database construction results, image analysis results, collation results, and finger shape estimation results. The output unit 12 consists of a display, a speaker, a robot, or the like.

The output unit 12 may further have a printing function such as a printer, so that the above output can be printed on various print media such as paper and provided to the user or the like.

The storage unit 13 stores the various information needed in this embodiment and the various data produced during or after processing. Specifically, the storage unit 13 stores images held in advance and images obtained by the image acquisition unit 14 through shooting or the like (including time-series images such as video). The storage unit 13 also stores the contents of the database obtained by the database construction unit 15, the analysis results obtained by the image analysis unit 16, the collation results obtained by the collation unit 17, the estimation results obtained by the finger shape estimation unit 18, and so on. The stored data can be read out from the storage unit 13 as needed.

The image acquisition unit 14 acquires, for example, images and video captured by the imaging device 21 or the like. For convenience of explanation, all images acquired by the image acquisition unit 14 are assumed to include fingers, but the present invention is not limited to this.

In this embodiment the imaging device 21 is provided outside the finger shape estimation device 10, but the present invention is not limited to this, and the imaging device 21 may, for example, be built into the finger shape estimation device 10. The images and video acquired by the image acquisition unit 14 are also not limited to images or video of actual fingers captured by the imaging device 21. They may be, for example, images of model fingers, photographs, or posters, or images generated by CG (Computer Graphics) editing software or the like.

The image acquisition unit 14 can also acquire, via the transmission/reception unit 19, images and video stored in external devices, databases, and the like connected to a communication network. Images acquired by the image acquisition unit 14 can be stored in the storage unit 13 and read out from it as needed.

As necessary, the database construction unit 15 acquires required information, such as the movement of each joint of the hand, from a data glove fitted with sensors, markers, or the like and worn by a user, and constructs the collation database needed for finger shape estimation in this embodiment. Alternatively, it may acquire images generated by specifying finger joint angles in CG editing software or the like and construct the collation database from them.

The database constructed by the database construction unit 15 stores at least angle data and image feature amounts for each predetermined finger shape. For example, the database may be built with three items of data for an input image forming one set (data set): angle data such as the forearm rotation angle (and, for example, joint angles), the image shape ratio, and the image feature amount described above. These three items can be obtained, for example, by analyzing the input image with the image analysis unit 16 or the like. The data set is not limited to these three items; it suffices, for example, that at least one of them is included.

つまり、本実施形態では、データベースに「尾根線を使った画像特徴量」と「関節角度情報」とが対応付けられて蓄積される。つまり、本実施形態では、例えば、カメラ等の撮像装置２１でユーザの手指等が撮影されると、ユーザの手指の「尾根線を使った画像特徴量」と、データベース中の「尾根線を使った画像特徴量」とを比較して照合を行い、最も類似した画像特徴量に対応付けられたデータセットの「関節角度情報」が、推定結果として出力される。したがって、本実施形態で構築されるデータベースには、例えば尾根線の画像特徴量と、角度データとが必要となる。また、上述した前腕回旋角度や画像形状比率等は、例えばデータを絞り込むために用いられる付加的なデータであり、データベースとして蓄積されていなくてもよいが、これらの付加的なデータをデータベースに蓄積することにより効率的かつ高精度な絞り込みを実現することができる。 That is, in the present embodiment, an “image feature amount using ridge lines” and “joint angle information” are stored in the database in association with each other. For example, when the user's fingers are photographed by the imaging device 21 such as a camera, the “image feature amount using ridge lines” of the user's fingers is compared and collated with the “image feature amounts using ridge lines” in the database, and the “joint angle information” of the data set associated with the most similar image feature amount is output as the estimation result. Accordingly, the database constructed in the present embodiment requires, for example, ridge line image feature amounts and angle data. The forearm rotation angle and the image shape ratio described above are additional data used, for example, to narrow down the data, and need not be stored in the database; storing them, however, enables efficient and highly accurate narrowing.

また、データベース構築部１５は、既に本実施形態で用いられるデータベースが構築され、蓄積部１３等に蓄積されている場合や、送受信部１９を介して通信ネットワークにより接続される外部装置から取得している場合には、データベースの構築を行わなくてもよい。 The database construction unit 15 need not construct the database when the database used in the present embodiment has already been constructed and stored in the storage unit 13 or the like, or has been acquired from an external device connected over the communication network via the transmission/reception unit 19.

画像解析部１６は、画像取得部１４により取得した画像（映像を含む）等を解析する。具体的には、画像解析部１６は、画像中から背景や腕或いは体躯のような非手指領域を除去し、画像中における画素毎の特徴量等から、どの部分（位置、領域）に手指等のオブジェクトの位置がどのような姿勢で映し出されているか、或いは、映像中において手指等のオブジェクトがどのように移動しているか等を解析する。つまり、画像解析部１６は、撮影された手指等の画像の特徴量の数値化処理を行う。具体的には、画像解析部１６は、例えば手指の輪郭形状（輪郭線）等を用いて尾根線情報を取得し、取得した尾根線形状から特徴量を取得する。また、画像解析部１６は、入力された画像に対して画像形状比率、及び、上述した画像特徴量の２つのデータを取得する。ただし、画像形状比率は必ずしも取得しなくてもよく、また上述した画像形状比率のデータ以外のデータが含まれていてもよい。 The image analysis unit 16 analyzes the images (including video) acquired by the image acquisition unit 14. Specifically, the image analysis unit 16 removes non-finger regions such as the background, the arm, or the body from the image, and analyzes, from the feature amount of each pixel in the image, in which part (position, region) an object such as a finger appears and in what posture, or how an object such as a finger moves in the video. In other words, the image analysis unit 16 quantifies the feature amounts of the captured image of the fingers. Specifically, the image analysis unit 16 acquires ridge line information using, for example, the contour shape (contour line) of the fingers, and acquires a feature amount from the acquired ridge line shape. The image analysis unit 16 also acquires two items of data for the input image: the image shape ratio and the image feature amount described above. However, the image shape ratio need not necessarily be acquired, and data other than the image shape ratio may be included.

なお、輪郭線の取得例としては、例えば隣接画素間における輝度差の情報等に基づいて、画像中から手指部分と背景部分とを分離し、手指部分の輪郭線を取得することができるが、本発明においては、これに限定されるものではない。 In addition, as an example of acquiring a contour line, for example, based on information on a luminance difference between adjacent pixels, the finger part and the background part can be separated from the image, and the contour line of the finger part can be acquired. The present invention is not limited to this.

照合部１７は、入力画像から画像解析部１６により得られる解析結果に基づいて、入力画像と、予め設定された照合用のデータベースとの照合を行い、類似度判定を行う。具体的には、照合部１７は、例えば上述した２つのデータ（画像形状比率、画像特徴量）のうち、少なくとも画像特徴量を用いてデータベースに含まれる手指形状と入力画像内の手指形状との照合を行う。 Based on the analysis result obtained from the input image by the image analysis unit 16, the collation unit 17 collates the input image with a preset collation database and determines their similarity. Specifically, the collation unit 17 collates the finger shapes included in the database with the finger shape in the input image using, for example, at least the image feature amount of the two items of data described above (image shape ratio and image feature amount).

手指形状推定部１８は、照合部１７により得られた照合結果に基づいて、画像中の手指に対応する手指形状を推定する。なお、手指形状推定部１８における手指形状推定の具体的な手法については、後述する。 The finger shape estimation unit 18 estimates the finger shape corresponding to the finger in the image based on the collation result obtained by the collation unit 17. A specific method for estimating the hand shape in the hand shape estimating unit 18 will be described later.

また、送受信部１９は、通信ネットワーク等を用いて接続可能な外部装置から所望する外部画像（例えば撮影画像や映像等）や、本発明における手指形状推定処理を実現するための実行プログラム等を取得するためのインターフェースである。また、送受信部１９は、手指形状推定装置１０内で得られた各種情報を外部装置に送信することができる。 The transmission/reception unit 19 is an interface for acquiring, from an external device connectable over a communication network or the like, a desired external image (for example, a photographed image or video), an execution program for realizing the finger shape estimation process of the present invention, and the like. The transmission/reception unit 19 can also transmit various information obtained in the finger shape estimation device 10 to an external device.

制御部２０は、手指形状推定装置１０の各構成部全体の制御を行う。具体的には、制御部２０は、例えばユーザ等による入力部１１からの指示等に基づいて、画像の取得、データベース構築、画像解析、画像照合、手指形状の推定等の各処理における制御等を行う。 The control unit 20 controls all of the components of the finger shape estimation device 10. Specifically, based on, for example, an instruction given by a user or the like through the input unit 11, the control unit 20 controls each process such as image acquisition, database construction, image analysis, image collation, and finger shape estimation.

撮像装置２１は、例えばデジタルカメラや高精度カメラ等からなり、ユーザの実際の手指や模型の手指等の画像や映像を取得する。なお、撮像装置２１は、１台だけ設けられていてもよいし、異なる方向から同時に手指を撮影できるように複数台、設けられていてもよい。 The imaging device 21 includes, for example, a digital camera, a high-precision camera, and the like, and acquires images and videos of the user's actual fingers and model fingers. Note that only one imaging device 21 may be provided, or a plurality of imaging devices 21 may be provided so that fingers can be photographed simultaneously from different directions.

＜手指形状推定装置１０：ハードウェア構成＞ <Finger shape estimation device 10: hardware configuration>
ここで、上述した手指形状推定装置１０においては、各機能をコンピュータ（情報処理装置、ハードウェア）に実行させることができるソフトウェアとしての実行プログラム（手指形状推定プログラム）等を生成し、例えばＰＣ等の汎用のパーソナルコンピュータ、サーバ、スマートフォンや携帯電話等の情報端末装置、ゲーム機器等にその実行プログラムをインストールすることにより、本発明における手指形状推定処理等を実現することができる。 Here, for the finger shape estimation device 10 described above, an execution program (finger shape estimation program) or the like is generated as software that causes a computer (information processing device, hardware) to execute each function. Installing the execution program on, for example, a general-purpose personal computer such as a PC, a server, an information terminal device such as a smartphone or mobile phone, or a game machine realizes the finger shape estimation process and the like of the present invention.

ここで、本実施形態における手指形状推定処理が実現可能なコンピュータのハードウェア構成例について図を用いて説明する。図２は、本実施形態における手指形状推定処理が実現可能なハードウェア構成の一例を示す図である。 Here, a hardware configuration example of a computer capable of realizing the finger shape estimation process in the present embodiment will be described with reference to the drawings. FIG. 2 is a diagram illustrating an example of a hardware configuration capable of realizing the finger shape estimation process according to the present embodiment.

図２におけるコンピュータ本体には、入力装置３１と、出力装置３２と、ドライブ装置３３と、補助記憶装置３４と、メモリ装置３５と、各種制御を行うＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）３６と、ネットワーク接続装置３７とを有するよう構成されており、これらはシステムバスＢで相互に接続されている。 The computer main body in FIG. 2 includes an input device 31, an output device 32, a drive device 33, an auxiliary storage device 34, a memory device 35, a CPU (Central Processing Unit) 36 that performs various kinds of control, and a network connection device 37, which are connected to one another by a system bus B.

入力装置３１は、ユーザ等が操作するキーボード及びマウス等のポインティングデバイスを有しており、ユーザ等からのプログラムの実行等の各種操作信号を入力する。また、入力装置３１は、例えばカメラ等の撮像装置２１から撮影された画像を入力する画像入力ユニットを有していてもよい。 The input device 31 has a pointing device such as a keyboard and a mouse operated by a user or the like, and inputs various operation signals such as execution of a program from the user or the like. The input device 31 may include an image input unit that inputs an image captured from the imaging device 21 such as a camera.

出力装置３２は、本発明における処理を行うためのコンピュータ本体を操作するのに必要な各種ウィンドウやデータ等を表示するディスプレイを有し、ＣＰＵ３６が有する制御プログラムによりプログラムの実行経過や結果等を表示することができる。 The output device 32 has a display for displaying various windows and data necessary for operating the computer main body for performing the processing according to the present invention, and displays the program execution progress and results by the control program of the CPU 36. can do.

ここで、本発明においてコンピュータ本体にインストールされる実行プログラムは、例えばＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）メモリやＣＤ−ＲＯＭ等の可搬型の記録媒体３８等により提供される。プログラムを記録した記録媒体３８は、ドライブ装置３３にセット可能であり、記録媒体３８に含まれる実行プログラムが、記録媒体３８からドライブ装置３３を介して補助記憶装置３４にインストールされる。 Here, the execution program installed in the computer main body in the present invention is provided by, for example, a portable recording medium 38 such as a USB (Universal Serial Bus) memory or a CD-ROM. The recording medium 38 on which the program is recorded can be set in the drive device 33, and the execution program included in the recording medium 38 is installed in the auxiliary storage device 34 from the recording medium 38 via the drive device 33.

補助記憶装置３４は、ハードディスク等のストレージ装置であり、本発明における実行プログラムやコンピュータに設けられた制御プログラム等を蓄積し、必要に応じてそれらの入出力を行うことができる。 The auxiliary storage device 34 is a storage device such as a hard disk, and can store an execution program according to the present invention, a control program provided in a computer, and the like, and can input and output them as necessary.

メモリ装置３５は、ＣＰＵ３６により補助記憶装置３４から読み出された実行プログラム等を格納する。なお、メモリ装置３５は、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）やＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）等からなる。 The memory device 35 stores an execution program read from the auxiliary storage device 34 by the CPU 36. The memory device 35 includes a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.

ＣＰＵ３６は、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）等の制御プログラム、及びメモリ装置３５に格納されている実行プログラムに基づいて、各種演算や各ハードウェア構成部とのデータの入出力等、コンピュータ全体の処理を制御して、手指形状推定処理における各処理を実現することができる。なお、プログラムの実行中に必要な各種情報等は、補助記憶装置３４から取得することができ、また実行結果等を補助記憶装置３４に格納することもできる。 The CPU 36 controls processing of the entire computer, such as various operations and input / output of data with each hardware component, based on a control program such as an OS (Operating System) and an execution program stored in the memory device 35. Thus, each process in the finger shape estimation process can be realized. Various information necessary during the execution of the program can be acquired from the auxiliary storage device 34, and the execution result and the like can also be stored in the auxiliary storage device 34.

ネットワーク接続装置３７は、通信ネットワーク等と接続することにより、実行プログラムを通信ネットワークに接続されている他の端末等から取得したり、プログラムを実行することで得られた実行結果又は本発明における実行プログラム自体を他の端末等に提供することができる。 By connecting to a communication network or the like, the network connection device 37 can acquire the execution program from another terminal or the like connected to the communication network, and can provide other terminals or the like with the execution result obtained by executing the program or with the execution program of the present invention itself.

上述したようなハードウェア構成により、本発明における手指形状推定処理を実行することができる。また、プログラムをインストールすることにより、汎用のパーソナルコンピュータ等で本発明における手指形状推定処理を容易に実現することができる。 With the hardware configuration as described above, the finger shape estimation process in the present invention can be executed. In addition, by installing the program, the finger shape estimation process according to the present invention can be easily realized by a general-purpose personal computer or the like.

次に、上述した手指形状推定プログラムにおける手指形状推定処理について具体的に説明する。 Next, the finger shape estimation process in the above-described finger shape estimation program will be specifically described.

＜手指形状推定処理手順＞ <Finger shape estimation processing procedure>
まず、本実施形態における手指形状推定処理手順について説明する。図３は、本実施形態における手指形状推定処理手順の一例を示すフローチャートである。なお、以下に説明する各種処理における各部の動作は、制御部２０（ＣＰＵ３６）により制御される。 First, the finger shape estimation processing procedure in the present embodiment will be described. FIG. 3 is a flowchart showing an example of the finger shape estimation processing procedure in the present embodiment. The operation of each unit in the various processes described below is controlled by the control unit 20 (CPU 36).

図３に示す手指形状推定処理では、まず、制御部２０は、手指形状を推定するための照合用データベースがあるか否かを判断する（Ｓ０１）。制御部２０は、具体的には、照合用データベースが予め蓄積部１３等に蓄積されていたり、通信ネットワーク等により外部装置等から取得されていたりするか否かを判断する。ここで、照合用データベースがない場合（Ｓ０１の処理において、ＮＯ）、制御部２０は、各部を制御して照合用データベースを構築する（Ｓ０２）。 In the finger shape estimation process shown in FIG. 3, first, the control unit 20 determines whether or not there is a collation database for estimating the finger shape (S01). Specifically, the control unit 20 determines whether or not the collation database is stored in the storage unit 13 or the like in advance, or is acquired from an external device or the like through a communication network or the like. Here, when there is no collation database (NO in the process of S01), the control unit 20 controls each unit to construct a collation database (S02).

ただし、本実施形態においてＳ０１及びＳ０２の処理は必須ではなく、例えばＳ０１の処理においてＮＯの場合、処理を終了することにしてもよい。 However, in the present embodiment, the processes of S01 and S02 are not essential. For example, in the case of NO in the process of S01, the process may be terminated.

また、Ｓ０１の処理において、照合用データベースがある場合（Ｓ０１において、ＹＥＳ）、又はＳ０２の処理において、照合用データベースを構築した場合、画像取得部１４は、手指形状を推定する必要のある手指を含む画像を取得する（Ｓ０３）。次いで、画像解析部１６は、取得した画像の解析を行う（Ｓ０４）。なお、Ｓ０４における解析処理としては、例えば画像形状比率を求めたり、手指の尾根線形状等から画像特徴量（第１の特徴量）を算出する等の処理を行うが、本発明においてはこれに限定されるものではない。 When a collation database exists in the process of S01 (YES in S01), or when a collation database has been constructed in the process of S02, the image acquisition unit 14 acquires an image containing the fingers whose shape is to be estimated (S03). Next, the image analysis unit 16 analyzes the acquired image (S04). The analysis in S04 includes, for example, obtaining the image shape ratio and calculating an image feature amount (first feature amount) from the ridge line shape of the fingers, but the present invention is not limited to this.

また、手指形状推定処理では、照合部１７は、Ｓ０４の処理で得られた解析結果に基づいて、Ｓ０２の処理等により得られた照合用データベースや予め蓄積部１３等に蓄積された照合用データベースとＳ０３で取得した画像との照合を行う（Ｓ０５）。次いで、手指形状推定部１８は、取得した画像内の手指形状の推定を行い（Ｓ０６）、画像形状比率や尾根線形状の画像特徴量等と対応付けて照合用データベースに蓄積されている手指の角度データを、推定結果として出力する（Ｓ０７）。 In the finger shape estimation process, based on the analysis result obtained in S04, the collation unit 17 collates the image acquired in S03 with the collation database obtained by the process of S02 or the like, or with a collation database stored in advance in the storage unit 13 or the like (S05). Next, the finger shape estimation unit 18 estimates the finger shape in the acquired image (S06) and outputs, as the estimation result, the finger angle data stored in the collation database in association with the image shape ratio, the ridge line image feature amount, and the like (S07).

次に、制御部２０は、処理を終了するか否かを判断し（Ｓ０８）、終了しない場合（Ｓ０８において、ＮＯ）、Ｓ０３に戻り、制御部２０は、例えば連続する画像、つまり映像に対して上述の処理を行って時系列的に結果を出力したり、又は、他の画像に対して上述の処理を行う。 Next, the control unit 20 determines whether or not to end the process (S08). If the process does not end (NO in S08), the control unit 20 returns to S03, and the control unit 20 performs, for example, continuous images, that is, videos. The above processing is performed and the results are output in time series, or the above processing is performed on another image.

また、Ｓ０８の処理において、ユーザの指示等により処理を終了する場合（Ｓ０８において、ＹＥＳ）、制御部２０は、手指形状推定処理を終了する。 When the process is to be ended in S08, for example in response to a user instruction (YES in S08), the control unit 20 ends the finger shape estimation process.

なお、上述したように、Ｓ０１及びＳ０２の処理は、手指形状推定処理に含まれていなくてもよい。つまり、データベースの構築は、手指形状推定処理とは非同期で行われる。したがって、例えば推定の初回時に限って、上述したＳ０１及びＳ０２に示すようにデータベースの有無の判断及びデータベースの構築を行ってもよく、所望するデータベースが見当たらなかった場合やユーザ等の指示があった場合に、データベースを構築するようにしてもよい。 As described above, the processes of S01 and S02 need not be included in the finger shape estimation process. That is, the database is constructed asynchronously with the finger shape estimation process. Therefore, the check for an existing database and the construction of the database shown in S01 and S02 may be performed, for example, only at the first estimation, or the database may be constructed when the desired database cannot be found or when a user or the like gives an instruction.
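The flow of S01 to S08 can be sketched as a simple loop. This is a minimal illustration only: the function name `run_estimation` and the callables `build_database`, `analyze`, and `match` are hypothetical stand-ins for the database construction unit 15, the image analysis unit 16, and the collation and estimation units 17 and 18, and the loop simply ends when the supplied frames run out rather than on a user instruction.

```python
from typing import Callable, Iterable, List, Optional


def run_estimation(
    frames: Iterable[object],
    database: Optional[list],
    build_database: Callable[[], list],
    analyze: Callable[[object], object],
    match: Callable[[object, list], object],
) -> List[object]:
    """Follow steps S01 to S08 for each frame and collect the outputs."""
    # S01/S02: build the collation database only when none is available.
    if database is None:
        database = build_database()
    results = []
    for frame in frames:                  # S03: acquire an image
        features = analyze(frame)         # S04: analyze the image
        best = match(features, database)  # S05/S06: collate and estimate
        results.append(best)              # S07: output the estimation result
    return results                        # S08: end when no frames remain
```

Passing an existing database skips construction, which mirrors the statement that S01 and S02 are optional.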

ここで、図４は、本実施形態における照合処理（Ｓ０５）の概要を説明するための図である。本実施形態の照合処理は、大別すると、図４に示すように、照合用のデータベースに含まれる各データセットから、例えば前腕回旋による探索範囲の制限や画像形状比率等によるデータセットの絞りこみ等（すなわち、手指形状における所定の形状パラメータによるデータセットの絞り込み）を行う第１段階目の処理と、画像特徴量による類似度計算を行って、最も類似するデータセットを出力する第２段階目の処理とからなる。また、本実施形態では、例えば上述した入力画像の画像形状比率（次元数は３）等のデータセットを用いて、全データベースのデータセットと取得した画像とを照合する。 Here, FIG. 4 is a diagram for explaining the outline of the collation process (S05) in the present embodiment. As shown in FIG. 4, the collation process of the present embodiment broadly consists of two stages. The first stage narrows down the data sets included in the collation database, for example by limiting the search range according to forearm rotation or by the image shape ratio (that is, it narrows down the data sets by predetermined shape parameters of the finger shape). The second stage calculates similarity based on the image feature amount and outputs the most similar data set. In the present embodiment, the acquired image is collated against the data sets of the entire database using, for example, data such as the image shape ratio of the input image described above (three dimensions).

つまり、本実施形態では、例えば画像形状比率等による粗い絞り込みで残ったデータセット（データセット群）に対して、更に、画像特徴量（例えば約１００次元）による精緻な類似度照合を行い、最も類似したデータセットを手指形状の推定結果として出力する。これにより、取得した画像中の手指の形状や動きを高速かつ高精度に推定することができる。 That is, in the present embodiment, the data sets (data set group) that remain after coarse narrowing by, for example, the image shape ratio are further subjected to fine similarity collation using the image feature amount (for example, about 100 dimensions), and the most similar data set is output as the finger shape estimation result. This makes it possible to estimate the shape and movement of the fingers in the acquired image at high speed and with high accuracy.
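The two-stage collation can be sketched as follows. The text does not state the narrowing threshold or the similarity measure, so the tolerance `ratio_tol`, the use of Euclidean distance, and the fallback to the whole database when nothing passes the first stage are all assumptions made for illustration; the tuple layout of `Entry` is likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One data set: (image shape ratio, image feature vector, joint angles).
Entry = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def two_stage_match(
    query_ratio: Sequence[float],
    query_features: Sequence[float],
    database: List[Entry],
    ratio_tol: float = 0.1,  # assumed tolerance, not given in the text
) -> Sequence[float]:
    """Return the joint angles of the most similar data set."""
    # Stage 1: coarse narrowing by the three-value image shape ratio.
    candidates = [
        entry for entry in database
        if all(abs(a - b) <= ratio_tol for a, b in zip(entry[0], query_ratio))
    ]
    if not candidates:
        # Assumed fallback: search the whole database if nothing survives.
        candidates = list(database)
    # Stage 2: fine collation by image feature distance (Euclidean here).
    best = min(candidates, key=lambda entry: math.dist(entry[1], query_features))
    return best[2]
```

Because the first stage compares only three numbers per data set, the costlier feature comparison runs on a small subset, which is the source of the speed-up the text describes.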

＜データベース構築処理手順＞ <Database construction processing procedure>
次に、上述した本実施形態における照合用データベース構築処理（Ｓ０２）の手順について、フローチャートを用いて説明する。図５は、本実施形態における照合用データベース構築処理手順の一例を示すフローチャートである。なお、以下に説明する各種処理における各部の動作は、制御部２０（ＣＰＵ３６）により制御される。 Next, the procedure of the collation database construction process (S02) in the present embodiment described above will be described using a flowchart. FIG. 5 is a flowchart showing an example of the collation database construction processing procedure in the present embodiment. The operation of each unit in the various processes described below is controlled by the control unit 20 (CPU 36).

照合用データベースの構築では、まず、画像取得部１４は、データグローブ等を用いて得られた複数の手指形状の画像を取得する（Ｓ１１）。ここで、図６Ａ〜６Ｄは、データグローブを用いて得られた複数の手指形状画像の一例を示す図である。ユーザは、手指形状を推定するための照合用データベースを構築するに際し、例えば図６Ａ〜６Ｄに示すような予め設定された手指形状に基づく画像を取得する。なお、手指形状は、図６Ａ〜６Ｄの例に限定されるものではなく、多数の形状が用いられる。 In constructing the collation database, the image acquisition unit 14 first acquires images of a plurality of finger shapes obtained using a data glove or the like (S11). Here, FIGS. 6A to 6D are diagrams showing examples of a plurality of finger shape images obtained using a data glove. When constructing a collation database for estimating finger shapes, the user acquires images based on preset finger shapes such as those shown in FIGS. 6A to 6D. The finger shapes are not limited to the examples of FIGS. 6A to 6D, and many shapes are used.

また、Ｓ１１の処理では、画像取得部１４は、各手指形状として、例えば上面、側面、斜め前方等の少なくとも１つの角度からの形状を取得する。なお、本実施形態では、１つの手指に対する所定方向からの画像を、データセットとして扱う。 Moreover, in the process of S11, the image acquisition part 14 acquires the shape from at least 1 angle, such as an upper surface, a side surface, and diagonally forward, as each finger shape. In the present embodiment, an image from a predetermined direction with respect to one finger is handled as a data set.

次に、画像解析部１６は、各画像に含まれる手指の関節角度を取得する（Ｓ１２）。関節角度とは、例えば手指の関節角度データを示すが、本発明においてはこれに限定されるものではなく、例えば前腕回旋角度等の関節角度等も含んでもよい。なお、関節角度の取得例については、後述する。 Next, the image analysis unit 16 acquires the joint angles of fingers included in each image (S12). The joint angle indicates, for example, finger joint angle data, but is not limited to this in the present invention, and may include, for example, a joint angle such as a forearm rotation angle. An example of acquiring the joint angle will be described later.

なお、本実施形態では、データグローブ等を用いて複数の手指形状の画像と関節角度情報とを取得する代わりに、例えばＣＧ編集ソフト等を用いて手指画像を生成し、複数の関節角度情報とそれに対応する複数の手指形状の画像とを取得してもよい。 In the present embodiment, instead of acquiring images of a plurality of finger shapes and joint angle information using a data glove or the like, finger images may be generated using, for example, CG editing software, so as to acquire a plurality of sets of joint angle information and the corresponding finger shape images.

次に、画像解析部１６は、各画像に含まれる手指の画像形状比率を算出する（Ｓ１３）。Ｓ１３の処理における画像形状比率の算出手法については、後述する。なお、本発明においては必ずしも画像形状比率の算出を行わなくてもよい。次に、画像解析部１６は、各画像に含まれる手指の輪郭線等に基づいて尾根線形状を取得する（Ｓ１４）。具体的には、Ｓ１４の処理では、画像特徴量に基づいて手画像を山と見なして尾根線を設定する。 Next, the image analysis unit 16 calculates the image shape ratio of the fingers included in each image (S13). A method for calculating the image shape ratio in the process of S13 will be described later. In the present invention, it is not always necessary to calculate the image shape ratio. Next, the image analysis unit 16 acquires a ridge line shape based on the contours of fingers included in each image (S14). Specifically, in the process of S14, the ridge line is set by regarding the hand image as a mountain based on the image feature amount.

次に、画像解析部１６は、取得した尾根線からベクトル情報を画像特徴量として取得する（Ｓ１５）。そして、データベース構築部１５は、取得した関節角度（角度データ）、画像形状比率、画像特徴量（第２の特徴量）を含む組（データセット）からなる照合用データベースを構築する（Ｓ１６）。 Next, the image analysis unit 16 acquires vector information as an image feature amount from the acquired ridge line (S15). Then, the database construction unit 15 constructs a collation database composed of a set (data set) including the acquired joint angle (angle data), image shape ratio, and image feature amount (second feature amount) (S16).

なお、Ｓ１６の処理では、上述した画像特徴量と、推定結果の出力のために用いる関節角度（角度データ）とが含まれていればよく、また上述した画像形状比率のデータ以外のデータが含まれていてもよい。ここで、上述したデータベースの構造について、具体的に説明する。 In the process of S16, it is only necessary to include the image feature amount described above and the joint angle (angle data) used for outputting the estimation result, and data other than the image shape ratio data described above is included. It may be. Here, the structure of the database described above will be specifically described.

＜データベース構造＞ <Database structure>
図７Ａ及び７Ｂは、データベース構造の一例を示す図である。本実施形態では、例えば、図７Ａに示すように、例えば所定の手指形状における手指の関節角度や前腕回旋角度等を含む角度データ（ＪＯＩＮＴＡＮＧＬＥＳＷＩＴＨＷＲＩＳＴＲＯＴＡＴＩＯＮ）、画像形状比率（ＩＭＡＧＥＡＳＰＥＣＴＳ）、及び画像特徴量（ＩＭＡＧＥＦＥＡＴＵＲＥＳ）の３つのデータを１つのデータセット（ＤＡＴＡＳＥＴ）とし、様々な手指形状に対するデータセットの集まりをデータベース（ＤＡＴＡＢＡＳＥ）とする。なお、角度データについては、例えば図７Ｂに示すように、予め角度データに対する把持動作を識別情報等により設定しておいてもよい。これにより、照合時に角度データに基づく把持動作（手指の動き）を効率的に取得することができる。 FIGS. 7A and 7B are diagrams showing an example of the database structure. In the present embodiment, as shown in FIG. 7A, three items of data are treated as one data set (DATA SET): angle data including, for example, the finger joint angles and the forearm rotation angle for a predetermined finger shape (JOINT ANGLES WITH WRIST ROTATION), the image shape ratio (IMAGE ASPECTS), and the image feature amount (IMAGE FEATURES). A collection of data sets for various finger shapes forms the database (DATABASE). As shown in FIG. 7B, for example, a grasping motion may be assigned to the angle data in advance by means of identification information or the like. This makes it possible to efficiently obtain the grasping motion (finger movement) corresponding to the angle data at the time of collation.

つまり、ユーザ（操作者）の手指動作により、例えば遠隔ロボットを制御しようとする場合には、必ずしも詳細な手指形状推定を行うよりは、カメラにより撮像された手指映像に対して迅速に把持動作の何れかを識別した方が、安定で高速な遠隔ロボット操作が実現できる。そこで、本実施形態では、図７Ｂに示すように、データベース構築時に、例えば識別が必要な手指形状とその個人差データとを集中的に生成し、手指関節角度データの代わりに、把持動作パターンの番号（１，２，３，…）を付与する。 That is, when a remote robot, for example, is to be controlled by the finger movements of a user (operator), quickly identifying which grasping motion appears in the finger video captured by the camera yields more stable and faster remote robot operation than performing detailed finger shape estimation. Therefore, in the present embodiment, as shown in FIG. 7B, the finger shapes that need to be identified and their individual-difference data are, for example, generated intensively when the database is constructed, and grasping motion pattern numbers (1, 2, 3, ...) are assigned in place of finger joint angle data.

これにより、本実施形態では、上述した照合処理における第１段階目の粗い絞り込みと、第２段階目の精緻な類似度照合とにより検索された最も類似する手指形状を推定結果とすることができる。なお、この場合に、推定結果として出力される内容は、関節角度でなくてもよく、例えば推定結果に対応して予め設定されている把持動作パターン番号を出力してもよい。 Accordingly, in the present embodiment, the most similar finger shape found by the coarse narrowing of the first stage and the fine similarity collation of the second stage of the collation process described above can be taken as the estimation result. In this case, the content output as the estimation result need not be joint angles; for example, a grasping motion pattern number set in advance for the estimation result may be output.
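The data set layout of FIGS. 7A and 7B might be represented as below. The class name `DataSet`, its field names, and the helper `output_of` are hypothetical and chosen only to mirror the labels in the figures; the optional `grasp_pattern` field models the pattern number of FIG. 7B.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class DataSet:
    # Finger joint angles, including the forearm (wrist) rotation angle.
    joint_angles: List[float]
    # Image shape ratio: tallness, top-heaviness, right-bias.
    image_aspects: Tuple[float, float, float]
    # Ridge line image feature amount.
    image_features: List[float]
    # Optional grasping motion pattern number (FIG. 7B style).
    grasp_pattern: Optional[int] = None


# The database is a collection of data sets for various finger shapes.
Database = List[DataSet]


def output_of(entry: DataSet) -> Union[int, List[float]]:
    """Return the pattern number when one is assigned, else the joint angles."""
    if entry.grasp_pattern is not None:
        return entry.grasp_pattern
    return entry.joint_angles
```

Keeping the pattern number optional lets one database serve both detailed joint angle output and coarse grasp identification.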

＜手指関節角度＞ <Finger joint angle>
ここで、照合用データベースに含まれる手指関節角度は、例えばデータグローブ（例えば、ＶｉｒｔｕａｌＴｅｃｈｎｏｌｏｇｉｅｓ社製、ＣｙｂｅｒＧｌｏｖｅ（登録商標））によって取得される。 Here, the finger joint angles included in the collation database are acquired with, for example, a data glove (for example, CyberGlove (registered trademark) manufactured by Virtual Technologies).

＜画像形状比率＞ <Image shape ratio>
次に、画像形状比率の算出方法について、具体的に説明する。図８は、画像形状比率の算出に必要な各種パラメータの一例を示す図である。画像形状比率の算出方法では、まず手指領域と背景を分離し、手指領域に対してラベリング処理を行う。そのとき、最も大きなラベル番号を持つ画素を基準点とし、基準点に基づいて手指範囲を決定する。ここで、例えば基準点から基準点のラベル番号分の画素だけ下部分を手指範囲の下端とし、手指領域が手指範囲にちょうど入るように手指範囲の上端，右端，左端を決定する。そして、画像形状比率は、縦長度，上長度，右長度の３つの値で表わされ、それぞれ次式（１）〜（３）で定義される。 Next, the method of calculating the image shape ratio will be described in detail. FIG. 8 is a diagram showing an example of the parameters needed to calculate the image shape ratio. In this method, the finger region is first separated from the background, and a labeling process is applied to the finger region. The pixel with the largest label number is taken as the reference point, and the finger range is determined from the reference point. For example, the position below the reference point by as many pixels as the reference point's label number is taken as the lower end of the finger range, and the upper, right, and left ends of the finger range are set so that the finger region just fits within the finger range. The image shape ratio is expressed by three values, tallness, top-heaviness, and right-bias, which are defined by equations (1) to (3), respectively.

ここで、上述した式において、Ｒ_ｔａｌｌは縦長度を示し、Ｒ_{ｔｏｐｈｅａｖｙ}は上長度を示し、Ｒ_{ｒｉｇｈｔｂｉａｓｅｄ}は右長度を示し、Ｌ_{ｈｅｉｇｈｔ}は手指範囲の上端から下端までの距離を示し、Ｌ_{ｗｉｄｔｈ}は手指範囲の右端から左端までの距離を示し、Ｌ_{ｕｐｐｅｒ}は手指範囲の上端から基準点までの距離を示し、Ｌ_{ｌｏｗｅｒ}は手指範囲の下端から基準点までの距離を示し、Ｌ_{ｒｉｇｈｔ}は手指範囲の右端から基準点までの距離を示し、Ｌ_ｌｅｆｔは手指範囲の左端から基準点までの距離を示す。 Here, in the above equations, R_{tall} denotes the tallness, R_{topheavy} the top-heaviness, and R_{rightbiased} the right-bias. L_{height} is the distance from the upper end to the lower end of the finger range, L_{width} is the distance from the right end to the left end of the finger range, L_{upper} is the distance from the upper end of the finger range to the reference point, L_{lower} is the distance from the lower end of the finger range to the reference point, L_{right} is the distance from the right end of the finger range to the reference point, and L_{left} is the distance from the left end of the finger range to the reference point.
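Equations (1) to (3) themselves are not reproduced in this text. The sketch below therefore assumes one form that is consistent with the symbol definitions above: each ratio divides one length by the sum of that length and its counterpart, so every value lies between 0 and 1. This form is an assumption, not the patent's confirmed definition, and the function name and argument layout are hypothetical.

```python
from typing import Tuple


def image_shape_ratio(
    top: float, bottom: float, left: float, right: float,
    ref_x: float, ref_y: float,
) -> Tuple[float, float, float]:
    """Return (R_tall, R_topheavy, R_rightbiased) for a finger range.

    Image coordinates are assumed, with y growing downward, so
    top < bottom and left < right. (ref_x, ref_y) is the reference point.
    """
    l_height = bottom - top   # upper end to lower end
    l_width = right - left    # right end to left end
    l_upper = ref_y - top     # upper end to reference point
    l_lower = bottom - ref_y  # lower end to reference point
    l_right = right - ref_x   # right end to reference point
    l_left = ref_x - left     # left end to reference point
    # Assumed forms of equations (1) to (3):
    r_tall = l_height / (l_height + l_width)
    r_topheavy = l_upper / (l_upper + l_lower)
    r_rightbiased = l_right / (l_right + l_left)
    return r_tall, r_topheavy, r_rightbiased
```

Under this assumed form the ratios do not depend on the absolute size of the hand in the image, which suits their use for coarse narrowing.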

＜画像特徴量について＞ <About the image feature amount>
次に、本実施形態における画像特徴量の取得方法について、図を用いて説明する。図９Ａ〜９Ｃは、画像特徴量を取得する基準となるデータの一例を示す図である。また、図１０Ａ及び１０Ｂは、尾根線情報抽出結果の一例を示す図である。 Next, the method of acquiring the image feature amount in the present embodiment will be described with reference to the drawings. FIGS. 9A to 9C are diagrams showing an example of the data that serves as the basis for acquiring the image feature amount. FIGS. 10A and 10B are diagrams showing an example of ridge line information extraction results.

本実施形態では、従来手法で推定に用いられていた手画像の輪郭線情報用いると個人差の影響が大きく出てしまうため、例えば前景画像である手画像の各画素の背景画像からの距離を高さと考え、図９Ａ〜９Ｃに示すように手画像を山と見なす。そして、本実施形態では、その山に引くことのできる尾根線の情報を推定に用いる。 Using the contour line information of the hand image, as conventional methods do for estimation, makes the result strongly affected by individual differences. In the present embodiment, therefore, the distance of each pixel of the hand image (the foreground image) from the background image is treated as a height, for example, and the hand image is regarded as a mountain as shown in FIGS. 9A to 9C. Information on the ridge lines that can be drawn on that mountain is then used for estimation.

なお、図９Ａ〜９Ｃに示す点群は、例えば手画像の輪郭線走査により求める。例えば手画像の輪郭線を走査するときに、１回目の走査で調べた画素に「１」というラベルを貼り、１周したら走査済み画素を背景として２回目の輪郭線走査を行い、調べた画素に「２」というラベルを貼る。同様の処理によりｎ回目の走査で調べた画素には「ｎ」というラベルを貼る。図９Ａ〜９Ｃの点群は、このような処理により貼ったラベル番号を高さとして描画したものである。ただし、本発明における尾根線情報を求める方法については、上述の処理に限定されるものではない。 The point clouds shown in FIGS. 9A to 9C are obtained, for example, by scanning the contour line of the hand image. When the contour line of the hand image is scanned, the pixels examined in the first scan are labeled “1”. After one full circuit, the scanned pixels are treated as background, a second contour scan is performed, and the pixels examined are labeled “2”. In the same way, the pixels examined in the n-th scan are labeled “n”. The point clouds in FIGS. 9A to 9C are drawn using the label numbers assigned in this way as heights. However, the method of obtaining ridge line information in the present invention is not limited to this process.
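The labeling by repeated contour scans can be approximated with a layer-by-layer peeling of a binary mask. This sketch is a simplification: instead of tracing the contour pixel by pixel as the text describes, each pass labels every remaining foreground pixel that has a 4-neighbour in the background (or outside the image) and then treats those pixels as background. The function name `peel_labels` and the choice of the 4-neighbourhood are assumptions.

```python
from typing import List


def peel_labels(mask: List[List[int]]) -> List[List[int]]:
    """Label each foreground pixel with the pass in which it is peeled off.

    mask holds 1 for hand (foreground) pixels and 0 for background.
    The returned labels can be read as heights of the "mountain".
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    remaining = [[bool(v) for v in row] for row in mask]
    n = 0
    while any(any(row) for row in remaining):
        n += 1
        border = []
        for y in range(h):
            for x in range(w):
                if not remaining[y][x]:
                    continue
                # On the current outline if any 4-neighbour is background,
                # already peeled, or outside the image.
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or not remaining[ny][nx]:
                        border.append((y, x))
                        break
        # Peel the whole outline at once so the pass is order-independent.
        for y, x in border:
            labels[y][x] = n
            remaining[y][x] = False
    return labels
```

Pixels deep inside the palm receive the largest labels, which matches the use of the largest label number as the reference point in the image shape ratio calculation.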

これにより、図１０Ａ及び１０Ｂに示すように、手の形状やグー、チョキ、パー等の形状の作り方の個人差の影響を減らし、手指形状推定の精度を上げることができる。 As shown in FIGS. 10A and 10B, this reduces the influence of individual differences in hand shape and in the way shapes such as rock, scissors, and paper (guu, choki, paa) are formed, and improves the accuracy of finger shape estimation.

具体的には、本実施形態では、手指画像を、画像形状比率の算出時に決めた手指範囲のみ切り出し、縦６４［ｐｉｘｅｌ］、横６４［ｐｉｘｅｌ］となるように縮小する。この縮小画像から尾根線情報を抽出すると、手指画像は、図１０Ｂに示すような画像になる。 Specifically, in the present embodiment, only the finger range determined when the image shape ratio was calculated is cut out of the finger image, and the cut-out image is reduced to 64 pixels high by 64 pixels wide. Extracting ridge line information from this reduced image yields an image such as that shown in FIG. 10B.

尾根線を抽出する方法は、まず、手画像の最外の輪郭線をなぞる走査を行う。この時、走査の進行方向に向かって左右の画素を調べ、それぞれが背景と同じ画素値もしくは走査済みの画素だったら、尾根線抽出結果用に用意された６４×６４［ｐｉｘｅｌ］画像の同じ画素に点（尾根に対応）をプロットする。 In the method of extracting the ridge line, first, scanning is performed by tracing the outermost contour line of the hand image. At this time, the left and right pixels are examined in the scanning direction. If each pixel has the same pixel value as the background or has been scanned, the same pixel in the 64 × 64 [pixel] image prepared for the ridge line extraction result. Plot the point (corresponding to the ridge).

輪郭線を１周走査し終えたら、走査済みの画素を背景と見なし、１回り小さくなった手画像の輪郭線（一つ内側の輪郭線）の走査を行い、上記処理と同様の処理をする。そして、走査する画素がなくなったら処理を終了とする。このような抽出処理により得られた尾根線画像を縦８分割、横８分割し、各分割領域において高次自己局所相関関数を用いて２５次元の画像特徴量を取得する。 When one full circuit of the contour line has been scanned, the scanned pixels are regarded as background, the contour line of the hand image that is now one size smaller (the next contour line inward) is scanned, and the same processing as above is performed. The process ends when no pixels remain to be scanned. The ridge line image obtained by this extraction process is divided into 8 parts vertically and 8 parts horizontally, and a 25-dimensional image feature amount is acquired in each divided region using a higher-order local autocorrelation function.

Here, FIG. 11 is a diagram for explaining the higher-order local autocorrelation processing applied to the ridge line information of each divided region of the ridge line image divided into 8 × 8. In the present embodiment, as shown in FIG. 11, the ridge line image is divided into 8 blocks (8 BLOCKS) in each of the vertical and horizontal directions, and 25-dimensional correlations (No. 1 to No. 25 shown in FIG. 11) are assigned to each block as image feature amounts. As a result, an image feature amount of 8 × 8 × 25 = 1600 dimensions is obtained per ridge line image.
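The block-wise feature extraction described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes that the 25 patterns of FIG. 11 are the usual binary-image higher-order local autocorrelation masks of order 0 to 2 in a 3 × 3 window, and that neighbours lying in an adjacent block are still read from the full image. The names `hlac_masks`, `hlac_block`, and `hlac_features` are hypothetical.

```python
from itertools import combinations

# Offsets (dy, dx) of the eight neighbours in the 3x3 window (assumed window size).
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def canonical(points):
    """Return one representative for all translates of a mask that fit in the 3x3 window."""
    variants = []
    for oy, ox in points:
        shifted = tuple(sorted((y - oy, x - ox) for y, x in points))
        if all(abs(y) <= 1 and abs(x) <= 1 for y, x in shifted):
            variants.append(shifted)
    return min(variants)


def hlac_masks():
    """Enumerate the translation-distinct masks of order 0, 1 and 2."""
    masks = set()
    for order in (0, 1, 2):
        for extra in combinations(NEIGHBOURS, order):
            masks.add(canonical(((0, 0),) + extra))
    # Sort by number of points so that the order-0 mask comes first.
    return sorted(masks, key=lambda m: (len(m), m))


def hlac_block(image, top, left, size, masks):
    """For each mask, count reference pixels in the block where all mask pixels are 1."""
    height, width = len(image), len(image[0])
    feats = []
    for mask in masks:
        count = 0
        for y in range(top, top + size):
            for x in range(left, left + size):
                if all(0 <= y + dy < height and 0 <= x + dx < width
                       and image[y + dy][x + dx] for dy, dx in mask):
                    count += 1
        feats.append(count)
    return feats


def hlac_features(image, blocks=8):
    """Concatenate the per-block features of a square binary ridge line image."""
    masks = hlac_masks()
    size = len(image) // blocks
    feats = []
    for by in range(blocks):
        for bx in range(blocks):
            feats.extend(hlac_block(image, by * size, bx * size, size, masks))
    return feats
```

For a 64 × 64 ridge line image, `hlac_features` returns a list of 8 × 8 × 25 values, one group of 25 per block.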

The scan that traces the contour line of the hand image described above can be realized efficiently by moving between pixels as follows. FIGS. 12A to 12C are diagrams illustrating an example of the pixel movement method.

For example, in a 3 × 3 pixel matrix, when the traveling direction (scanning direction) is straight up, straight down, or straight sideways, the next pixel to move to is determined from the relationship between the current pixel (the current search pixel) and the background pixels, as shown in FIG. 12A, and the scan moves to that pixel. When the traveling direction is oblique, the next pixel is likewise determined from the relationship between the current pixel and the background pixels, as shown in FIG. 12B, and the scan moves to that pixel.

As an exception to the pixel movement method for an oblique traveling direction (scanning direction), the position of the previously searched pixel can additionally be used. Specifically, as shown in FIG. 12C, the next pixel is determined from the previously searched pixel, the current pixel, and the background pixels, and the scan moves to that pixel. This allows the next pixel to be determined with higher accuracy. The movement method shown in FIG. 12C can also be applied to the other oblique traveling directions, and furthermore to each of the straight up, straight down, straight sideways, and oblique cases described above. As a result, the traveling direction can be determined with high accuracy, and because only the vicinity of the contour line needs to be scanned rather than the entire image, faster and more efficient processing can be realized.

<Processing procedure for acquiring ridge line vectors from contour scanning: another example>
Next, another example of acquiring image feature amounts from ridge lines will be described. Specifically, the processing procedure for acquiring ridge line vectors from contour scanning in the present embodiment will be described with reference to a flowchart. FIG. 13 is a flowchart illustrating an example of the processing procedure for acquiring ridge line vectors from contour scanning in the present embodiment.

In FIG. 13, the image analysis unit 16 first performs contour scanning on the pixels of the input image other than background pixels and already-scanned pixels (S21), and determines whether the current pixel is a pixel on a ridge line (S22). If the current search pixel is a pixel on a ridge line (YES in S22), the image analysis unit 16 determines whether the end point of a vector exists around that pixel (S23). If a vector end point exists around the pixel (YES in S23), the image analysis unit 16 then determines whether updating the end point of the vector to the current pixel coordinates would change the inclination of the vector by a certain value (a preset threshold) or more (S24).

If, in S24, the inclination of the vector changes by the certain value or more (YES in S24), the image analysis unit 16 sets the current pixel as the end point of the vector so far, or as the start point and end point of a new vector (S25).

If, in S23 described above, no vector end point exists around the current pixel (NO in S23), the image analysis unit 16 sets the start point and end point of a vector to the current pixel coordinates (S26). If, in S24 described above, the inclination of the vector does not change by the certain value or more (NO in S24), the image analysis unit 16 updates the end point of the vector to the current pixel coordinates (S27).

If, in S22, the current pixel is not a pixel on a ridge line (NO in S22), or after any one of S25, S26, and S27 has finished, the image analysis unit 16 determines whether the contour scanning has finished (S28). If the contour scanning has not finished (NO in S28), the image analysis unit 16 moves the pixel to be searched (analyzed) to the next pixel (S29), returns to S22, and performs the subsequent processing. The movement to the next pixel in this processing can use, for example, the movement method shown in FIGS. 12A to 12C described above, but is not limited to this.

If, in S28, the contour scanning has finished (YES in S28), the image analysis unit 16 determines whether no pixels other than background pixels and already-scanned pixels remain (S30). If such pixels remain (NO in S30), the processing returns to S21 and the subsequent processing is performed. If no such pixels remain (YES in S30), the image analysis unit 16 ends the processing.
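The vector update logic of S22 to S27 can be sketched as follows, under several assumptions that the text does not fix: the ridge pixels are supplied in scan order, "around the pixel" means the 8-neighbourhood, the inclination is compared as the angle from the vector start point, and S25 is read as starting a new vector at the current pixel. The function name `build_ridge_vectors` and the threshold parameter are hypothetical.

```python
import math


def angle_deg(start, end):
    """Direction of the vector start -> end in degrees; points are (y, x)."""
    return math.degrees(math.atan2(end[0] - start[0], end[1] - start[1]))


def angle_change(a, b):
    """Smallest absolute difference between two directions in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def build_ridge_vectors(ridge_pixels, threshold_deg):
    """Group ridge pixels (in scan order) into [start, end] vectors."""
    vectors = []
    for pixel in ridge_pixels:
        # S23: look for a vector whose end point is 8-adjacent to the pixel,
        # preferring the most recently created vector.
        near = next((v for v in reversed(vectors)
                     if max(abs(v[1][0] - pixel[0]), abs(v[1][1] - pixel[1])) <= 1),
                    None)
        if near is None:
            vectors.append([pixel, pixel])      # S26: new vector, start = end
            continue
        start, end = near
        if start == end:
            near[1] = pixel                     # no direction yet, treated as S27
            continue
        # S24: would moving the end point change the inclination too much?
        if angle_change(angle_deg(start, end), angle_deg(start, pixel)) >= threshold_deg:
            vectors.append([pixel, pixel])      # S25 (assumed reading): new vector
        else:
            near[1] = pixel                     # S27: extend the existing vector
    return vectors
```

With a threshold of 10 degrees, a straight run of pixels yields a single vector, while an L-shaped run is split into two vectors at the corner.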

<Finger shape estimation method>
Next, the finger shape estimation method in the present embodiment will be described specifically. First, a finger region is obtained from an image obtained from the imaging device 21 such as a camera, and the image shape ratio, the image feature amount, and so on are obtained, for example as described above. A database search is then performed. In the present embodiment, a two-stage database search method is used.

In the first-stage search, the candidates are narrowed down, for example by the image shape ratio, using thresholds as shown in equations (4) to (6).
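Equations (4) to (6) are not reproduced in this text. One form consistent with the symbol definitions that follow would be a per-ratio threshold test such as the one below; the use of an absolute difference and of a strict inequality are assumptions, not taken from the source.

```latex
\begin{align}
\left| R_{\text{tall}}[i] - R_{\text{current-tall}} \right| &< th_{\text{tall}} \tag{4}\\
\left| R_{\text{topheavy}}[i] - R_{\text{current-topheavy}} \right| &< th_{\text{topheavy}} \tag{5}\\
\left| R_{\text{rightbiased}}[i] - R_{\text{current-rightbiased}} \right| &< th_{\text{rightbiased}} \tag{6}
\end{align}
```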

Here, in the above equations, th_tall denotes the threshold for the vertical length, th_topheavy denotes the threshold for the upper length, and th_rightbiased denotes the threshold for the right length. R_tall[i], R_topheavy[i], and R_rightbiased[i] denote the vertical length, the upper length, and the right length of the i-th data set, respectively. R_current-tall, R_current-topheavy, and R_current-rightbiased denote the vertical length, the upper length, and the right length of the input image, respectively.

Next, in the second-stage search, a similarity is calculated from the image feature amounts. The similarity calculation uses, for example, a simple Euclidean distance, and the similarity is calculated using, for example, equations (7) and (8).
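Equations (7) and (8) are likewise not reproduced in this text. A form consistent with the symbol definitions that follow and with a simple Euclidean distance would be the sum over divided regions of a per-region squared distance, as below. Whether a square root is taken, and at which level, is an assumption.

```latex
\begin{align}
E &= \sum_{k=1}^{D} e_k \tag{7}\\
e_k &= \sum_{j=1}^{P} \left( x.\mathrm{current}[j]_k - x.\mathrm{dataset}[i][j]_k \right)^2 \tag{8}
\end{align}
```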

Here, in the above equations, E denotes the similarity, e_k denotes the similarity in the k-th divided region, x.current[j]_k denotes the image feature amount of pattern j of the input image (first feature amount), and x.dataset[i][j]_k denotes the image feature amount of pattern j of the i-th data set (second feature amount). Note that i is the data set number, j is the HLAC (Higher-order Local AutoCorrelation) pattern number, k is the divided region number, D is the number of divided regions, and P is the number of HLAC patterns.

However, the two-stage database search method is not essential in the present embodiment. For example, the first-stage narrowing by the image shape ratio may be omitted. Alternatively, the first stage may narrow down the candidates by parameters other than the image shape ratio, and finger shape estimation may also be performed with a database search method consisting of three or more stages that uses other data.
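The two-stage search described in this section can be sketched as follows. This is only an illustration under stated assumptions: the first stage keeps a data set when the absolute difference of each shape ratio is below its threshold, the second stage uses a squared Euclidean distance summed over divided regions, and the dictionary keys and function names are hypothetical.

```python
RATIOS = ("tall", "topheavy", "rightbiased")


def first_stage(current, datasets, thresholds):
    """Return indices of data sets whose shape ratios are close to the input image."""
    keep = []
    for i, data in enumerate(datasets):
        if all(abs(data[r] - current[r]) < thresholds[r] for r in RATIOS):
            keep.append(i)
    return keep


def similarity(x_current, x_dataset):
    """Sum over regions k and patterns j of squared feature differences (smaller is closer)."""
    return sum(
        sum((c - d) ** 2 for c, d in zip(region_c, region_d))
        for region_c, region_d in zip(x_current, x_dataset)
    )


def two_stage_search(current, datasets, thresholds):
    """Return the index of the most similar data set, or None if stage one keeps nothing."""
    candidates = first_stage(current, datasets, thresholds)
    if not candidates:
        return None
    return min(candidates,
               key=lambda i: similarity(current["features"], datasets[i]["features"]))
```

Because the first stage only compares three scalars per data set, the more expensive feature comparison is run on a reduced candidate list. A data set that fails the shape ratio test is never returned, even if its features match exactly.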

<Matching using vector information>
When the matching unit 17 or the like performs the matching process using the vector information described above, it is preferable to prepare in advance a database that stores the required number of data sets, each composed of at least one of, for example, vector feature amounts and the number of vectors.

As the first stage of the matching process, the database is narrowed down. Specifically, for example, the number of ridge line vectors extracted from the finger image captured by the imaging device 21 such as a camera is compared with the number of vectors of every data set in the database, and the data sets for which the difference in number is within a certain value are selected.

Next, as the second stage, the most similar data set is selected from the group of data sets selected in the first stage. Specifically, the most similar data set is selected from the narrowed-down group using the vector feature amounts. The flow of the second-stage processing is described below with reference to the drawing.

FIG. 14 is a flowchart illustrating an example of the procedure of the second stage of the matching process. In FIG. 14, the matching unit 17 first selects the i-th data set from the group of data sets narrowed down by the first stage (S41). The matching unit 17 then refers to the start point of the j-th vector obtained from the camera image (S42).

Next, the matching unit 17 searches the vector feature amounts in the i-th data set for the start point whose coordinates are closest to the referenced start point coordinates (S43), and examines the angle Angle_ij formed between the vector extending from the referenced start point and the vector extending from the start point selected in the preceding step, as well as the difference in length Length_ij between the two vectors (S44).

The matching unit 17 then determines whether no start points remain (S45). If a start point remains (NO in S45), the variable j is incremented (j++) (S46), the processing returns to S42, and the matching unit 17 performs the subsequent processing. If no start point remains (YES in S45), the matching unit 17 takes the variable M as the total number of vector start points in the hand image obtained from the camera and calculates equations (9) and (10) (S47).
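Equations (9) and (10) are not reproduced in this text. Given that M is the total number of vector start points and that SumAngle_i and SumLength_i are used in the next step, a consistent form would be the sums below. The index range is an assumption.

```latex
\begin{align}
\mathrm{SumAngle}_i &= \sum_{j=1}^{M} \mathrm{Angle}_{ij} \tag{9}\\
\mathrm{SumLength}_i &= \sum_{j=1}^{M} \mathrm{Length}_{ij} \tag{10}
\end{align}
```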

Next, the matching unit 17 takes the sum of the normalized SumAngle_i and the normalized SumLength_i as the similarity E_i (S48). With E_min as the provisional minimum similarity, if E_min > E_i, the matching unit 17 then assigns E_i to E_min and provisionally regards the i-th data set as the most similar data set (S49).

The matching unit 17 then determines whether no data sets remain to be examined (S50). If a data set remains (NO in S50), the variable i is incremented (i++) (S51), after which the matching unit 17 returns to S41 and performs the subsequent processing. If no data set remains (YES in S50), the matching unit 17 outputs the angle information and so on of the data set finally found to be most similar (S52), and ends the processing. The output in the present embodiment is not limited to angle information. For example, other information contained in the data set, or all the information of the data set, may be output.
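The two matching stages (narrowing by vector count, then S41 to S52) can be sketched as follows. The normalization in S48 is not specified in the text, so this sketch assumes division by the number of camera vectors and by fixed scales (180 degrees and a caller-supplied `length_scale`). The data layout and the function name `match_vectors` are hypothetical.

```python
import math


def _delta(vector):
    (y0, x0), (y1, x1) = vector
    return y1 - y0, x1 - x0


def angle_between(a, b):
    """Angle in degrees between two vectors given as ((y0, x0), (y1, x1))."""
    ay, ax = _delta(a)
    by, bx = _delta(b)
    na, nb = math.hypot(ay, ax), math.hypot(by, bx)
    if na == 0 or nb == 0:
        return 0.0
    cos = max(-1.0, min(1.0, (ay * by + ax * bx) / (na * nb)))
    return math.degrees(math.acos(cos))


def length(vector):
    return math.hypot(*_delta(vector))


def match_vectors(camera, datasets, max_count_diff, length_scale):
    """Return (index of the most similar data set, its E); index is None if none qualifies."""
    best, e_min = None, math.inf
    m = max(len(camera), 1)
    for i, data in enumerate(datasets):
        vecs = data["vectors"]
        # First stage: keep data sets whose vector count is close to the camera image's.
        if not vecs or abs(len(vecs) - len(camera)) > max_count_diff:
            continue
        sum_angle = sum_length = 0.0
        for cam in camera:                                            # S42, S45, S46
            nearest = min(vecs, key=lambda v: math.dist(v[0], cam[0]))  # S43
            sum_angle += angle_between(cam, nearest)                  # S44, S47
            sum_length += abs(length(cam) - length(nearest))
        e_i = sum_angle / (180.0 * m) + sum_length / (length_scale * m)  # S48 (assumed)
        if e_min > e_i:                                               # S49
            best, e_min = i, e_i
    return best, e_min
```

The caller would then read the joint angle information, or any other stored fields, from `datasets[best]`, which corresponds to the output in S52.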

<Evaluation experiment>
Here, for a finger shape estimation device in which a user's finger shapes have been registered, a user with thick fingers is simulated by having the designer wear work gloves, and the change in estimation accuracy due to individual differences is evaluated. FIGS. 15A and 15B are diagrams illustrating an example of the difference in the contour line information and the ridge line information obtained when the same finger shape is formed with a bare hand and with a work glove on. FIG. 15A shows the contour line information and the ridge line information for the bare hand, and FIG. 15B shows the contour line information and the ridge line information with the work glove on.

In the evaluation experiment, the same input data was used to compare the estimation results of a conventional finger shape estimation system using contour line information with the estimation results of a system using the ridge line information of the present method. The input data was created as follows: the right hand, with nothing worn on it, was placed in a space in which a camera (for example, a Firefly (registered trademark) MV with a wide-angle lens, manufactured by ViewPLUS) was installed, a data glove was worn on the left hand, and the right and left hands were moved simultaneously, so that the input images and the joint angle data at each moment were obtained.

In the evaluation experiment, as an example, the forearm rotation angle was fixed at 180 degrees and the experiment focused on grasping and pinching motions. FIGS. 16A to 16C are diagrams showing the comparison results for the joint angles. FIG. 16A is a graph comparing the thumb IP joint (interphalangeal joint) angle at a forearm rotation angle of 180 degrees, FIG. 16B is a graph comparing the index finger PIP joint (proximal interphalangeal joint) angle, and FIG. 16C is a graph comparing the little finger PIP joint angle. In FIGS. 16A to 16C, the horizontal axis indicates the time obtained from the time-series frames of the captured images (TIME [NUMBER OF FRAME]), and the vertical axis indicates the joint angle (JOINT ANGLE [DEGREE]). FIG. 17 is a diagram showing an example of the error mean and the error standard deviation.

The examples of FIGS. 16A to 16C show the actual value (MEASURED), the value obtained using the contour line (ESTIMATED BY OUTLINE), and the value obtained using the ridge line in the present embodiment (the present method) (ESTIMATED BY RIDGE LINE). The example of FIG. 17 shows the error mean [degree] and the error standard deviation [degree] for the thumb IP joint angle, the index finger PIP joint angle, and the little finger PIP joint angle corresponding to FIGS. 16A to 16C.

As shown in FIG. 17, when a person whose finger shapes are not registered in the database uses the system, comparing the estimation method using contour line information (the conventional method) with the estimation method using ridge line information (the present method) gives the following. At a forearm rotation of 180 [degrees], the average estimation error range of the thumb IP joint angle changes from 0.33 ± 14.97 [degrees] (conventional method) to −0.10 ± 13.78 [degrees] (present method). The average estimation error range of the index finger PIP joint angle changes from 0.99 ± 17.20 [degrees] (conventional method) to −0.04 ± 16.89 [degrees] (present method), and that of the little finger PIP joint angle changes from 5.57 ± 20.33 [degrees] (conventional method) to 7.84 ± 15.45 [degrees] (present method). The error standard deviation thus decreases for all three joints, while the error mean of the little finger PIP joint angle increases.

In other words, using ridge line information as in the present method makes it possible to increase the estimation accuracy without adding data sets corresponding to individual differences. Applying the present method therefore allows many people to use the finger shape estimation system with less inconvenience.

<Application examples of the finger shape estimation technique of the present invention>
Application examples of the finger shape estimation technique of the present invention will now be described with reference to the drawings. FIGS. 18A and 18B are diagrams showing application examples of the present invention. The finger shape estimation technique of the present invention can be applied, for example, to the operation of a remote robot as shown in FIG. 18A.

In this application example, for a robot 40 at a remote location, the terminal on the user (operator) side (the finger shape estimation device 10) captures the actual movement of the fingers 42 of the user 41 with the imaging device 21 such as a camera. The finger shape estimation processing of the present invention described above is then applied to the captured images (here, video) to estimate the actual movement of the fingers 42 of the user 41 with high accuracy, and the estimation result is transmitted to the robot side. This causes the fingers 43 of the robot 40 to make the same movement as the fingers 42 of the user 41, so that the robot 40 can be operated remotely.

As shown in FIG. 18A, video and the like captured by the robot camera 44 can be displayed on the screen of the finger shape estimation device 10. The user 41 can then make the robot 40 perform a given movement while watching the motion of the fingers 43 of the robot 40 displayed on the screen. In the example of FIG. 18A, the finger motion may be estimated using, for example, a database of gripping motions such as that shown in FIG. 7B described above.

The example shown in FIG. 18B realizes the finger shape estimation technique of the present invention using the camera function provided in a portable terminal 50 or the like. In this case, the shape and movement of the fingers 52 of the user 51 can be estimated at high speed and with high accuracy without wearing sensors and without a dedicated controller. Here, the portable terminal 50 corresponds to the finger shape estimation device 10 of the present embodiment described above.

In this case, therefore, the finger shape estimation technique of the present invention can be used for, for example, a desktop metaphor (desktop environment) driven by finger motion, a cloud computer or remote conference system driven by finger motion that combines the portable terminal 50 with a mobile projector function 53 or the like, computer input of three-dimensional modeling information by finger motion, virtual games played with gestures, operation of home appliances without a controller, and remote robot control without a dedicated control device.

As described above, according to the present invention, the shape and movement of fingers in an image can be estimated at high speed and with high accuracy. Specifically, the present invention makes it possible to operate information devices, home appliances, robots, and the like by moving the fingers and arms in the same way as in everyday actions, without wearing sensors. That is, a new desktop manager (desktop metaphor) can be realized in which a personal computer is operated not by entering information with a keyboard or mouse but by everyday actions performed on the icons on the display, such as opening a document or crumpling a document and throwing it into a trash can.

According to the present invention, complicated free-form three-dimensional information, such as in clay modeling, can also be input by finger motion rather than with CAD (Computer Aided Design) software, so that a free-form three-dimensional object can be shaped. In addition, home appliances such as televisions and video devices can be controlled by finger motion without a remote control, and a robot can be operated remotely by everyday actions rather than with a dedicated controller.

In other words, according to the present invention, the data sets for matching are created using information that is not affected by individual differences. No special data sets for handling individual differences need to be added, and the number of feature dimensions of each data set decreases, so the contents of the data sets can be enriched and finger shape estimation can be performed at high speed and with high accuracy. Enriching the contents of the data sets means, for example, creating data sets at intervals of 5 or 10 degrees of finger bending angle where data sets were conventionally stored at intervals of 45 degrees. That is, the present invention allows states in which each finger joint is partially bent to be stored in many steps, so that the finger shape can be estimated with high resolution and high accuracy.

Preferred examples of the finger shape estimation technique of the present invention have been described in detail above. However, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

2. Second Embodiment
In the second embodiment, a nail region extraction technique (a nail region extraction device, a nail region extraction method, and a nail region extraction program) for extracting a nail region in a finger image will be described.

<Conventional nail region extraction technique>
Conventionally, as a technique for specifying the position of a fingertip, Japanese Unexamined Patent Application Publication No. 2009-265809, for example, discloses a technique that adds nail presence information to the information used at estimation time in order to improve estimation accuracy. In this document, a classifier is constructed by machine learning from a database of previously collected feature amounts of nail region pixels and a database of collected feature amounts of skin region pixels that do not include a nail. The classifier is used to detect nails, the motion of the fingertip or the like is thereby identified, and the control assigned to each motion is executed.

However, with a technique such as that in the above document, the nails must be cut out manually and analyzed when the database is created, and creating a database for each individual requires a great deal of time and labor per person. Moreover, this technique only determines the region of each pixel by means of a classifier. It can therefore discriminate when only regions such as the fingertip are targeted, where the pixel colors of the nail region and the skin region generally differ greatly, but it cannot handle images that contain skin region pixels whose color is very similar to that of a nail, such as those on the palm side. In addition, because the color of the hand differs from person to person, it is difficult to cope with individual differences.

The second embodiment describes a nail region extraction technique (a nail region extraction device, a nail region extraction method, and a nail region extraction program) that solves the problems described above and can extract the nail region in a finger image with high accuracy.

<Overview of the second embodiment>
The nail region extraction device according to the second embodiment is a nail region extraction device that extracts a nail region contained in an image captured by an imaging device. It has an image acquisition unit that acquires the image captured by the imaging device, and a nail region extraction unit that analyzes the image obtained by the image acquisition unit and extracts the nail region from predetermined feature amounts obtained from the analysis result. The nail region extraction unit generates a separation plane using only the color information obtained from the image, extracts nail region candidates based on the generated separation plane, and re-determines the nail region for the nail region candidates by pixel discrimination using a preset image that includes a palm.

The nail region extraction method according to the second embodiment is a nail region extraction method for extracting a nail region contained in an image captured by an imaging device. It has an image acquisition step of acquiring the image captured by the imaging device, and a nail region extraction step of analyzing the image obtained in the image acquisition step and extracting the nail region from predetermined feature amounts obtained from the analysis result. The nail region extraction step generates a separation plane using only the color information obtained from the image, extracts nail region candidates based on the generated separation plane, and re-determines the nail region for the nail region candidates by pixel discrimination using a preset image that includes a palm.

Furthermore, the nail region extraction program according to the second embodiment is a program that is installed on an information processing apparatus and causes it to execute each processing step of the nail region extraction method described above.

According to the nail region extraction technique of the second embodiment described above, the nail region in an image can be extracted with high accuracy.

<Nail Region Extraction Technique According to the Second Embodiment>
The nail region extraction technique according to the second embodiment generates a separation plane using only color information, performs pixel discrimination so that the nail region can be estimated even in an image that includes a palm, and re-determines the result after pixel discrimination. Because the embodiment has a process (algorithm) for determining the separation plane, the nail and the skin do not each need to be analyzed separately, which reduces the processing load and improves processing speed. Unlike conventional methods, the embodiment does not rely on estimation using a database or the like, so handling individual differences is simple in principle. Furthermore, because the embodiment includes a method for removing skin regions whose color resembles that of the nail, the nail region can be extracted with high accuracy over the entire finger region.

The nail region extraction device, nail region extraction method, and nail region extraction program of the present embodiment are described below with reference to the drawings.

<Nail Region Extraction Device: Functional Configuration Example>
First, a functional configuration example of the nail region extraction device in the present embodiment is described with reference to the drawings. FIG. 19 is a diagram illustrating an example of the functional configuration of the nail region extraction device according to the present embodiment. The nail region extraction device 110 shown in FIG. 19 includes an input unit 111, an output unit 112, a storage unit 113, an image acquisition unit 114, an image analysis unit 115, a nail region extraction unit 116, a finger shape estimation unit 117, a transmission/reception unit 118, and a control unit 119.

The input unit 111 accepts input from a user or the like, such as the start and end of various instructions: an image acquisition instruction, an image analysis instruction, a nail region extraction instruction, a finger shape estimation instruction, a transmission/reception instruction, and so on. If the nail region extraction device 110 is a general-purpose computer such as a PC (Personal Computer), the input unit 111 consists of a keyboard and a pointing device such as a mouse; if it is an information terminal device such as a smartphone or mobile phone, or a game device, the input unit 111 consists of a group of operation buttons or the like. The input unit 111 may also have a voice input function for entering the above instructions by voice.

The output unit 112 outputs information such as the content entered through the input unit 111 and the content executed based on that input. Specifically, the output unit 112 displays on screen or outputs as sound the processing results and processing progress of each component of the nail region extraction device 110, such as the acquired image, the image analysis result, the nail region extraction result, and the finger shape estimation result. The output unit 112 consists of a display, a speaker, and the like. The output unit 112 may also have a printing function such as a printer, so that the output content described above can be printed on various print media such as paper and provided to a user or the like.

The storage unit 113 stores various information required in the present embodiment and various data produced during or after processing. Specifically, the storage unit 113 stores images held in advance and images obtained by shooting or the like and acquired by the image acquisition unit 114 (including time-series images such as video). The storage unit 113 also stores the analysis results of the image analysis unit 115, the extraction results of the nail region extraction unit 116, the estimation results of the finger shape estimation unit 117, and the like. The stored data can be read from the storage unit 113 as needed.

The image acquisition unit 114 acquires images, video, and the like captured by, for example, the imaging device 120. For convenience of explanation, the images acquired by the image acquisition unit 114 are assumed to contain fingers, but the present embodiment is not limited to this.

In the present embodiment the imaging device 120 is provided outside the nail region extraction device 110, but the embodiment is not limited to this; the imaging device 120 may, for example, be built into the nail region extraction device 110. The images and video acquired by the image acquisition unit 114 are not limited to those of actual fingers captured by the imaging device 120; they may be, for example, images of a model hand, a photograph, a poster, or the like. The image acquisition unit 114 can also acquire images and video stored in an external device or database connected to a communication network, via the transmission/reception unit 118. Images acquired by the image acquisition unit 114 can be stored in the storage unit 113 and read from it as needed.

The image analysis unit 115 analyzes the image acquired by the image acquisition unit 114. Specifically, from the per-pixel feature amounts of the image, the image analysis unit 115 determines in which part (position, region) objects such as fingers and nails appear, or how such objects move within the video. In other words, the image analysis unit 115 quantifies the feature amounts of the captured image of the hand, nails, and so on.

The nail region extraction unit 116 extracts candidate nail regions contained in the image based on the results of the image analysis unit 115. The nail region can be extracted based on, for example, the luminance information of the image and a threshold. The nail region extraction unit 116 can also obtain the centroid or center coordinates of each nail from the extracted nail region and output them as position information. The present embodiment is not limited to this; for example, at least one of nail presence information, the centroid, and the area of each nail region may be output. The specific extraction method used by the nail region extraction unit 116 is described later. Information on the extracted nail region can be stored in the storage unit 113 and read from it as needed.

The finger shape estimation unit 117 estimates the shape of the fingers based on the nail region information set by the nail region extraction unit 116. Specifically, it uses the nail position information and the finger contour shape (contour line information) contained in the image, and collates the input image against a database in which finger shapes corresponding to nail position information and finger contour shapes are registered in advance. This allows the finger shape to be estimated with high accuracy. Using nail information as in the present embodiment makes it possible, for example, to determine with high accuracy whether the hand is showing its back or its palm, and to estimate with high accuracy how far the palm or the back of the hand is rotated relative to the camera. The present embodiment may also be configured without the finger shape estimation unit 117.

The transmission/reception unit 118 is an interface for acquiring, from an external device reachable over a communication network or the like, desired external images (for example, captured images or video) and the execution program that implements the nail region extraction processing of the present embodiment. The transmission/reception unit 118 can also transmit various information obtained within the nail region extraction device 110 to an external device.

The control unit 119 controls all components of the nail region extraction device 110. Specifically, based on, for example, instructions entered by a user through the input unit 111, the control unit 119 controls each process, such as image acquisition, image analysis, nail region extraction, and finger shape estimation.

The imaging device 120 consists of a digital camera, a high-precision camera, or the like, and acquires images and video of the user's actual fingers, a model hand, and so on. A single imaging device 120 may be provided, or several may be provided so that the fingers can be captured simultaneously from different directions.

<Nail Region Extraction Device 110: Hardware Configuration>
In the nail region extraction device 110 described above, the nail region extraction processing of the present embodiment can be realized by creating an execution program (for example, a nail region extraction program) as software that causes a computer (hardware) to execute each function, and installing that program on, for example, a general-purpose personal computer such as a PC, a server, an information terminal device such as a smartphone or mobile phone, or a game device.

A hardware configuration example of a computer capable of realizing the nail region extraction processing of the present embodiment is described with reference to the drawings. FIG. 20 is a diagram illustrating an example of a hardware configuration capable of realizing the nail region extraction processing of the present embodiment.

The computer main body in FIG. 20 includes an input device 121, an output device 122, a drive device 123, an auxiliary storage device 124, a memory device 125, a CPU (Central Processing Unit) 126 that performs various kinds of control, and a network connection device 127, which are interconnected by a system bus B.

The input device 121 has a keyboard and a pointing device such as a mouse operated by a user or the like, and receives various operation signals from the user, such as an instruction to execute a program. The input device 121 may also have an image input unit that receives images captured by the imaging device 120, such as a camera.

The output device 122 has a display that shows the various windows, data, and so on needed to operate the computer main body that performs the processing of the present embodiment, and can display the progress and results of program execution under the control program of the CPU 126.

The execution program installed on the computer main body in the present embodiment is provided on, for example, a portable recording medium 128 such as a USB (Universal Serial Bus) memory or a CD-ROM. The recording medium 128 on which the program is recorded can be set in the drive device 123, and the execution program on the recording medium 128 is installed from the recording medium 128 to the auxiliary storage device 124 via the drive device 123.

The auxiliary storage device 124 is a storage device such as a hard disk. It stores the execution program of the present embodiment, the control programs provided on the computer, and the like, and can read and write them as needed.

The memory device 125 stores the execution program and the like read from the auxiliary storage device 124 by the CPU 126. The memory device 125 consists of ROM (Read Only Memory), RAM (Random Access Memory), and the like.

Based on a control program such as an OS (Operating System) and the execution program stored in the memory device 125, the CPU 126 controls the processing of the computer as a whole, including various calculations and data input/output with each hardware component, and thereby realizes each step of the nail region extraction processing. Information needed during program execution can be obtained from the auxiliary storage device 124, and execution results can be stored there.

By connecting to a communication network or the like, the network connection device 127 can obtain the execution program from another terminal connected to that network, and can provide other terminals with the results obtained by executing the program or with the execution program of the present embodiment itself.

The hardware configuration described above can execute the nail region extraction processing of the present embodiment. By installing the program, the processing can also be realized easily on a general-purpose personal computer or the like.

Next, the nail region extraction processing performed by the nail region extraction program described above is explained in detail.

<Nail Region Extraction Processing Procedure>
First, an outline of the nail region extraction processing procedure in the present embodiment is given. FIG. 21 is a flowchart illustrating an example of the nail region extraction processing procedure in the present embodiment. The operation of each unit in the processes described below is controlled by the control unit 119 (CPU 126).

In the nail region extraction processing shown in FIG. 21, the image acquisition unit 114 first acquires an image captured by the imaging device 120, such as a camera (S101). The image analysis unit 115 then analyzes the image (S102) and obtains position information and the like for objects in the image such as fingers and nails.

Next, the nail region extraction unit 116 extracts the nail region based on the information obtained in S102 (S103). The finger shape estimation unit 117 then estimates the finger shape based on the extracted nail region and the like (S104) and outputs the estimation result (S105). The present embodiment is not limited to this; for example, after S103 finishes, the finger shape estimation unit 117 may output only the nail region in the image.

Next, the control unit 119 determines whether to end the processing (S106). If the processing is not to end (NO in S106), the flow returns to S101, and the control unit 119 either applies the above processing to consecutive images, that is, video, and outputs the results in time series, or acquires another image and applies the above processing to it.

If, in S106, the processing is to end because of a user instruction or the like (YES in S106), the control unit 119 ends the nail region extraction processing.
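The flow of FIG. 21 (S101 to S106) can be sketched as a simple loop. This is only an illustration: the function name `run_pipeline` and the callables `analyze`, `extract_nails`, and `estimate_shape` are hypothetical stand-ins for the image analysis unit 115, the nail region extraction unit 116, and the optional finger shape estimation unit 117.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


def run_pipeline(
    frames: Iterable[Any],
    analyze: Callable[[Any], Any],
    extract_nails: Callable[[Any], Any],
    estimate_shape: Optional[Callable[[Any], Any]] = None,
) -> Iterator[Any]:
    """Yield one result per frame, in time-series order."""
    for frame in frames:                 # S101: acquire; exhausting frames plays the role of S106
        features = analyze(frame)        # S102: image analysis
        nails = extract_nails(features)  # S103: nail region extraction
        if estimate_shape is None:
            yield nails                  # variant that outputs only the nail region
        else:
            yield estimate_shape(nails)  # S104 and S105: estimate and output
```

Passing a video stream as `frames` gives the time-series behavior described for the NO branch of S106; omitting `estimate_shape` corresponds to the configuration without the finger shape estimation unit.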

<S103: Nail Region Extraction Processing>
Next, a specific example of the nail region extraction processing in S103 is described with reference to the drawings.

[Distribution of Pixels Constituting the Fingers in the Image]
First, the characteristics of the pixel distribution of a finger image in the RGB color space are described. FIG. 22 is a diagram illustrating an example of a model of the pixel distribution of a finger image in the RGB color space. As shown in FIG. 22, the skin region pixels (SKIN AREA DISTRIBUTION) are densely distributed in the shape of a thin ellipsoid, and the nail region pixels (NAIL AREA DISTRIBUTION) are distributed as a layer resting on top of the ellipsoid while sharing part of its space. This shared part (COMMOM AREA) grows as the lightness decreases.

Basically, therefore, brightening the finger image as a whole is an important condition for separating the nail region pixel distribution from the skin region pixel distribution. However, light is reflected differently depending on the shape of the hand and the positional relationship between the camera and the hand, so the lightness (luminance and the like) can drop in some places and the shared part can grow. The condition above alone cannot handle such cases. To detect nails accurately across various hand shapes, the lightness differences caused by these differences in reflection must be taken into account.

[Nail Region Extraction Method of the Present Embodiment Considering Lightness Differences]
The present embodiment therefore extracts the nail region using the following method. FIG. 23 is a flowchart showing a specific example of the nail region extraction processing in the present embodiment. The background of the finger image is assumed to be black, but the embodiment is not limited to this.

First, the nail region extraction unit 116 performs preprocessing, for example with a median filter, to remove imaging noise from the input image (S111). The preprocessing includes, for example, smoothing, but the present embodiment is not limited to this; it may also include processing such as contrast adjustment and background separation. The smoothing in this preprocessing removes imaging noise with a small median filter, for example 3×3. Because the aim is to remove imaging noise, the smoothing uses, for example, a nonlinear filter (that is, a filter that is not affected by outliers), but it is not limited to this.

Next, the nail region extraction unit 116 extracts pixels whose color resembles that of a nail (S112), and detects regions with nail-like color as nail region candidates by smoothing and labeling the binarized image (S113). The smoothing here uses a median filter, for example 7×7, to merge scattered extracted pixels into a single region; the filter size is not limited to 7×7. Concretely, the median-filter smoothing first splits the original image into three images, an R channel image, a G channel image, and a B channel image, then smooths each of them, and finally merges the three smoothed images back into one RGB image. Because the aim in this embodiment is to merge pixels, a median filter is used as the smoothing method, but the embodiment is not limited to this; other smoothing methods such as a weighted average filter or a Gaussian filter can be used.
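The median smoothing used in S111 (3×3) and S113 (7×7), including the split into R, G, and B channels, can be sketched in plain Python as follows. The function names, the list-of-rows image layout, and the edge handling (border pixels are replicated) are assumptions made for this illustration; the text does not specify how image borders are treated.

```python
def median_filter(channel, size=3):
    """Median-filter one channel given as a list of rows of numbers.

    Border pixels are handled by clamping coordinates (edge replication),
    which is an assumption of this sketch.
    """
    h, w, r = len(channel), len(channel[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                channel[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            )
            out[y][x] = window[len(window) // 2]  # middle element of the sorted window
    return out


def median_filter_rgb(image, size=7):
    """Split an RGB image into R, G, B channels, smooth each, and merge again."""
    channels = [[[px[k] for px in row] for row in image] for k in range(3)]
    r, g, b = (median_filter(ch, size) for ch in channels)
    return [list(zip(r[y], g[y], b[y])) for y in range(len(image))]
```

A single outlier pixel surrounded by uniform neighbors is replaced by the neighbors' value, which is the outlier-insensitive behavior the text attributes to nonlinear filters.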

In the labeling process, the regions produced by smoothing are numbered in order of size so that the computer can recognize each region and obtain its centroid. Small regions (specifically, for example, regions of 20 pixels or fewer) are regarded as noise and removed. The centroid position of each remaining region is then obtained.
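The labeling step can be illustrated with a small connected-component pass over a binary mask. This is a sketch under assumptions: 4-connectivity, numbering from the largest region downward, and the name `label_regions` are choices made here, not details given in the text. The default `min_pixels=21` reflects the example of discarding regions of 20 pixels or fewer.

```python
from collections import deque


def label_regions(mask, min_pixels=21):
    """Label connected regions of truthy pixels and return their centroids.

    Regions smaller than min_pixels are dropped as noise. The remaining
    regions are numbered from 1 in descending order of area.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill over 4-connected neighbors
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                regions.append(pixels)
    regions.sort(key=len, reverse=True)
    return [
        {
            "label": i + 1,
            "area": len(p),
            "centroid": (sum(py for py, _ in p) / len(p), sum(px for _, px in p) / len(p)),
        }
        for i, p in enumerate(regions)
    ]
```

Each returned centroid is a (row, column) pair and can serve as the center of the ROI set in S114.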

Because the regions obtained by the above processing are computed from the finger image as a whole, they may be affected by the differences in light reflection during imaging described above.

In the present embodiment, the nail region extraction unit 116 therefore sets an ROI (Region Of Interest) around the centroid of each nail candidate region (S114) and performs reprocessing within the ROI (S115). The reprocessing is the nail region extraction processing described above, for example S111 to S113, but the embodiment is not limited to this. Performing the reprocessing of S115 reduces the influence of differences in light reflection.
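Setting the ROI of S114 amounts to cutting a window around each candidate centroid and clipping it to the image. The sketch below assumes a square window whose half-size is chosen by the caller; the text does not give the ROI size, and the names `roi_around` and `crop` are hypothetical.

```python
def roi_around(centroid, half_size, height, width):
    """Return (top, bottom, left, right) of a square ROI clipped to the image.

    bottom and right are exclusive, so the tuple can be used directly for slicing.
    """
    cy, cx = int(round(centroid[0])), int(round(centroid[1]))
    top, left = max(cy - half_size, 0), max(cx - half_size, 0)
    bottom, right = min(cy + half_size + 1, height), min(cx + half_size + 1, width)
    return top, bottom, left, right


def crop(image, roi):
    """Cut the ROI out of an image stored as a list of rows."""
    top, bottom, left, right = roi
    return [row[left:right] for row in image[top:bottom]]
```

The cropped patch would then be passed through the same steps as S111 to S113, so that the color statistics come only from the neighborhood of the candidate rather than from the whole hand.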

The nail region extraction unit 116 then determines once more whether each region obtained is a nail, thereby deciding the final nail regions (S116), and obtains and outputs the centroid position of each nail region (S117). In this way, the present embodiment extracts the nail region while taking lightness differences into account.

<S112: Method for Extracting Pixels with Nail-like Color>
Next, the method used in S112 to extract pixels with nail-like color is described. FIGS. 24A to 24C are diagrams illustrating an example of coordinate transformation using the principal component axes as the basis. FIG. 24A shows the RGB pixel distribution of skin (SKIN) and nails (NAIL) in a finger image, FIG. 24B shows the pixel distribution of skin and nails in the plane of the first principal axis (1ST MAIN AXIS) and the third principal axis (3RD MAIN AXIS), and FIG. 24C shows the pixel distribution of skin and nails in the plane of the second principal axis (2ND MAIN AXIS) and the third principal axis. FIGS. 25A and 25B are diagrams illustrating an example of the method for determining the position of the separation plane. As an example, the first principal axis corresponds to lightness in skin color, the second principal axis to warm colors, and the third principal axis to cold colors, but the present embodiment is not limited to this.

In the present embodiment, the color information of the finger image with the background removed is first coordinate-transformed, using as the basis the principal component axis vectors obtained by eigenvalue decomposition of the variance-covariance matrix. As in FIG. 24A, the layers of the nail pixel distribution and the skin pixel distribution then appear in a direction perpendicular to the third principal axis. Consequently, if the coordinates of a pixel after the transformation are written x = (x_1, x_2, x_3)^T (where T denotes the transpose), the equation of the separation plane to be found can be expressed in the very simple one-dimensional form of equation (11).
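The image of equation (11) is not reproduced in this text. A form consistent with the surrounding description (a plane perpendicular to the third principal axis, positioned by the single threshold Thread_layer defined in the following paragraphs) would be the one below. It is a reconstruction offered as an assumption, not the published formula.

```latex
x_3 = \mathrm{Thread\_layer} \tag{11}
```

Under this reading, a pixel would be treated as a nail candidate when its third principal-axis coordinate lies above the plane, and as skin otherwise.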

The threshold Thread_layer is the value obtained when, as calibration at system start-up, palm-only images in which no nails appear are used as input images. Palm-only images are used because the skin of the palm is generally closer to white than the back of the hand, so its pixels have high lightness and resemble the color of nails, and as a result they are distributed in the upper part of the layer.

In other words, calibrating with images that show only the back of the hand would make the separation threshold too low, so that many palm-side skin pixels would be extracted, which is unsuitable. In the present embodiment, therefore, only the dense part of the pixel distribution of, for example, palm-only image i is taken, the variable Thread_layer_i is moved from top to bottom as in FIG. 25A, and the position at which the straight line of equation (11) touches the dense pixel region is taken as the value of Thread_layer_i. Thread_layer is then defined by equation (12).

In equation (12), n is the number of images used for calibration, and off_th is an offset constant for fine-tuning the position at which the layer is cut. With these methods, an appropriate cutting position (the position of the separation plane) can be determined automatically.
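The image of equation (12) is likewise missing from this text. Given that n is the number of calibration images, that the per-image positions are averaged, and that off_th is an offset constant, a consistent form would be the following. The additive placement and sign of the offset are assumptions of this reconstruction.

```latex
\mathrm{Thread\_layer} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Thread\_layer}_i + \mathit{off}_{th} \tag{12}
```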

That is, in the processing described above, the pixel information of an image showing only the palm is first transformed to the principal component axis basis, as shown in FIG. 25A. The separation plane is then lowered from a high position, and the point at which it reaches a region of high density is taken as the third principal axis coordinate (3RD MAIN AXIS) of the separation plane. This calibration is performed on a plurality of (n) images, and the average of the resulting separation plane positions is used as the third principal axis coordinate of the separation plane in the present embodiment.
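The calibration can be sketched as follows, under stated assumptions. Density is approximated here by a histogram along the third principal axis, and the parameters bin_width and min_count are hypothetical, since the source does not say how density is measured. Expression (12) is assumed to be the mean of the per-image values Thread_layer_i plus the offset off_th; the sign convention of the offset is likewise an assumption.

```python
import math


def layer_top(x3_values, bin_width=1.0, min_count=5):
    """Lower a plane along the third principal axis until it meets a dense bin.

    bin_width and min_count are illustrative density parameters, not values
    from the source. Returns the upper edge of the first dense bin found.
    """
    counts = {}
    for x in x3_values:
        b = math.floor(x / bin_width)
        counts[b] = counts.get(b, 0) + 1
    for b in sorted(counts, reverse=True):        # highest bin first
        if counts[b] >= min_count:
            return (b + 1) * bin_width
    raise ValueError("no dense region found")


def calibrate_thread_layer(images_x3, off_th=0.0, **density_params):
    """Assumed form of Expression (12): mean of Thread_layer_i plus off_th."""
    tops = [layer_top(x3, **density_params) for x3 in images_x3]
    return sum(tops) / len(tops) + off_th
```

Isolated bright pixels above the skin layer fall into sparse bins and are skipped, which is the behavior the top-down sweep is meant to provide.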

Next, as shown in FIG. 25B, the finger image is transformed to the principal component axis basis, the nail pixels (NAIL PIXELS) and the skin pixels (SKIN PIXELS) are separated from each other by the separation plane obtained by the above method, and regions having a color similar to that of a nail are obtained by processing such as smoothing and labeling.
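The labeling step can be illustrated with a simple connected-component search. This is a sketch only: the source does not specify the labeling algorithm, so 4-connectivity and a breadth-first flood fill are assumptions, and the function names are hypothetical.

```python
from collections import deque


def label_regions(mask):
    """Return connected regions of a binary mask as lists of (x, y), 4-connectivity."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue, region = deque([(x, y)]), []
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(region)
    return regions


def region_centroid(region):
    """Mean (x, y) of a region, usable as the centre of an ROI."""
    n = len(region)
    return (sum(p[0] for p in region) / n, sum(p[1] for p in region) / n)
```

Each returned region is one candidate with a nail-like color, and its centroid gives the point around which a region of interest can later be placed.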

As described above, in the present embodiment a two-dimensional plane is formed from two of the three axes obtained by principal component analysis, and the height of the separation plane is changed along one of those two axes. Specifically, in the nail region extraction unit 116, the pixel information of an image captured in advance by the imaging device 120 and showing only the palm is transformed to the principal component axis basis using two of the preset first to third principal axes obtained by principal component analysis; the separation plane is lowered from a high position along one of those two axes; the position at which it reaches a region of high density is taken as the separation plane; and the nail region is extracted using that separation plane.

FIGS. 26A and 26B show the nail region portions extracted in the present embodiment. Nail region candidates such as those shown in FIG. 26B are extracted from the original image shown in FIG. 26A by the analysis described above. The white areas in FIG. 26B are the portions extracted as nail region candidates. In the present embodiment, at least one nail region may be extracted; if the conditions described above are not met, the image may be treated as containing no nail region.

<Nail determination method using density difference>
Next, the method by which the nail is judged and the nail region is determined in the processing of S116 described above, after the nail region candidates have been determined, is described. When the candidates are determined, differences in light reflection that depend on the imaging direction of the camera may cause a nail to appear only as a very small region, while erroneously extracted skin may form a region larger than the nail regions; both can end up as nail region candidates. Consequently, with simple processing such as smoothing or closing, the nail regions disappear before the skin regions do, and the erroneously extracted regions cannot always be removed. The locations where large regions are erroneously extracted are, however, roughly fixed.

FIG. 27 shows an example of the skin locations with a high probability of erroneous extraction. Erroneous extraction is frequent at, for example, the thenar eminence (THENAR), the finger pads (FINGER PULP), and the finger sides (FINGER SIDE) shown in FIG. 27. When the cutting position of the separation plane is low, the probability of erroneous extraction also becomes high for the hypothenar eminence (ANTITHENAR) and the skin near the MP (metacarpophalangeal) joints.

The present embodiment therefore performs processing to remove these parts. First, a square ROI (region of interest) is set around the centroid of each nail region candidate, and pixels having a color similar to that of a nail are re-extracted for each ROI. Letting n_i be the number of finger pixels in the i-th ROI, the target area Square_i for the re-extraction is defined by, for example, Expression (13).

Expression (13) states that the ratio between the number of finger pixels and the target area is the constant a for every ROI. In the present embodiment, the separation plane is moved for each ROI on the basis of this target area, so that re-extraction uses a separate threshold per ROI. Because this re-extraction uses, for example, only information from the neighborhood of the nail region candidate, it can judge which pixels among those of similar brightness resemble the nail color, and it consequently reduces the loss of extraction accuracy caused by light reflection at the time of imaging.
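A per-ROI threshold chosen from a target area can be sketched as below. The sketch assumes that Expression (13) has the form Square_i = a · n_i, and that "moving the separation plane" amounts to lowering the threshold on x_3 until Square_i pixels lie on or above it; the function name and the default value of a are hypothetical.

```python
def reextract_roi(x3_in_roi, a=0.1):
    """Re-extract nail-like pixels inside one ROI.

    x3_in_roi holds the third-principal-axis coordinate of every finger pixel
    in the ROI. Returns (threshold, selected_indices), where the threshold is
    the lowest x_3 value among the round(a * n_i) highest pixels.
    """
    n_i = len(x3_in_roi)
    target = round(a * n_i)                       # assumed Expression (13)
    if target == 0:
        return float("inf"), []
    order = sorted(range(n_i), key=lambda k: x3_in_roi[k], reverse=True)
    selected = order[:target]
    threshold = x3_in_roi[selected[-1]]
    return threshold, sorted(selected)
```

Because the threshold is derived from the pixels of the ROI itself, two ROIs lit differently receive different thresholds, which matches the stated aim of reducing the effect of reflections.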

FIGS. 28A and 28B show examples of the results of re-extracting a nail region and an erroneously extracted skin region, respectively. After re-extraction, the pixels in the nail region of FIG. 28A are densely clustered around the centroid, whereas those in the erroneously extracted skin region of FIG. 28B are scattered. The present embodiment uses this difference in clustering to judge nail regions.

In the present embodiment, the dense pixels and the non-dense pixels are counted separately, and the nail is judged from the ratio of the two counts. For example, let O_i be the binary image after pixel re-extraction in the i-th ROI, and define the dense image C_i as the image in which only dense regions remain after O_i is smoothed with a median filter or the like. The exclusive OR of the binary image O_i and the dense image C_i is then taken according to Expression (14), and the result is defined as the sparse image S_i.

Letting N_c^i be the number of extracted pixels in the dense image C_i and N_s^i the number of extracted pixels in the sparse image S_i, the condition for a nail is specified by, for example, Expression (15), and whether the candidate is a nail is judged on the basis of Expression (15).
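The density judgment can be sketched as follows. The sketch assumes a 3 × 3 median filter with zero padding for the smoothing step, takes Expression (14) as S_i = O_i XOR C_i, and assumes Expression (15) has the form N_c^i / N_s^i > Thread_cs; the exact inequality and the handling of N_s^i = 0 are not given in the text and are guesses.

```python
def median3x3(img):
    """3 x 3 median filter on a binary image (list of lists of 0/1), zero padded."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ones = sum(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = 1 if ones >= 5 else 0     # median of nine binary values
    return out


def is_nail(o_img, thread_cs=2.5):
    """Judge one ROI from the ratio of dense to sparse pixels."""
    c_img = median3x3(o_img)                                  # dense image C_i
    s_img = [[o ^ c for o, c in zip(ro, rc)]                  # sparse image S_i
             for ro, rc in zip(o_img, c_img)]
    n_c = sum(map(sum, c_img))
    n_s = sum(map(sum, s_img))
    if n_s == 0:                                  # assumption: all-dense counts as nail
        return n_c > 0
    return n_c / n_s > thread_cs
```

A compact blob keeps most of its pixels after filtering and yields a large ratio, whereas scattered pixels are removed by the filter and yield a ratio near zero.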

That is, as described above, an ROI is set from the centroid of each nail region candidate, and re-extraction is performed while changing the position of the separation plane for each ROI so that the region area is the same for every ROI. In the present embodiment, an image is generated for each ROI to prevent pixels from overlapping. The ROI may be square or circular; its shape and size are not particularly limited.

<Evaluation results>
Next, an evaluation experiment on the nail region extraction of the present embodiment, and its results, are described with reference to the drawings. FIGS. 29A to 29C illustrate the evaluation results. FIG. 29A shows an example of the probability (DETECTION PROBABILITY [%]) that a nail region candidate on the back of the hand (BACK) or the palm (PALM) is actually judged to be a nail, for each finger (THUMB, INDEX (index finger), MIDDLE (middle finger), RING (ring finger), PINKY (little finger)) and for skin (SKIN). FIG. 29B shows an example of the Euclidean distance error (DISTANCE [PIXEL]) between the centroid of the extracted nail and the actual centroid, for each finger, on the back of the hand and the palm. FIG. 29C shows an example of the nail region centroids and ROIs in an image showing only the back of the hand.

In the evaluation experiment, the scene was lit from above the camera downward by a fluorescent lamp and by two LED lights, one shining from above downward and the other from below upward. Finger images were taken 80 cm from the camera in an environment where the background appears black. The camera was, for example, a Dragonfly Express (640 × 480 [pixel]) manufactured by Point Gray Research.

A total of 200 images were used: 100 showing only the back of the hand and 100 that include the palm side. The evaluation covered, for each finger, the probability that nails detected as nail region candidates, and erroneously extracted skin, are judged to be nails by the nail determination method, and the Euclidean distance error between the centroid of each correctly extracted nail and its actual centroid.

The evaluation experiment was performed with the offset constant off_th adjusted so that the probability of each finger's nail being recognized as a nail region candidate was 95% or more for every finger. In the present embodiment, as an example, the ROI search range is 40 × 40 [pixel] and the threshold Thread_cs of the nail determination condition is 2.5. These values are not limiting; the parameters can be set freely according to, for example, the size and angle of the fingers in the image, the total number of pixels of the input image, and the image size.

First, FIG. 29A shows the probability that nails detected as nail region candidates, and erroneously extracted skin, are recognized as nails. According to FIG. 29A, the method of the present embodiment kept erroneous extraction of skin to 10% or less, so skin was rarely misdetected. Furthermore, except for the thumb nail in images showing only the back of the hand, nails were detected with an accuracy exceeding 90%. The skin parts that were ultimately misjudged were the finger pads and the finger sides. The main cause of the finger-side errors is thought to be that the erroneously extracted position lay immediately next to the nail region, so that much of the ROI was occupied by nail.

This can be improved by adding processing that examines the error between the centroid of the pixels obtained by re-extraction in the nail determination processing (algorithm) and the ROI center, and takes corrective action when the error is large. As for the finger pads, it was found that pixels occasionally concentrate at the center.
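One possible form of this additional check is sketched below. The text does not say what is done when the error is large, so rejecting the candidate is an assumption, as are the function names and the threshold max_offset.

```python
import math


def centroid(binary_img):
    """Mean (x, y) of the set pixels in a binary ROI image, or None if empty."""
    pts = [(x, y) for y, row in enumerate(binary_img) for x, v in enumerate(row) if v]
    if not pts:
        return None
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def centered_enough(binary_img, max_offset=5.0):
    """True when the re-extracted pixels are centred on the ROI within max_offset pixels."""
    c = centroid(binary_img)
    if c is None:
        return False
    h, w = len(binary_img), len(binary_img[0])
    roi_center = ((w - 1) / 2, (h - 1) / 2)
    return math.dist(c, roi_center) <= max_offset  # Euclidean distance in pixels
```

The same Euclidean distance is the measure used in the evaluation to compare an extracted nail centroid with the actual centroid.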

Next, FIG. 29B shows the Euclidean distance error between the extracted centroid and the actual centroid. According to FIG. 29B, in images showing only the back of the hand, the Euclidean distance error of the nail centroid for fingers other than the thumb is less than 4 [pixel] on average, so the nail centroid can be obtained with an error that is small relative to the image resolution. The centroid error for each finger is larger in images showing only the back of the hand than in images that include the palm side; this is thought to be because the nail region merges with a small region on the adjacent finger side, shifting the centroid outward from the center of the nail.

Finally, consider why accuracy for the thumb dropped markedly in images showing only the back of the hand. Back-of-hand images that include the thumb are captured as in FIG. 29C. In FIG. 29C, the centroid lies almost at the center of the nail for fingers other than the thumb, but at the edge of the nail for the thumb. Analysis showed the cause: a shadow of extremely low brightness fell on the lower part of the thumb nail and changed its color, so that only the upper part of the nail and the skin of the finger side were extracted, and these then merged.

As a result, because the centroid, that is, the ROI center, lies at the edge of the nail, the search range near the thumb takes in more skin pixels than it does for the other nails. Moreover, the skin pixels taken in are finger-side pixels whose color resembles that of the nail. For the thumb, the finger-side pixels in the ROI are therefore thought to have strongly influenced the result and caused the misjudgment. In short, the drop in determination accuracy is attributed to local brightness differences caused by shadow within the ROI.

The present embodiment focuses on the property that pixels with a nail-like color cluster differently inside the nail contour, outside it, and on the skin, and constructs a nail region detection system that distinguishes nails by this difference in density. The evaluation experiment confirmed that skin regions are misjudged only with a low probability of 10% or less, that nails alone are detected with an accuracy exceeding 90% when no local shadow falls on the nail, and that the nail centroid deviates by 4 [pixel] or less on average. The present embodiment can therefore extract nail regions in an image with high accuracy.

<Application example of nail region extraction technology>
An application example of the nail region extraction technique of the present embodiment is now described with reference to the drawings. FIG. 30 shows such an application example: a finger shape estimation system 130 that combines the function of the nail region extraction device of the present embodiment with the function of an existing contour-based finger shape estimation device. Specifically, the finger shape estimation system 130 includes a camera 131 serving as the imaging device, a contour line information acquisition processing system 132, a nail information acquisition processing system 133, and a database collation unit 134. The nail information acquisition processing system 133 corresponds to the nail region extraction device 110 described above.

An image containing the hand, obtained from the camera 131, is output to the contour line information acquisition processing system 132 and the nail information acquisition processing system 133. The contour line information acquisition processing system 132 acquires the contour line (contour shape) of the fingers from the image and outputs its feature quantities (contour line information). The contour line can be acquired by, for example, separating the finger portion from the background portion on the basis of luminance differences between adjacent pixels and then taking the outline of the finger portion, although the present embodiment is not limited to this.

The nail information acquisition processing system 133 performs the processing described above on the image acquired from the camera 131 to obtain nail information such as the centroid and contour of each nail.

As shown in FIG. 30, the database collation unit 134 uses the contour line information and the nail information. It collates the contour line feature quantities and the nail presence information, centroid, region area, and so on with the data in a database stored in advance in a storage unit or the like, in which such data for many kinds of finger shapes are combined with the joint angle information at that time, and it outputs the finger shape with the highest similarity as the estimated shape of the hand.
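The collation step can be illustrated as a nearest-neighbor search. The source does not specify the similarity measure or the layout of the database, so the squared Euclidean distance, the flat feature vector, and the function name below are assumptions made only for illustration.

```python
def match_hand_shape(query, database):
    """Return the label stored with the database entry closest to the query.

    database is a list of (feature_vector, stored_value) pairs, where the
    stored value could be joint angles or a shape label. Similarity is taken
    here as smallest squared Euclidean distance (an assumption).
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    best = min(database, key=lambda entry: sq_dist(query, entry[0]))
    return best[1]
```

In such a scheme the query vector would concatenate contour features with nail features such as presence, centroid, and area, so that nail information can disambiguate shapes whose contours look alike.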

By collating the input data with the database data, in which joint angle information is stored in combination in advance, the database collation unit 134 also identifies, for example, the joint angles of the fingers and outputs the angle data. The information output by the database collation unit 134 is not limited to angle data; for example, the content of the hand motion (such as the type of grasping motion) may be output.

The processing in the contour line information acquisition processing system 132 and the processing by the database collation unit 134 may be included in, for example, the processing of the finger shape estimation unit 117 of the nail region extraction device 110 described above.

In general, finger shape estimation faces problems such as the following: because the hand is a multi-joint structure and the fingers move in complex ways, building a three-dimensional model makes the computation cumbersome, and such an approach is vulnerable to self-occlusion. Methods that estimate the three-dimensional shape at high speed by collating two-dimensional image information with database data are therefore used. Because such methods rely on contour line information, however, fingertip position information is lost when, for example, a finger is bent toward the front of the camera, and accuracy may suffer.

In the present embodiment, therefore, using the nail information acquisition processing system 133, as in the finger shape estimation system 130 shown in FIG. 30, provides fingertip position information and improves estimation accuracy. Such a nail information acquisition processing system can be applied to, for example, a finger shape estimation device such as that disclosed in International Publication No. WO2009/147904, filed by the present applicant.

As described above, the present embodiment can extract nail regions in an image with high accuracy. Specifically, nail regions can be extracted accurately and in real time simply by moving the hand in view of the camera, without any attachment such as an artificial nail. As this technique develops, fingertip positions can be determined accurately. This is expected to improve recognition accuracy for technologies that have a computer recognize motions involving complex finger shapes, such as sign language, and to improve estimation accuracy for a wide variety of finger shapes, including wrist rotation, without restricting the shape.

Although nail color differs from skin color, separating the nail region from the skin region accurately is not easy. In conventional methods, a discriminant for the separation could not even be constructed without analyzing the distributions of nail region pixels and skin region pixels beforehand. Moreover, because the nail region is very small within the finger region, when nail-like colors are extracted and regions are obtained by smoothing and labeling, skin regions with a nail-like color turn out larger than the nail regions. Simple noise removal such as smoothing or closing therefore often fails to remove the skin regions, which has made accurate nail region extraction difficult. According to the present embodiment, however, the separation plane that separates the nail pixel distribution from the skin pixel distribution can be determined without analyzing the nail and skin distributions, and skin regions that previously could not be removed can be removed.

The present embodiment also performs separation plane generation using color information only, and re-determination of nails after pixel discrimination in order to handle images that include the palm. In the prior art, generating the classifier that forms the separating hyperplane, which decides to which region a pixel belongs, requires a feature database of nail region pixels and a feature database of skin region pixels. Skin and nail must then be separated by hand to produce the processed training images, and producing enough training images for good accuracy took an enormous amount of time and labor.

In contrast, the present embodiment exploits two properties: nail region pixels are far fewer than skin region pixels, and when the pixel information of the entire finger image is coordinate-transformed to the principal component axis basis, the distributions of nail region pixels and skin region pixels show a strong layered bias along the third principal axis, so that the two distributions can be separated by a linear separation plane. As calibration in advance, palm-only finger images in which no nail appears are transformed to the principal component axis basis, and the third-principal-axis coordinate of the upper surface of the skin pixel distribution is obtained by a judgment based on pixel density. The equation of the separation plane that divides the nail pixel distribution from the skin pixel distribution can thus be derived automatically, without manual work, which greatly reduces time and labor.

Furthermore, the prior art assumed that nail information would be used to operate an information terminal device like a touch panel, and therefore considered only the back of the hand, where few skin regions contain pixels similar in color to nail region pixels. The prior art accordingly only classifies each pixel as nail region or skin region by a separating hyperplane obtained with a classifier, and so cannot handle the palm side, where parts of the skin region contain concentrations of pixels with a nail-like color. The present embodiment judges pixels with a separation plane in the same way as the prior art, but after that judgment it decides once more, for each region (ROI) containing such pixels, whether the region really is a nail region, which allows it to handle images that include the palm.

In other words, because the present embodiment is a system that can be used without manually cutting out nail region pixels and skin region pixels in advance, it can be put to use immediately after only minor adjustment. Because it also handles finger images that include the palm, nail positions can be determined accurately for a variety of finger shapes. In addition, the two-stage determination, consisting of judgment by the separation plane and re-determination after region extraction, reduces the likelihood of erroneous extraction.

Furthermore, simply by positioning the camera so that the nails always appear in the image, the present embodiment obtains the positions of the nail regions, that is, the fingertip positions, accurately at all times and without attachments such as artificial nails. The present embodiment is therefore considered applicable to, for example, refining the operation of a three-dimensional gesture interface that reproduces hand movements directly in a virtual three-dimensional space, non-touch motion detection terminals that can be operated as if touched by detecting the movement of the nail regions, and sign language recognition devices.

Furthermore, even when a plurality of hands are present in the image, extracting the nail regions by the method of the present embodiment allows the finger shapes to be estimated more accurately.

Preferred examples of the nail region extraction technique have been described in detail above, but the nail region extraction technique of the present embodiment is not limited to these specific embodiments, and various modifications and changes are possible.

The example shown in FIG. 30 described a finger shape estimation system 130 that combines the nail region extraction device according to the second embodiment with an existing finger shape estimation device. However, a finger shape estimation system (finger shape estimation device) may also be constructed by combining the nail region extraction device according to the second embodiment (FIG. 19) with the finger shape estimation device according to the first embodiment (FIG. 1).

In this case, for example, the finger shape estimation device according to the first embodiment (FIG. 1) and the nail region extraction device according to the second embodiment (FIG. 19) may be combined as separate devices. Alternatively, for example, a finger shape estimation system (finger shape estimation device) may be constructed by incorporating the nail region extraction unit 116 of the nail region extraction device 110 shown in FIG. 19 into the finger shape estimation device 10 shown in FIG. 1.

In the latter configuration, components that can be shared between the finger shape estimation device according to the first embodiment (FIG. 1) and the nail region extraction device according to the second embodiment (FIG. 19) (for example, the input unit, output unit, storage unit, image acquisition unit, image analysis unit, transmission/reception unit, and control unit) may be shared by both devices. In that case, the operation of each shared component is controlled so that it supports not only the finger shape estimation function but also the nail region extraction function. Such a configuration provides not only the effects obtained in the first embodiment but also effects similar to those described with reference to FIG. 30.

Description of Symbols

10 Finger shape estimation device
11, 111 Input unit
12, 112 Output unit
13, 113 Storage unit
14, 114 Image acquisition unit
15 Database construction unit
16, 115 Image analysis unit
17 Collation unit
18, 117 Finger shape estimation unit
19, 118 Transmission/reception unit
20, 119 Control unit
21, 120 Imaging device
31, 121 Input device
32, 122 Output device
33, 123 Drive device
34, 124 Auxiliary storage device
35, 125 Memory device
36, 126 CPU
37, 127 Network connection device
38, 128 Recording medium
40 Robot
41, 51 User
42, 43, 52 Finger
44 Robot camera
50 Mobile terminal
53 Mobile projector function
110 Nail region extraction device
116 Nail region extraction unit
130 Finger shape estimation system
131 Camera
132 Contour information acquisition processing system
133 Nail information acquisition processing system
134 Database collation unit

Claims

A finger shape estimation device comprising:
an image acquisition unit that acquires an image including a finger shape;
an image analysis unit that analyzes the image acquired by the image acquisition unit and acquires a first feature amount corresponding to a ridge line shape of a finger included in the image; and
a finger shape estimation unit that, based on the first feature amount obtained by the image analysis unit, refers to a collation database in which second feature amounts corresponding to predetermined finger shapes set in advance are stored, and estimates a finger shape corresponding to the first feature amount,
wherein the image analysis unit treats a finger image included in the image as a foreground and the remainder of the image as a background, regards the distance from the background in the foreground image as a height so that the image is regarded as a single mountain-shaped image, and acquires information on the ridge line shape from the mountain-shaped image.
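The height interpretation in the preceding claim can be illustrated with a small sketch. It assumes a binary mask (1 for finger pixels, 0 for background) containing at least one background pixel, uses city-block distance as the height, and marks as ridge any pixel strictly higher than both neighbours along one axis. The distance metric, the ridge test, and the function names `height_map` and `ridge_pixels` are assumptions for illustration, not taken from the specification.

```python
from collections import deque

def height_map(mask):
    """Height of each pixel: city-block distance to the nearest background pixel."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Every background pixel is a source with height 0.
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    # Breadth-first expansion assigns each foreground pixel its distance.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def ridge_pixels(height):
    """Foreground pixels strictly higher than both neighbours along one axis."""
    h, w = len(height), len(height[0])

    def at(y, x):
        # Positions outside the image are treated as background (height 0).
        return height[y][x] if 0 <= y < h and 0 <= x < w else 0

    ridge = []
    for y in range(h):
        for x in range(w):
            v = height[y][x]
            if v == 0:
                continue
            above_vertical = v > at(y - 1, x) and v > at(y + 1, x)
            above_horizontal = v > at(y, x - 1) and v > at(y, x + 1)
            if above_vertical or above_horizontal:
                ridge.append((y, x))
    return ridge

# A horizontal bar standing in for a single extended finger.
bar = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(ridge_pixels(height_map(bar)))  # [(2, 2), (2, 3), (2, 4)]
```

For the bar, the ridge runs along its centre line. The strict comparison misses ridges on plateaus of even width, so a practical ridge tracer would need a more tolerant rule.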

The finger shape estimation device according to claim 1, further comprising a database construction unit that constructs the database,
wherein the database constructed by the database construction unit stores, for each predetermined finger shape, at least angle data and the second feature amount corresponding to that finger shape.

The finger shape estimation device according to claim 1, wherein the finger shape estimation unit narrows down the data sets in the database using a predetermined shape parameter of the finger shape, performs a similarity calculation on the narrowed-down group of data sets using the first feature amount, and outputs the most similar data set.
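The narrowing and similarity steps of the preceding claim can be sketched as a two-step search. The record layout (`shape`, `feature`, `angles`), the tolerance-based narrowing, and the use of Euclidean distance as the similarity measure are assumptions made for this example; the claim does not specify them.

```python
import math

def estimate_finger_shape(database, query_shape, query_feature, tolerance=0.1):
    """Narrow the database by shape parameters, then return the closest entry.

    Each entry is a dict with hypothetical keys:
      "shape"   - coarse shape parameters used for narrowing
      "feature" - stored (second) feature amount
      "angles"  - joint angle data associated with the finger shape
    """
    # Step 1: keep entries whose shape parameters are all within tolerance.
    candidates = [
        entry for entry in database
        if all(abs(a - b) <= tolerance for a, b in zip(entry["shape"], query_shape))
    ]
    if not candidates:
        return None
    # Step 2: similarity calculation on the narrowed set (smaller distance = more similar).
    return min(candidates, key=lambda entry: math.dist(entry["feature"], query_feature))

database = [
    {"name": "open",  "shape": (1.0, 0.50), "feature": (0.9, 0.1), "angles": (0, 0)},
    {"name": "fist",  "shape": (0.6, 0.90), "feature": (0.2, 0.8), "angles": (90, 90)},
    {"name": "point", "shape": (1.0, 0.55), "feature": (0.5, 0.5), "angles": (0, 90)},
]
best = estimate_finger_shape(database, (0.98, 0.52), (0.25, 0.75))
print(best["name"], best["angles"])
```

In this toy database the "fist" entry has the closest feature vector but is removed by the narrowing step, so "point" is returned. The example shows how narrowing both reduces the number of similarity calculations and constrains the result.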

The finger shape estimation device according to any one of claims 1 to 3, wherein the image analysis unit sets a start point and an end point of a ridge line vector based on the slope of the ridge line vector obtained by contour scanning of the finger image included in the image, and
the finger shape estimation unit estimates the finger shape based on the position of the start point and/or the end point of the ridge line vector set by the image analysis unit.

The finger shape estimation device according to any one of claims 1 to 4, further comprising a nail region extraction unit that generates a separation plane using only color information obtained from the image acquired by the image acquisition unit, extracts nail region candidates based on the generated separation plane, and re-determines the nail region among the nail region candidates by pixel discrimination using a preset image including a palm.

The finger shape estimation device according to claim 5, wherein the nail region extraction unit performs nail determination using the property that pixels of an actual nail and pixels whose color is merely similar to that of a nail differ in density between the nail region and the skin region.

The finger shape estimation device according to claim 5 or 6, wherein the nail region extraction unit converts the pixel components of an image prepared in advance, in which only a palm is captured, onto a base plane using two of the first to third principal axes set in advance by principal component analysis, lowers the position of the separation plane along one of the two principal axes from a high position, sets the position at which a high-density region is reached as the separation plane, and extracts the nail region using the separation plane.
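The lowering of the separation plane in the preceding claim can be pictured as a one-dimensional scan. The sketch assumes the palm-only pixels have already been projected onto the chosen principal axis (the principal component analysis itself is omitted), and it defines "high-density region" as the first bin, scanning downward, that holds at least `min_count` values. The names `bin_width` and `min_count` and this density definition are hypothetical.

```python
def find_separation_level(values, bin_width, min_count):
    """Lower a threshold from the largest value until a dense bin is reached.

    values are the coordinates of palm-only pixels along one principal axis
    (assumed to be precomputed). The returned level is the upper edge of the
    first bin, scanning from the top, that holds at least min_count values.
    """
    bottom = min(values)
    level = max(values)
    while level > bottom:
        lower = level - bin_width
        count = sum(1 for v in values if lower < v <= level)
        if count >= min_count:
            return level
        level = lower
    return bottom

def is_nail_candidate(value, level):
    """Pixels above the separation level are treated as nail candidates."""
    return value > level

# Projected palm-only pixels: a dense skin cluster plus one stray high value.
palm_values = [2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 9]
level = find_separation_level(palm_values, bin_width=1, min_count=3)
print(level)                                            # 5
print([is_nail_candidate(v, level) for v in (4.5, 6.0)])  # [False, True]
```

Because the reference image contains only a palm, the dense cluster is taken to be skin, and the stray high value does not stop the scan. Pixels of a new image that project above the returned level become nail region candidates.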

A finger shape estimation method comprising:
acquiring an image including a finger shape;
analyzing the acquired image to acquire a first feature amount corresponding to a ridge line shape of a finger included in the image; and
estimating a finger shape corresponding to the first feature amount by referring, based on the first feature amount, to a collation database in which second feature amounts corresponding to predetermined finger shapes set in advance are stored,
wherein acquiring the first feature amount includes treating a finger image included in the image as a foreground and the remainder of the image as a background, regarding the distance from the background in the foreground image as a height so that the image is regarded as a single mountain-shaped image, and acquiring information on the ridge line shape from the mountain-shaped image.

The finger shape estimation method according to claim 8, further comprising constructing the database, in which at least angle data and the second feature amount corresponding to each predetermined finger shape are stored.

The finger shape estimation method according to claim 8, wherein estimating the finger shape includes narrowing down the data sets in the database using a predetermined shape parameter of the finger shape, performing a similarity calculation on the narrowed-down group of data sets using the first feature amount, and outputting the most similar data set based on the result of the similarity calculation.

The finger shape estimation method according to any one of claims 8 to 10, wherein acquiring the first feature amount includes setting a start point and an end point of a ridge line vector based on the slope of the ridge line vector obtained by contour scanning of the finger image included in the image, and
estimating the finger shape includes estimating the finger shape based on the position of the start point and/or the end point of the set ridge line vector.

The finger shape estimation method according to any one of claims 8 to 11, further comprising, before estimating the finger shape, extracting a nail region by generating a separation plane using only color information obtained from the image, extracting nail region candidates based on the generated separation plane, and re-determining the nail region among the nail region candidates by pixel discrimination using a preset image including a palm.

A finger shape estimation program that causes an information processing device to execute:
a process of acquiring an image including a finger shape;
a process of analyzing the acquired image and acquiring a first feature amount corresponding to a ridge line shape of a finger included in the image; and
a process of estimating a finger shape corresponding to the first feature amount by referring, based on the first feature amount, to a collation database in which second feature amounts corresponding to predetermined finger shapes set in advance are stored,
wherein the process of acquiring the first feature amount includes a process of treating a finger image included in the image as a foreground and the remainder of the image as a background, regarding the distance from the background in the foreground image as a height so that the image is regarded as a single mountain-shaped image, and acquiring information on the ridge line shape from the mountain-shaped image.

The finger shape estimation program according to claim 13, further causing the information processing device to execute, before the process of estimating the finger shape, a process of extracting a nail region by generating a separation plane using only color information obtained from the image, extracting nail region candidates based on the generated separation plane, and re-determining the nail region among the nail region candidates by pixel discrimination using a preset image including a palm.