JP5569973B2

JP5569973B2 - Information terminal device, method and program

Info

Publication number: JP5569973B2
Application number: JP2011019710A
Authority: JP
Inventors: 晴久加藤; 暁夫米山
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-02-01
Filing date: 2011-02-01
Publication date: 2014-08-13
Anticipated expiration: 2031-02-01
Also published as: JP2012160051A

Description

The present invention relates to an information terminal device, method, and program that include an imaging unit and a display unit, and more particularly to an information terminal device, method, and program that control the display unit based on the position and posture of a subject relative to the imaging unit.

A device that presents information according to its positional relationship relative to the subject being imaged can change the presented information intuitively, which improves user convenience. The following methods for achieving this have been published.

Patent Document 1 proposes using a laser sensor to detect, in a detection plane, movement of the position of the imaging target.

Patent Document 2 proposes a technique that scrolls or zooms the displayed portion according to the direction and amount of movement and tilt detected by a motion sensor.

In Patent Document 3, the block of the input image that has moved the most is detected and taken as the position of the imaging target. The features of that block are then analyzed, and the center coordinates of the moved imaging target are tracked.

Patent Document 1: JP 2010-244480 A
Patent Document 2: JP 2009-003799 A
Patent Document 3: JP 2010-170300 A

The techniques disclosed in Patent Documents 1 and 2 require a laser sensor and a motion sensor, respectively, which limits the devices on which they can be used. In addition, mounting special sensors not only raises the cost of the terminal but may also make it difficult to reduce the size and power consumption of the device.

Because Patent Document 3 uses the presence or absence of motion to detect the imaging target, it can handle only planar input (forward, backward, left, and right). In addition, because it uses center coordinates, it is easily affected by changes in the appearance of the imaging target.

An object of the present invention is to solve the above problems of the prior art and to provide a fast and highly accurate information terminal device that can control the information displayed on its display unit through spatial movements of a subject relative to the device, without using special sensors or the like.

To achieve the above object, the present invention is an information terminal device including an imaging unit that continuously photographs a subject and a display unit, characterized by comprising: a region forming unit that extracts the region of the subject from a captured image based on color features; a first polygon forming unit that forms a first circumscribed polygon containing the extracted subject region; an internal background extraction unit that extracts an internal background region, obtained by excluding the subject region from the interior of the first circumscribed polygon; a second polygon forming unit that forms a second circumscribed polygon containing the internal background region; a posture estimation unit that estimates at least one of the position and posture of the subject relative to the imaging unit based on the first circumscribed polygon, the internal background region, and the second circumscribed polygon; and a control unit that controls part of the information displayed on the display unit based on at least one of the estimated position and posture.

According to the present invention, the display unit is controlled based on at least one of the position and posture of the subject, so the user of the information terminal device is provided with an intuitive operation interface. Furthermore, at least one of the position and posture of the subject is estimated quickly and accurately by simple image processing on the images of the subject captured continuously by the imaging unit, so the display unit is also controlled quickly and accurately, and the user is provided with an easy-to-use operation interface. Moreover, the present invention uses only simple image processing and requires no special sensors or the like.

FIG. 1 is a functional block diagram of the information terminal device of the present invention.
FIG. 2 is a diagram outlining the operation of the present invention.
FIG. 3 is a diagram explaining the processing of the feature detection unit and the posture estimation unit.
FIG. 4 is a diagram explaining how the posture estimation unit estimates the shape of the subject (whether the fingers are open or closed).
FIG. 5 is a more detailed functional block diagram of the control unit.
FIG. 6 is a diagram showing an example of movement control of part of the display information by the movement control unit.
FIG. 7 is a diagram showing rotation in each direction from a reference position, as an example of a posture change of the subject (hand).
FIG. 8 is a diagram showing an example in which part of the display information is rotated by the rotation control unit based on the estimation result of each rotation in FIG. 7.
FIG. 9 is a diagram showing examples of the shape of the internal background region, the position of the reference point, and so on when each rotation in FIG. 7 actually occurs, which the posture estimation unit uses to estimate that rotation.
FIG. 10 is a chart showing the correspondence among FIGS. 7 to 9.
FIG. 11 is a diagram showing an example of enlargement and reduction control of part of the display information by the enlargement control unit.
FIG. 12 is a diagram explaining the auxiliary display unit.
FIG. 13 is a diagram showing examples of general subjects, not limited to a hand, that can be used in the present invention.

The present invention is described in detail below with reference to the drawings. FIG. 1 shows a functional block diagram of the information terminal device of the present invention. The information terminal device 10 includes an imaging unit 1, an estimation unit 2, a control unit 3, and a display unit 4. The estimation unit 2 includes a region forming unit 21, a feature detection unit 22, and a posture estimation unit 23. The feature detection unit 22 includes a first polygon forming unit 221, an internal background extraction unit 222, and a second polygon forming unit 223.

A mobile terminal can be used as the information terminal device 10, and a digital camera provided as standard equipment on the mobile terminal can be used as the imaging unit 1, but the configuration to which the present invention applies is not limited to this. Any information terminal with an imaging function may be used, for example a personal computer given an imaging function by an external camera.

FIG. 2 shows an outline of the operation of the present invention and is used here to outline the operation of the imaging unit 1, estimation unit 2, control unit 3, and display unit 4. The information terminal device 10 shown in (a) (here a mobile terminal with the imaging unit 1 on the back of its display unit 4) continuously photographs the subject (b) with the imaging unit 1 at a predetermined sampling period. As shown in FIG. 2, in a preferred embodiment of the present invention the subject (b) is a hand, held so that the third to fifth fingers (middle, ring, and little fingers) are clenched and the first finger (thumb) and second finger (index finger) surround the clenched portion with a gap.

The clenched third to fifth fingers stay fixed and are not opened, but the first and second fingers may be moved as long as they continue to surround the clenched portion. Moving them may bring the pads of the first and second fingers into contact in a pinching state, or they may remain apart as shown in FIG. 2.

The imaging unit 1 photographs the hand (b), which is the subject, from the side of the first and second fingers. Seen from the imaging unit 1, the clenched third to fifth fingers lie behind the plane (b1) roughly formed by the first and second fingers.

The estimation unit 2 estimates the position and posture of the hand (b) relative to the imaging unit 1 using the captured image of the hand (b) sent from the imaging unit 1, as shown in (d). FIG. 2 depicts one example of a spatial arrangement that serves as the reference for estimating this position and posture. The imaging surface of the imaging unit 1 (drawn enlarged) is shown as (a1), and it is parallel to the plane (b1) roughly formed by the first and second fingers of the hand (b). The hand (b) is also assumed to be located directly in front of the imaging unit 1.

This reference arrangement is only an example used for convenience of explanation, and another arrangement may be used as the reference.

The estimation unit 2 estimates changes in the position, posture, and so on of the subject (b), using as a reference the captured image (d) of the subject (b) under this known spatial arrangement and known subject (b). The estimated position is a spatial position determined by, for example, movement (c2) in the x-axis and y-axis directions within the plane (b1) of the subject (b). The estimated posture corresponds to the tilt of the plane (b1), which appears as the tilt of the hand (b) when, for example, the wrist is turned or tilted, and is determined by rotations such as (c3) and (c4). In addition, the estimation unit 2 also estimates the distance (c1) between the surface (a1) and the plane (b1) in the depth (z-axis) direction, and the shape of the hand, that is, whether the first and second fingers are open or closed.

In particular, in the captured image (d), the estimation unit 2 detects the region (d4) of the hand (b), and detects the U-shaped or V-shaped region (d5) that is enclosed by the first-finger and second-finger regions (d2) and (d1) and does not belong to (d4). From (d5) it also detects a reference point (d30), which is assumed to be the second joint of the third finger. When the position and posture of the subject change, its appearance, that is, the position and shape of (d4), changes, and the detected results (d4), (d5), and (d30) change as well. Based on geometric considerations, the estimation unit 2 estimates the changed position, posture, and so on of the subject (b) from these detection results.

The control unit 3 controls part of the information displayed on the display unit 4 based on the position and posture estimated by the estimation unit 2. For example, if the display unit 4 shows a pointer, the estimated position change in the xy directions (c2) can be applied to the pointer as is, or multiplied by a predetermined factor, so that the hand acts as a substitute for a mouse. As another example, if a three-dimensional image is shown on the display unit 4, the image can be rotated in conjunction with the posture change estimated as rotations such as (c3) and (c4), as if the image were resting on the hand (b). Other display control can also be performed based on the estimated position, posture, and other quantities.

Throughout the present invention, the estimation unit 2 estimates the position (x, y) and posture (an angle θ, etc.), and the control unit 3 performs control using the changes in those estimated quantities: the change in position (Δx, Δy) and the change in posture (a change in angle Δθ, etc.). Likewise, the control unit 3 performs control using the change Δz in the distance z estimated by the estimation unit 2. As for the shape estimated by the estimation unit 2, described later, the control unit 3 uses the shape estimated at each point in time as is, and as a result also uses changes in the estimated shape.

The present invention as outlined in FIG. 2 provides the following effects (1) to (3). (1) Part of the display information on the display unit 4 can be controlled simply by changing the relative position, posture, and so on between the imaging unit 1 and the subject. The user can therefore control part of the display information through the intuitive operation of changing the position, posture, and so on of the subject (in particular the user's own hand) relative to the imaging unit 1.

(2) The estimation unit 2 estimates the position, posture, and so on by simple image processing on the image of the subject, so it can be implemented in software, for example, and no special hardware such as special sensors needs to be built into the information terminal device 10.

(3) The position, posture, and so on are estimated quickly and accurately by the processing described for the captured image (d) in FIG. 2, so part of the display information can be controlled quickly and accurately.

Next, the processing of each part of the estimation unit 2, namely the region forming unit 21, the feature detection unit 22, and the posture estimation unit 23, is described in turn. First, the region forming unit 21 extracts the skin-color region, which is the region of the subject's hand, from the image input from the imaging unit 1. The skin-color region can be extracted based on the color information in the input image. For example, when the input image is expressed in the RGB color space, pixels whose R, G, and B values fall within preset ranges can be extracted as the skin-color region. A range such as that of (Equation 1), introduced in Non-Patent Document 1 below, may be used.

(Non-Patent Document 1) Vladimir Vezhnevets, Vassili Sazonov, Alla Andreeva, "A Survey on Pixelbased Skin Color Detection Techniques," Pattern Recognition, vol. 40, no. 3, pp. 1106-1122, March 2007.
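The RGB range test described above can be sketched as follows. Because (Equation 1) is not reproduced in this text, the thresholds below are an assumption: they are a widely cited explicit RGB skin rule for uniform daylight, not necessarily the values the patent uses. The function names `is_skin_rgb` and `skin_mask` are hypothetical.

```python
def is_skin_rgb(r, g, b):
    """Explicit RGB skin test. All thresholds are assumed, not taken from the patent."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15   # enough spread between channels
        and abs(r - g) > 15                    # red clearly differs from green
        and r > g and r > b                    # red is the dominant channel
    )


def skin_mask(image):
    """image: rows of (R, G, B) tuples in 0-255. Returns rows of booleans."""
    return [[is_skin_rgb(*pixel) for pixel in row] for row in image]
```

Applied to every pixel, this yields the binary skin-color mask that the later polygon-forming steps operate on.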

Alternatively, the image may be converted from the RGB color space to the HSV color space (value V, saturation S, hue H). The conversion from the RGB representation to the HSV representation is given by (Equation 2).

Pixels whose hue H in (Equation 2) falls within a preset range may be taken as the skin-color region.
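A minimal sketch of the hue-based test, assuming the standard RGB-to-HSV conversion provided by Python's `colorsys` module stands in for (Equation 2), which is not reproduced in this text. The hue range of 0 to 50 degrees is an illustrative assumption, and `hue_degrees` and `is_skin_hue` are hypothetical names.

```python
import colorsys


def hue_degrees(r, g, b):
    """Hue of an 8-bit RGB pixel, in degrees [0, 360)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


def is_skin_hue(r, g, b, h_min=0.0, h_max=50.0):
    """True if the hue lies in the preset range (range values are assumed)."""
    return h_min <= hue_degrees(r, g, b) <= h_max
```

Hue is cyclic, so a practical range for reddish skin tones may also need to accept values just below 360 degrees; this sketch does not handle that wrap-around.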

Alternatively, another color space such as YCbCr may be used. The conversion from the RGB representation to the YCbCr representation is given by (Equation 3).

Pixels whose color differences Cb and Cr in (Equation 3) fall within preset ranges may be taken as the skin-color region.
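The color-difference test can be sketched in the same way. (Equation 3) is not reproduced in this text, so the sketch assumes the full-range ITU-R BT.601 conversion used by JPEG/JFIF. The Cb and Cr ranges are commonly quoted illustrative values, not values from the patent, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr for 8-bit values (assumed conversion)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def is_skin_cbcr(r, g, b, cb_range=(77.0, 127.0), cr_range=(133.0, 173.0)):
    """True if both color differences lie in the preset ranges (assumed values)."""
    _y, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
```

Because the luminance Y is ignored, this test is less sensitive to overall brightness than a direct RGB range test.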

Alternatively, a region where the weighted average of the skin-color detection results from multiple color spaces exceeds a preset threshold may be taken as the skin-color region. The weight for each method is set in advance, using known skin-color regions, so that skin color is extracted as accurately as possible. The types and number of methods combined may be set according to the processing performance of the information terminal device 10. If processing capacity allows, a skin-color detection method based on a probabilistic model, such as a Gaussian mixture model, may also be combined.
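A minimal sketch of the weighted combination, assuming each detection method has already produced a binary mask of the same size. The name `fuse_masks`, the weights, and the threshold are hypothetical; in the text the weights are tuned beforehand on known skin-color regions.

```python
def fuse_masks(masks, weights, threshold=0.5):
    """Weighted average of binary masks; True where the average exceeds threshold."""
    total = sum(weights)
    height, width = len(masks[0]), len(masks[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each mask votes 0 or 1 for this pixel, scaled by its weight.
            score = sum(w * m[y][x] for m, w in zip(masks, weights)) / total
            row.append(score > threshold)
        fused.append(row)
    return fused
```

For example, with weights 2, 1, 1 a pixel is accepted when the most trusted method and at least one other method agree.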

Next, the processing of the feature detection unit 22 is described in the order of the first polygon forming unit 221, the internal background extraction unit 222, and the second polygon forming unit 223, followed by the processing of the posture estimation unit 23. FIG. 3 shows an example used to explain this processing. As described below, through the processing illustrated in (A) to (E) of FIG. 3, the feature detection unit 22 and the posture estimation unit 23 detect the coordinates (position coordinates in the image) corresponding to a preset reference point, finally shown as the base of arrow (E), together with other information. As noted above, when a hand is assumed as the subject, the preset reference point is the second joint of the third finger, and the following processing detects the position coordinates of that joint. The posture estimation unit 23 then estimates the posture using this reference point and other information.

First, the first polygon forming unit 221 selects the largest of the skin-color regions extracted by the region forming unit 21 and fills in partial defects in that region with a morphological filter. This yields a skin-color region such as that shown in (A). The first polygon forming unit 221 then forms a circumscribed polygon, as in (B), that contains the entire skin-color region (A).
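The selection of the largest skin-color region and the formation of its circumscribed polygon can be sketched as below. This is one possible realization under stated assumptions: 4-connectivity for region labeling and Andrew's monotone chain for the convex hull are choices made here, not methods named in the text, and the morphological filtering step is omitted. The function names are hypothetical.

```python
from collections import deque


def largest_component(mask):
    """Pixels (x, y) of the largest 4-connected True region in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, comp = deque([(x0, y0)]), []
            while queue:  # breadth-first flood fill of one region
                x, y = queue.popleft()
                comp.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(comp) > len(best):
                best = comp
    return best


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order around the hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

Calling `convex_hull(largest_component(mask))` gives a convex polygon playing the role of (B) for the skin-color region (A).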

Next, the internal background extraction unit 222 extracts the largest-area non-skin-color region inside the circumscribed polygon (B) as the internal background region mentioned above, shown as the hatched portion (C). Extracting the largest non-skin-color region excludes from the internal background region the small gaps between the skin-color region (A) and the circumscribed polygon (B) that arise outside the area enclosed by the first and second fingers, for example on the back of the hand or on the wrist side.
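A minimal sketch of the internal background extraction, assuming the skin-color mask and the vertices of the convex polygon (B) are already available. The point-in-convex-polygon test and the 4-connected flood fill are implementation choices made here, and the names `inside_convex` and `internal_background` are hypothetical.

```python
from collections import deque


def inside_convex(poly, p):
    """True if p is inside or on the boundary of the convex polygon poly."""
    sign = 0
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        c = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        if c != 0:
            if sign == 0:
                sign = 1 if c > 0 else -1
            elif (c > 0) != (sign > 0):
                return False  # p lies on different sides of two edges
    return True


def internal_background(mask, hull):
    """Largest 4-connected set of non-skin pixels lying inside the hull."""
    h, w = len(mask), len(mask[0])
    candidates = {(x, y) for y in range(h) for x in range(w)
                  if not mask[y][x] and inside_convex(hull, (x, y))}
    best = set()
    while candidates:
        seed = candidates.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in candidates:
                    candidates.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    return best
```

Keeping only the largest component is what discards the small gaps between the hand outline and the polygon.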

Next, the second polygon forming unit 223 forms a circumscribed polygon containing the entire extracted internal background region (C), shown by the dotted line (D). In the example of FIG. 3, the first and second fingers are apart, so the circumscribed polygon (D) shares part of a side with the circumscribed polygon (B). If the first and second fingers are in contact, so that the hand is pinching, the circumscribed polygon (D) shares no side with the circumscribed polygon (B).

The circumscribed polygons (B) and (D) are formed as convex polygons by applying any of various well-known techniques to the regions from which they are formed, namely the skin-color region (A) and the internal background region (C), respectively.

Next, the posture estimation unit 23 performs the following processing to detect the reference point (F), shown as the base of arrow (E) and corresponding to the corner point formed by the third joint, and to obtain its coordinates. For each point on the boundary of the internal region (C), it finds the minimum of the distances from that point to points on the sides of the circumscribed polygon (D), measured along paths that do not pass through the interior of the internal region (C). Then, among the minimum distances found for all boundary points, the boundary point of (C) that gives the maximum distance is taken as the reference point (F), shown as the base of arrow (E).
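The max-of-min search above can be sketched as follows. This is a simplification under a stated assumption: it uses the straight-line distance from each boundary point to the nearest side of (D) and does not enforce the constraint that the path avoid the interior of (C), so it only approximates the procedure in the text. The names `point_segment_distance` and `reference_point` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def reference_point(boundary, hull):
    """Boundary point of (C) whose nearest hull side is farthest away."""
    def depth(p):
        return min(point_segment_distance(p, hull[i], hull[(i + 1) % len(hull)])
                   for i in range(len(hull)))
    return max(boundary, key=depth)
```

Boundary points that lie on the polygon (D) have depth zero, so the result is the deepest point of the concavity, which is where the clenched fingers protrude into the internal region.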

The posture estimation unit 23 then estimates the position and posture of the subject using at least one of the obtained reference point (F), the internal region (C), or the relative positional relationship between (F) and (C). Specifically, the coordinates of the reference point (F), that is, its (x, y) position in the image, are taken as the estimated position of the subject. The posture of the subject is estimated from the positional relationship between the reference point (F) and the centroid of the internal region (C). This posture estimation is described in detail later, together with the control unit 3.

The posture estimation unit 23 also estimates the shape of the hand that is the subject, that is, whether the first and second fingers are apart (open) or in contact (closed), based on whether the circumscribed polygons (B) and (D) have a shared side. This shape estimation is explained with the example of FIG. 4. When the first and second fingers are apart and open, as in (10), the circumscribed polygons (B) and (D) overlap and share (part of) a side, as in (11). Conversely, when the first and second fingers are in contact and closed, as in (20), the circumscribed polygons (B) and (D) do not overlap and share no (part of a) side, as in (21). In this way, the shape of the hand, that is, whether the fingers are open or closed, can be estimated from the presence or absence of a side shared by the circumscribed polygons (B) and (D).
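One way to test for a shared side is sketched below, assuming both polygons are given as vertex lists. The criterion used here, that some side of (D) has both endpoints lying on a single side of (B), and the tolerance value are assumptions of this sketch. The names `on_segment`, `edges`, and `fingers_open` are hypothetical.

```python
def on_segment(p, a, b, tol=1e-9):
    """True if point p lies on the segment a-b (within tolerance)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > tol:
        return False  # not collinear with a-b
    dot = (p[0] - a[0]) * (b[0] - a[0]) + (p[1] - a[1]) * (b[1] - a[1])
    return -tol <= dot <= (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2 + tol


def edges(poly):
    """Consecutive vertex pairs of a closed polygon."""
    return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]


def fingers_open(hull_b, hull_d):
    """True if some side of D lies entirely on a side of B (shared side)."""
    return any(on_segment(p, a, b) and on_segment(q, a, b)
               for p, q in edges(hull_d) for a, b in edges(hull_b))
```

With pixel-based polygons a small positive tolerance would likely be needed, since vertices rarely fall exactly on a common line.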

The control unit 3 then moves, rotates, or otherwise changes part of the information displayed on the display unit 4 in conjunction with the estimated position, posture, and so on of the subject. As shown in FIG. 5, the control unit 3 includes a movement control unit 31, a rotation control unit 32, a switching control unit 33, and a magnification control unit 34, which perform their control functions independently or in conjunction with one another, each providing an intuitive interface for the user. These are described below.

The movement control unit 31 moves part of the display information based on the change in the estimated position of the subject. FIG. 6 shows an example of this movement control. The movement (Δx, Δy), in pixels, of the reference point (F) on the image (1), determined as the change in the estimated position (x, y), is used as is or multiplied by a constant a, and part of the display information on the display unit (2), for example a pointer, is moved by (Δx, Δy) or (aΔx, aΔy). When the amount of movement is scaled by a constant in this way, different constants may be used for the x direction and the y direction.
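The pointer update described above amounts to a scaled displacement. A minimal sketch, where `move_pointer`, `gain_x`, and `gain_y` are hypothetical names for the update and the per-axis constants:

```python
def move_pointer(pointer, prev_ref, curr_ref, gain_x=1.0, gain_y=1.0):
    """Shift the pointer by the reference point's displacement times a gain."""
    dx = curr_ref[0] - prev_ref[0]  # change in estimated x between frames
    dy = curr_ref[1] - prev_ref[1]  # change in estimated y between frames
    return (pointer[0] + gain_x * dx, pointer[1] + gain_y * dy)
```

With both gains equal to 1 the pointer follows the reference point pixel for pixel; larger gains let a small hand movement cover a larger display.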

When the subject is a hand held in the shape described with reference to FIGS. 2 and 4, the estimated position is the position of the corner at the second joint of the clenched third finger. This corner moves rigidly together with the whole hand in which the third to fifth fingers are clenched. The movement control unit 31 therefore provides an intuitive interface that is easy for the user to operate, much like a conventional mouse, where the movement of the hand directly becomes the movement of the pointer.

The rotation control unit 32 rotates part of the display information based on the change in the estimated posture of the subject. This rotation control, and the posture estimation by the posture estimation unit 23 that it requires, are explained with reference to FIGS. 7 to 9. FIG. 7 shows (1) examples of rotations of the subject (hand), as examples of actual changes in the posture to be estimated. FIG. 8 shows examples in which the rotation control unit 32 rotates (2) part of the display information on the display unit 4 based on the estimation result of each rotation of the subject in (1).

FIG. 9 shows, for each case of (1) rotation of the subject, examples of (3) the internal background region and the reference point obtained by the feature detection unit 22 and the posture estimation unit 23 as described for FIG. 3, together with the reference-point-to-centroid vector v, which starts at the reference point, ends at the centroid of the internal background region, and thus represents their positional relationship. Because the shape of the internal background region and its positional relationship to the reference point change with the posture (rotation) of the subject as shown in FIG. 9, the posture estimation unit 23 can estimate the posture. Of the items in (3), the posture estimation unit 23 uses the reference-point-to-centroid vector in particular to estimate the posture corresponding to each rotation of the subject in (1). The correspondence among FIGS. 7 to 9 is explained below.
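The reference-point-to-centroid vector v can be computed as below, assuming the internal background region is available as a list of pixel coordinates. The names `centroid` and `reference_centroid_vector` are hypothetical, and the sketch stops at computing v: how particular values of v map to particular rotations is given by the figures and is not encoded here.

```python
def centroid(pixels):
    """Mean (x, y) of a non-empty collection of pixel coordinates."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def reference_centroid_vector(ref, region_pixels):
    """Vector v from the reference point to the centroid of the region."""
    cx, cy = centroid(region_pixels)
    return (cx - ref[0], cy - ref[1])
```

Comparing v in the current frame with v in the reference arrangement gives a change in length and direction that a posture estimate can be based on.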

FIG. 7 shows (1) examples of rotations of the subject (a hand) as examples of posture changes of the subject. As described with reference to FIG. 2, FIG. 7 shows a state in which the hand (b), which is the subject, lies on the plane (b1) in the reference position, posture, and shape, and its image is obtained as (d). In this example, the estimated posture consists of up/down rotation about the axis (c3), which is the axis of the arm, and left/right rotation about the axis (c4) of the plane (b1), and a part of the display information is controlled accordingly; however, other rotation axes may be set together with other reference positions.

FIG. 8 shows, corresponding to (1) in FIG. 7, an example in which the information displayed on the display unit 4 is controlled when the hand, which is the subject, is rotated up or down from the reference position of (1) about the axis (c3), which corresponds to the axis in the wrist direction, and when it is rotated right or left about the axis (c4). As an example of the part of the display information to be controlled, a rectangular parallelepiped with an elliptical pattern on one face is drawn. Under the control of the control unit 3, the rectangular parallelepiped is rotated from the front-facing state shown as (2 reference position) in accordance with each rotation of the hand and displayed. The x, y, and z axes are shown only for convenience, to make the rotation easier to follow.

That is, in FIG. 8 the rectangular parallelepiped is rotated and displayed as (2 up rotation)/(2 down rotation) and (2 right rotation)/(2 left rotation), corresponding respectively to the up/down rotation about (c3) and the right/left rotation about (c4) described above. This rotation control is an example of control in which the rectangular parallelepiped behaves as if it were attached to the plane (b1) of the hand in FIG. 7 and rotates in the same direction in conjunction with the rotation of the hand.

FIG. 9 shows, for the rotations about (c3) and (c4) described with reference to FIG. 7, examples of the shape of the internal background region, the reference point, and the reference point/centroid vector v starting at the reference point and ending at the centroid of the internal background region, as detected by the feature detection unit 22 and the posture estimation unit 23. As above, the examples after rotation relative to (3 reference position) are shown as (3 up rotation)/(3 down rotation) and (3 right rotation)/(3 left rotation).

As also shown in FIG. 9, the reference point/centroid vector v in each posture is as follows.
(Reference position) v = (−x_ref, 0)
(Up rotation) v = (−x_up, +y_up)
(Down rotation) v = (−x_down, −y_down)
(Right rotation) v = (+x_right, 0)
(Left rotation) v = (−x_left, 0)

Here, the following relationship holds among (3 right rotation), (3 left rotation), and (3 reference position), so the values of the reference point/centroid vector v corresponding to these postures can be distinguished using predetermined thresholds, and the posture estimation unit 23 can distinguish and estimate each of these postures.
x_left > x_ref > 0, x_right > 0

Likewise, the following relationship holds among (3 up rotation), (3 down rotation), and (3 reference position), so the posture estimation unit 23 can distinguish and estimate each of these postures in the same way, including distinguishing them from (3 right rotation) and (3 left rotation).
x_up > 0, y_up > 0, x_down > 0, y_down > 0, x_up ≈ x_ref, x_down ≈ x_ref
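By way of a non-limiting illustration, the relations above can be turned into a simple threshold test. The following Python sketch is not the implementation of the embodiment: the function name classify_rotation, the tolerances tol_x and tol_y, and the order in which the tests are applied are all assumptions made for illustration. Only the sign pattern of v and the relation x_left > x_ref are taken from the description.

```python
def classify_rotation(v, x_ref, tol_x, tol_y):
    """Classify a reference point/centroid vector v = (vx, vy).

    The sign convention follows the listed vectors (up rotation has vy > 0).
    tol_x and tol_y are hypothetical thresholds, not values from the patent.
    """
    vx, vy = v
    if vx > 0:
        return "right"       # only right rotation has a positive x component
    if vy > tol_y:
        return "up"          # v = (-x_up, +y_up)
    if vy < -tol_y:
        return "down"        # v = (-x_down, -y_down)
    if -vx > x_ref + tol_x:
        return "left"        # x_left > x_ref
    return "reference"


print(classify_rotation((-15.0, 0.0), x_ref=10.0, tol_x=2.0, tol_y=2.0))
```

In practice x_ref and the tolerances would come from the preset values for the individual hand and reference position.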

In FIG. 9, in addition to the shape of the internal background region, the reference point, and the reference point/centroid vector in each posture, a rectangular region of the same position and size on the image is drawn with a dotted line so that the changes between postures can be grasped more easily. As is clear from comparing the images in each posture against this dotted rectangle, the present invention has the following effect. The subject is a hand in which the first and second fingers form a plane as if pinching, with the third to fifth fingers clenched behind that plane as seen from the imaging unit 1. The appearance of such a hand therefore changes markedly with each rotation, which produces the marked changes in the reference point/centroid vector v described above and allows the posture to be estimated with high accuracy.

This effect arises in particular from the step between the first and second fingers and the clenched third to fifth fingers. For example, in (3 left rotation) the part enclosed by the first and second fingers moves away from the imaging unit 1 compared with (3 reference position), so the internal background region becomes smaller in both the x and y directions, and the protrusion of the third finger also becomes smaller. Conversely, in (3 right rotation) the enclosed part moves closer than in (3 reference position), and the internal background region shrinks in the x direction and expands in the y direction. The protrusion of the third finger is also enlarged and emphasized. In (3 right rotation), once the hand has rotated to some extent, the second joint of the fourth or fifth finger, rather than that of the third finger, is obtained as the reference point; in that case the enlargement and emphasis of the protrusion becomes even more pronounced.

By the same reasoning, the shape of the internal background region also changes markedly in (3 up rotation) and (3 down rotation) compared with (3 reference position). In (3 up rotation), the first finger approaches the imaging unit 1 and the second finger moves away, and the appearance of the clenched third to fifth fingers changes along with them, so the sideways U-shaped or V-shaped internal background region, which is roughly symmetric top to bottom in (3 reference position), changes to a shape biased toward the upper side. Conversely, in (3 down rotation), the first finger moves away from the imaging unit 1 and the second finger approaches, the appearance of the clenched third to fifth fingers again changes along with them, and the region changes to a shape biased toward the lower side.

The marked change in appearance and shape described above is due to the presence of the step and is specific to the present invention. If the U-shaped or V-shaped internal background region of (3 reference position) were obtained not from a subject with a three-dimensional step, such as the hand in the present invention, but from a planar subject, then rotating the subject would change the internal background region only by an affine or projective transformation. The marked shape change of FIG. 9 would not be obtained, nor would a change in the reference point/centroid vector large enough to enable posture estimation.

FIG. 10 shows a table summarizing the correspondence among FIGS. 7 to 9 described above. In FIG. 10, it is the rotation control unit 32 that performs the display rotation (2) described with reference to FIG. 8 based on the estimated posture (1) described with reference to FIG. 7. For up rotation, the rotation amount θ_up of the subject is multiplied by a predetermined constant a, as in the case of the movement control unit described above, and the display information is rotated by aθ_up. The same constant a is used for down rotation θ_down, right rotation θ_right, and left rotation θ_left, but different constants may be used.

As noted above, the estimated posture (1) is the angle of the actual posture of the subject, and the rotation (2) of the display information is a rotation amount based on the change in that estimated angle; however, to keep the notation simple, the symbol Δ denoting the amount of change is omitted in the column for the rotation (2) of the display information.

In the present invention the subject is a hand, so when the rotation of the hand approaches 90°, the internal background region (C) and the reference point (F) described with reference to FIG. 3 can no longer be obtained correctly, and the desired control becomes difficult. However, by setting a to a value greater than 1, for example, a part of the display information can be rotated by 90° or more with a single rotation of the hand from the reference position.
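As a minimal sketch of the scaling by the constant a, the following Python fragment multiplies a change in the estimated hand angle by a per-direction gain. The table GAINS, its values, and the function name display_rotation are hypothetical; the description states only that a predetermined constant a is used and that a different constant may be used for each direction.

```python
# Hypothetical per-direction gains; a > 1 lets the display turn past 90 degrees
# while the hand itself stays well below 90 degrees.
GAINS = {"up": 2.0, "down": 2.0, "right": 2.0, "left": 2.0}


def display_rotation(direction, delta_theta_deg, gains=GAINS):
    """Return the display rotation (degrees) for a change in hand angle."""
    return gains[direction] * delta_theta_deg


print(display_rotation("up", 60.0))
```

With these illustrative values, a hand rotation of 60° yields a display rotation of 120°, which a gain of 1 could not reach in a single motion.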

Also in FIG. 10, it is the posture estimation unit 23 that estimates the posture (1) described with reference to FIG. 7 based on the reference point/centroid vector (3) described with reference to FIG. 9. When assigning a rotation to the up, down, left, and right directions, a predetermined threshold may be used to assign only one direction, or several directions may be assigned with weights obtained by fitting or the like. For example, simultaneous up rotation and right rotation may be estimated. The rotation amount in each direction may be estimated in proportion to the length of the reference point/centroid vector. In addition, when the appearance of the internal background region does not change uniformly with rotation, a correspondence may be used that corrects the estimate so that the estimated rotation amount varies smoothly in proportion to the actual rotation of the hand.
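One possible way to assign several directions at once is sketched below in Python. It is an assumption-laden illustration rather than the method of the embodiment: it uses the displacement of v from its reference-position value (not the raw vector length), assumes a purely linear relation, and introduces the hypothetical calibration constants k_lr and k_ud (degrees per pixel) and the function name estimate_rotation.

```python
def estimate_rotation(v, v_ref, k_lr, k_ud):
    """Return (left_right_deg, up_down_deg); positive means right / up.

    Hypothetical linear model: each angle is proportional to the displacement
    of v from v_ref, the vector observed at the reference position.
    k_lr and k_ud would have to be calibrated for the individual hand.
    """
    dx = v[0] - v_ref[0]   # grows toward +x for right rotation
    dy = v[1] - v_ref[1]   # grows toward +y for up rotation
    return (k_lr * dx, k_ud * dy)


# Simultaneous right and up rotation relative to v_ref = (-10, 0).
print(estimate_rotation((-6.0, 4.0), (-10.0, 0.0), k_lr=2.0, k_ud=3.0))
```

A non-uniform change in appearance could be handled by replacing the linear factors with a calibrated lookup table, which corresponds to the correcting correspondence mentioned above.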

The specific values used to estimate the posture in a correspondence such as that of FIG. 10 differ depending on the individual hand serving as the subject (characterized by finger length, thickness, and so on). Such values are set in advance in the estimation unit 2 for each subject and each reference position. Instead of using individual setting values for individual hands, representative setting values may be used, assuming a representative hand shape and a corresponding reference position.

As an applied embodiment of the posture estimation and rotation control of FIGS. 7 to 10 described above, the user can adjust the displayed rotation amount. Widening the gap between the first and second fingers enlarges the internal background region itself, which increases the movement of the centroid of the internal background region that accompanies a rotation of the hand. That is, for the same rotation of the hand, the displayed rotation amount can be made smaller when the first and second fingers are close together and larger when they are far apart.

As noted above, the posture estimation and rotation control of FIGS. 7 to 10 are only examples, and other rotation axes and reference positions may be used.

The switching control unit 33 switches, for example, the control method in the control unit 3 based on whether the estimated shape described with reference to FIG. 4 is "open" or "closed". Various uses of this switching are described later. As one example, when the part of the information displayed on the display unit 4 is a pointer, the pressed and released states of the mouse button of a well-known mouse pointer can be associated with the "closed" and "open" estimated shapes, respectively. In this case, moving the first and second fingers in the sequence ("open" →) "closed" → "open" realizes processing corresponding to a click of the mouse pointer. If, in addition, the movement control unit 31 handles the movement of the pointer as described with reference to FIG. 6, processing equivalent to that of a mouse pointer can be realized.
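The mapping from the estimated shape to a click can be sketched as a scan over the per-frame shape estimates. The Python fragment below is illustrative only; the function name detect_clicks and the representation of the shapes as the strings "open" and "closed" are assumptions.

```python
def detect_clicks(states):
    """Return the frame indices at which a "closed" -> "open" transition occurs.

    states: per-frame estimated shapes, each "open" or "closed".
    "closed" plays the role of button press, "open" of button release.
    """
    clicks = []
    prev = None
    for i, shape in enumerate(states):
        if prev == "closed" and shape == "open":
            clicks.append(i)   # release after press completes one click
        prev = shape
    return clicks


frames = ["open", "closed", "closed", "open", "open", "closed", "open"]
print(detect_clicks(frames))
```

A double click could be recognized in the same way by requiring two such transitions within a chosen number of frames.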

The magnification control unit 34 enlarges or reduces a part of the information displayed on the display unit 4 based on a change in the estimated distance between the imaging unit 1 and the subject, described next. In a preferred embodiment, the display is enlarged as the estimated distance decreases and reduced as it increases, so that the part of the display information appears nearer or farther in conjunction with the distance from the imaging unit 1, as if it were held in the hand. That is, as shown in FIG. 11, for the hand that is the subject in (1) and the rectangular parallelepiped that is an example of the part of the display information in (2), the magnification control unit 34 can perform control such that the rectangular parallelepiped is enlarged as the hand is brought closer and reduced as the hand is moved away.

The estimated distance used by the magnification control unit 34 corresponds to the distance (c1) shown along the z axis in FIG. 2 and is estimated by the posture estimation unit 23. The distance is estimated from the area of the internal background region: using a predetermined proportionality constant, correspondence, or the like, the distance is estimated to be smaller as the area is larger and larger as the area is smaller.
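One correspondence that satisfies this monotonic relation is sketched below in Python. The specific form is an assumption: under a pinhole camera model the imaged area falls off as 1/z², so the sketch uses z = k / sqrt(area). The constant k, the reference distance z_ref, and the function names are hypothetical and are not given in the description.

```python
import math


def estimate_distance(area_px, k):
    """Estimate the distance from the internal background area (pixels).

    Assumes a pinhole model, so area ~ 1/z**2 and z = k / sqrt(area).
    k is a hypothetical calibration constant for the individual hand.
    """
    if area_px <= 0:
        raise ValueError("internal background region not detected")
    return k / math.sqrt(area_px)


def magnification(z, z_ref):
    """Display scale factor: larger when the hand is closer than z_ref."""
    return z_ref / z


z = estimate_distance(400.0, k=6000.0)
print(z, magnification(z, z_ref=600.0))
```

Any other monotonically decreasing mapping from area to distance, such as a calibrated lookup table, would fit the description equally well.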

Distance estimation may be limited to the case where the estimated shape is "closed", that is, where the first and second fingers are closed. In this case the first and second fingers are nearly fixed, so the distance can be estimated more reliably from the area of the internal background region, and enlargement and reduction can be controlled by moving the whole hand. Conversely, if distance estimation is also performed when the estimated shape is not "closed", enlargement and reduction can be controlled by the degree of opening of the first and second fingers in addition to the movement of the whole hand.

The controls of the units of the control unit 3 described above (the movement control unit 31 through the magnification control unit 34) may each be used alone or in combination. Preferred examples of embodiments that use them in combination are described below.

For example, when a pointer is handled as the part of the display information to be controlled and the pointer is moved by the movement control unit 31, the rotation control unit 32 is rarely needed at the same time, so the rotation control unit 32 is not used. Suppose that, in this state, a mail-order website is displayed on the display unit 4 and the pointer is moved onto a product image in the web page. To rotate and view the product, the switching control unit 33 can be used to switch from pointer movement control to rotation control of the product image.

For example, if the pointer is on a product image and the estimated shape is "closed" (or, conversely, "open"), the control target can be switched from the pointer to the product image, and the rotation control unit 32 can be used instead of the movement control unit 31, so that only the product is rotated without regard to the pointer position. In addition to rotation by the rotation control unit 32, the magnification of the product image under control may be controlled by the magnification control unit 34; using both gives a more realistic operation of virtually picking up the product. To select another product, the shape is set to "open" (or, conversely, "closed"), the pointer is moved to the other product, and the same procedure is followed.

If, for example, a = 1 is set in FIG. 10 so that the display is rotated by the same amount as the hand, the following use is also possible: after selecting the product, switch to rotation control and rotate the product 45° in some direction; return the shape to "open" (or, conversely, "closed") to enter the movement control state and return the hand to the reference position; then set the shape to "closed" (or, conversely, "open") again and continue the rotation control of the product, rotating it in the same direction for a total of 45° + 45° = 90°.
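This repeated-stroke use resembles a clutch: rotation is applied while the shape is "closed" and held while it is "open". The Python sketch below illustrates the idea for a single rotation axis. The class name RotationClutch and its fields are hypothetical, and the sketch assumes that the hand angle is measured from the reference position and that the hand is back at the reference position when the shape becomes "closed" again.

```python
class RotationClutch:
    """Accumulate display rotation over repeated closed-hand strokes."""

    def __init__(self, a=1.0):
        self.a = a             # gain applied to the hand angle
        self.committed = 0.0   # rotation kept from finished strokes (degrees)
        self.current = 0.0     # rotation of the stroke in progress (degrees)

    def update(self, shape, hand_angle_deg):
        """Process one frame and return the total display rotation."""
        if shape == "closed":
            # Rotation control: follow the hand angle from the reference position.
            self.current = self.a * hand_angle_deg
        else:
            # Movement control: keep what was rotated so far, ignore the hand angle.
            self.committed += self.current
            self.current = 0.0
        return self.committed + self.current


clutch = RotationClutch(a=1.0)
total = 0.0
for shape, angle in [("closed", 0.0), ("closed", 45.0), ("open", 45.0),
                     ("open", 0.0), ("closed", 0.0), ("closed", 45.0)]:
    total = clutch.update(shape, angle)
print(total)
```

If the hand is not at the reference position when the shape becomes "closed", this simple version makes the display jump by the corresponding angle; recording the hand angle at the moment of closing would avoid that, at the cost of departing from the reference-position convention.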

When the part of the display information to be controlled is, for example, a scroll bar or a scrolling menu, only the rotation control unit 32 may be used instead of the movement control unit 31. In this case too, the switching control unit 33 can be used as described above to switch the control target and control method, for example when the user wants to scroll a different scrolling menu.

FIG. 12 shows an example of an auxiliary display unit that can be used as an aid on the display unit 4. In FIG. 12 the auxiliary display unit 40 occupies a partial region of the display unit 4, but it may instead be provided on a separate display. As shown in FIG. 12, the auxiliary display unit 40 can display in real time an image containing the subject, such as the hand, captured by the imaging unit 1 (at the same period as the sampling period of the imaging unit 1 or at a coarser period). Alternatively, it may display the skin color region, circumscribed polygon, reference point, and so on obtained by processing the image of the subject's hand in the estimation unit 2, as described with reference to FIG. 3.

The auxiliary display unit 40 has the following effect. When the user moves or rotates the subject by moving his or her own hand, as shown in FIG. 12, and thereby moves or rotates the part of the display information illustrated as a rectangular parallelepiped, it prevents the situation in which the subject leaves the captured image without the user noticing and control is no longer performed even though the hand is moved. For example, the user can use the auxiliary display unit 40 to check that the hand has not been moved so far that it disappears from the captured video. Likewise, the user can check at any time that the hand has not been rotated so far that the internal background region can no longer be obtained and normal control is no longer possible.

As is clear from the purpose of the auxiliary display unit 40 described above, the display on the auxiliary display unit 40 is separate from the part of the display information on the display unit 4 that is controlled by the control unit 3. That is, in principle the auxiliary display unit 40 performs its display independently of the display control by the control unit 3. As an exception, however, the switching control unit 33 may be used, for example, to switch whether or not the auxiliary display unit 40 is used.

The switching control unit 33 described above may also be used to switch whether or not the display control according to the present invention is performed at all. For example, starting from a state in which the hand does not appear in the captured image, the user may bring the hand to the reference position and there perform, for example, the click operation described above, or similarly a double-click operation, to start display control. Display control may be ended in the same way. This makes it possible to exclude the influence of movements that the user does not intend as display control, such as the movement of bringing the hand to the reference position.

FIG. 13 shows a general example of a subject that can be used in the present invention and is not limited to a hand. The subject includes a pair of semi-annular portions A1 and A2, corresponding to the first and second fingers of the subject "hand", and a convex portion B, corresponding to the clenched third to fifth fingers of the subject "hand", which is surrounded by A1 and A2 and located behind them. A1 and A2 may be U-shaped, V-shaped, or the like, and the term "semi-annular" is taken to include such shapes.

The tips of A1 and A2 can be opened and closed by means of, for example, a hinge C (rotation axis C) located at their base, and closing them forms an annular portion. When such a general subject is used, the region forming unit 21 can extract the region of the subject by a method similar to that described with (Equation 1) to (Equation 3) above, using the color feature of the subject, which is not necessarily skin color (and is preferably a uniform color overall).

DESCRIPTION OF SYMBOLS: 10…information terminal device, 1…photographing unit, 2…estimation unit, 21…region forming unit, 22…feature detection unit, 221…first polygon forming unit, 222…internal background extraction unit, 223…second polygon forming unit, 23…posture estimation unit, 3…control unit, 4…display unit

Claims

1. An information terminal device including a photographing unit that continuously photographs a subject, and a display unit, the information terminal device comprising:
a region forming unit that extracts a region of the subject from a captured image based on color features;
a first polygon forming unit that forms a first circumscribed polygon containing the extracted region of the subject;
an internal background extraction unit that extracts, as an internal background region, a region obtained by excluding the region of the subject from the inside of the first circumscribed polygon;
a second polygon forming unit that forms a second circumscribed polygon containing the internal background region;
a posture estimation unit that estimates a position and a posture of the subject relative to the photographing unit based on the first circumscribed polygon, the internal background region, and the second circumscribed polygon; and
a control unit that controls a part of the information displayed on the display unit based on the estimated position and posture,
wherein the captured image of the subject is an image of a hand in which fingers are clenched and the clenched portion is surrounded by other fingers, thereby forming a step and a U-shaped or V-shaped region, and the region forming unit extracts a skin color region,
the posture estimation unit estimates the position based on a reference point that is extracted from the internal background region based on the shape of that region and that corresponds to a corner portion formed by the clenching,
the posture estimation unit estimates the posture based on a positional relationship, dependent on the presence of the step, between the reference point and the internal background region,
the posture estimation unit further estimates, as a shape of the subject, whether the surrounding fingers are apart and open or in contact and closed, based on whether a side of the first circumscribed polygon and a side of the second circumscribed polygon have an overlapping portion, and
the control unit includes a movement control unit that moves the part of the information displayed on the display unit based on a change in the estimated position, a rotation control unit that rotates the part of the information displayed on the display unit based on a change in the estimated posture, and a switching control unit that switches a state of the part of the information displayed on the display unit based on the estimated shape.

In an information terminal device including a photographing unit and a display unit that continuously photograph a subject,
An area forming unit that extracts an area of the subject based on color characteristics from a captured image;
A first polygon forming unit forming a first circumscribed polygon including the extracted subject area;
An internal background extraction unit that extracts an internal background region that excludes the region of the subject from the inside of the first circumscribed polygon;
A second polygon forming portion forming a second circumscribed polygon including the inner background region;
A posture estimation unit that estimates a position and a posture of the subject with respect to the photographing unit based on the first circumscribed polygon, the inner background region, and the second circumscribed polygon;
A control unit that controls a part of information displayed on the display unit based on the estimated position and orientation ;
The photographed image of the subject is an image of a hand in which the third to fifth fingers are gripped toward the back side and the first and second fingers surround the gripped portion from the near side, the image being taken from the side of the first and second fingers, and the region forming unit extracts a skin-color region;
The posture estimation unit estimates the position based on a reference point that is extracted from the internal background region based on the shape of that region and that corresponds to the corner formed at the second joints of the gripped third to fifth fingers;
The posture estimation unit estimates the posture based on a positional relationship between the reference point and the internal background region, the relationship depending on the existence of the back side and the near side;
The posture estimation unit further estimates, as the shape of the subject, whether the first and second fingers are open and apart or closed and in contact, based on whether a side of the first circumscribed polygon and a side of the second circumscribed polygon have an overlapping portion; and
The control unit includes a movement control unit that moves a part of the information displayed on the display unit based on the estimated change in position, a rotation control unit that rotates a part of the information displayed on the display unit based on the estimated change in posture, and a switching control unit that switches the state of a part of the information displayed on the display unit based on the estimated shape.
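
The overlap test between the sides of the two circumscribed polygons can be illustrated with a minimal Python sketch. It assumes integer pixel coordinates and treats two sides as overlapping only when they are collinear and share more than a single point. The function names `segments_overlap` and `polygons_share_side` are hypothetical, and reading an overlap as "fingers open" is an assumption of this sketch, not a statement taken from the claim.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def _cross(o: Point, p: Point, q: Point) -> int:
    """Z component of the cross product (p - o) x (q - o)."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def segments_overlap(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd are collinear and share more than one point."""
    if _cross(a, b, c) != 0 or _cross(a, b, d) != 0:
        return False  # cd does not lie on the line through ab
    # Project onto x unless ab is vertical, in which case project onto y.
    axis = 0 if a[0] != b[0] else 1
    lo1, hi1 = sorted((a[axis], b[axis]))
    lo2, hi2 = sorted((c[axis], d[axis]))
    return min(hi1, hi2) > max(lo1, lo2)


def polygons_share_side(poly1: List[Point], poly2: List[Point]) -> bool:
    """True if any side of poly1 overlaps any side of poly2."""
    def sides(poly: List[Point]):
        return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]

    return any(
        segments_overlap(a, b, c, d)
        for a, b in sides(poly1)
        for c, d in sides(poly2)
    )


# Hypothetical polygons: the first one stands in for the hand outline.
first_polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]
second_open = [(3, 0), (7, 0), (7, 5), (3, 5)]    # touches the bottom side
second_closed = [(3, 2), (7, 2), (7, 5), (3, 5)]  # fully enclosed
fingers_open = polygons_share_side(first_polygon, second_open)  # assumed mapping
```

A real image would need a tolerance of a few pixels instead of the exact collinearity test used here.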

3. The information terminal device according to claim 1, wherein the first circumscribed polygon and the second circumscribed polygon are convex polygons.
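
As an illustration of forming a convex circumscribed polygon around a set of region pixels, the following sketch computes a convex hull. The choice of Andrew's monotone chain algorithm and the function name `convex_hull` are assumptions of this sketch; the claim only requires that the polygons be convex.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def _cross(o: Point, a: Point, b: Point) -> int:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (x right, y up)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Hypothetical region pixels: four corners plus one interior point.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

The same routine could serve for both the first polygon (around the skin-color region) and the second polygon (around the internal background region).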

The information terminal device according to any one of the above claims, wherein the internal background extraction unit extracts, as the internal background region, the region of maximum area among the regions obtained by excluding the region of the subject from the inside of the first circumscribed polygon.
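
Selecting the maximum-area region can be sketched as a connected-component search over the pixels that lie inside the first circumscribed polygon but do not belong to the subject. The two boolean masks, the use of 4-connectivity, and the function name `largest_internal_background` are assumptions made for illustration.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def largest_internal_background(
    inside: List[List[bool]], subject: List[List[bool]]
) -> Set[Pixel]:
    """Return the largest 4-connected set of (x, y) pixels that are inside
    the first polygon and are not part of the subject."""
    height, width = len(inside), len(inside[0])
    seen = [[False] * width for _ in range(height)]
    best: Set[Pixel] = set()
    for y in range(height):
        for x in range(width):
            if seen[y][x] or not inside[y][x] or subject[y][x]:
                continue
            # Breadth-first flood fill of one background component.
            component: Set[Pixel] = set()
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                component.add((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (
                        0 <= nx < width
                        and 0 <= ny < height
                        and not seen[ny][nx]
                        and inside[ny][nx]
                        and not subject[ny][nx]
                    ):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(component) > len(best):
                best = component
    return best


# Hypothetical 5x4 image: 'S' marks subject pixels, 'B' marks background.
rows = ["SSSSS", "SBBSB", "SBBSS", "SSSSS"]
subject_mask = [[ch == "S" for ch in row] for row in rows]
inside_mask = [[True] * len(row) for row in rows]
region = largest_internal_background(inside_mask, subject_mask)
```

Keeping only the largest component discards small gaps caused by noise in the skin-color extraction.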

The information terminal device according to any one of claims 1 to 4, wherein the posture estimation unit calculates, for each point on the boundary of the internal background region, the shortest distance to a side of the second circumscribed polygon without passing through the internal background region, and extracts, as the reference point, the point on the boundary for which that shortest distance is the longest.
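
The reference point selection can be approximated as follows. This sketch is a simplification: it uses the straight-line distance from each boundary point to the nearest side of the second circumscribed polygon and omits the constraint that the path must not pass through the internal background region. The function names and the L-shaped sample region are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Clamp the projection parameter so the foot stays on the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def reference_point(boundary: List[Point], polygon: List[Point]) -> Point:
    """Boundary point whose shortest distance to the polygon sides is longest.

    Assumption: straight-line distance; the path constraint in the claim
    is not modeled here.
    """
    def distance_to_polygon(p: Point) -> float:
        return min(
            point_segment_distance(p, polygon[i], polygon[(i + 1) % len(polygon)])
            for i in range(len(polygon))
        )

    return max(boundary, key=distance_to_polygon)


# Hypothetical L-shaped region; its concave corner is at (2, 2).
boundary_points = [(0, 0), (4, 0), (4, 2), (3, 2), (2, 2), (2, 3), (2, 4), (0, 4)]
second_polygon = [(0, 0), (4, 0), (4, 2), (2, 4), (0, 4)]
corner = reference_point(boundary_points, second_polygon)
```

In this example the concave corner is the boundary point farthest from every side of the polygon, which matches the idea of locating the corner formed by the gripped fingers.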

The information terminal device according to any one of claims 1 to 5, wherein the posture estimation unit estimates the posture based on the positional relationship between the center of gravity of the internal background region and the reference point.

7. The information terminal device according to claim 6, wherein the posture estimation unit estimates the posture by comparing the vector that starts at the reference point and ends at the center of gravity of the internal background region with preset values of that vector for the subject rotated in each direction from a reference position.
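
The comparison against preset vectors can be sketched as a nearest-neighbor lookup. The preset table `PRESET_VECTORS`, its posture labels and values, and the use of Euclidean distance as the comparison measure are all assumptions of this sketch; the claim does not specify them.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float]

# Hypothetical preset vectors (reference point -> center of gravity),
# one per posture. Real values would have to be measured in advance.
PRESET_VECTORS: Dict[str, Vector] = {
    "neutral": (0.0, -10.0),
    "left": (-8.0, -6.0),
    "right": (8.0, -6.0),
}


def center_of_gravity(pixels: List[Tuple[int, int]]) -> Vector:
    """Mean position of the pixels of the internal background region."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def estimate_posture(
    reference: Vector,
    region_pixels: List[Tuple[int, int]],
    presets: Dict[str, Vector] = PRESET_VECTORS,
) -> str:
    """Return the label of the preset vector closest to the observed vector."""
    cx, cy = center_of_gravity(region_pixels)
    observed = (cx - reference[0], cy - reference[1])
    return min(presets, key=lambda name: math.dist(observed, presets[name]))


# Hypothetical region whose center of gravity is (0.0, 1.0).
region_pixels = [(-1, 0), (1, 0), (-1, 2), (1, 2)]
posture_a = estimate_posture((0.0, 10.0), region_pixels)
posture_b = estimate_posture((10.0, 10.0), region_pixels)
```

Normalizing the observed vector by the region size before comparison would make the lookup less sensitive to the distance between the hand and the camera, but that step is not part of this sketch.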

The information terminal device according to any one of claims 1 to 7, wherein the control unit further includes a magnification control unit that enlarges or reduces a part of the information displayed on the display unit based on a change in the area of the internal background region.
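
One possible way to turn a change in area into a magnification is sketched below. Scaling by the square root of the area ratio (so that the zoom follows the apparent linear size) and clamping to a fixed range are assumptions of this sketch, as are the function name `update_zoom` and its default limits.

```python
import math


def update_zoom(
    zoom: float,
    previous_area: float,
    current_area: float,
    min_zoom: float = 0.25,
    max_zoom: float = 4.0,
) -> float:
    """Scale the zoom by the linear size change implied by the area change."""
    if previous_area <= 0 or current_area <= 0:
        return zoom  # region lost in one of the frames: keep the zoom as is
    factor = math.sqrt(current_area / previous_area)
    return max(min_zoom, min(max_zoom, zoom * factor))


# Hypothetical areas in pixels: the region grows fourfold between frames.
zoom_in = update_zoom(1.0, 100.0, 400.0)
zoom_out = update_zoom(1.0, 400.0, 100.0)
```

Smoothing the area over several frames before calling such a function would reduce jitter in the displayed magnification.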

The information terminal device according to any one of claims 1 to 8, wherein the display unit further includes an auxiliary display unit that displays the captured image continuously captured by the photographing unit, independently of the control by the control unit.

A method executed by an information terminal device that includes a display unit and a photographing unit for continuously photographing a subject, the method comprising:
A region forming step of extracting a region of the subject from a captured image based on color characteristics;
A first polygon forming step of forming a first circumscribed polygon enclosing the extracted region of the subject;
An internal background extraction step of extracting an internal background region obtained by excluding the region of the subject from the inside of the first circumscribed polygon;
A second polygon forming step of forming a second circumscribed polygon enclosing the internal background region;
A posture estimation step of estimating a position and a posture of the subject with respect to the photographing unit based on the first circumscribed polygon, the internal background region, and the second circumscribed polygon; and
A control step of controlling a part of the information displayed on the display unit based on the estimated position and posture;
The photographed image of the subject is an image of a hand that forms a step and a U-shaped or V-shaped region by gripping a finger and surrounding the gripped portion with another finger, and the region forming step extracts a skin-color region;
The posture estimation step estimates the position based on a reference point that is extracted from the internal background region based on the shape of that region and that corresponds to the corner formed by the gripping;
The posture estimation step estimates the posture based on a positional relationship between the reference point and the internal background region, the relationship depending on the presence of the step formed by the hand;
The posture estimation step further estimates, as the shape of the subject, whether the surrounding finger is open and apart or closed and in contact, based on whether a side of the first circumscribed polygon and a side of the second circumscribed polygon have an overlapping portion; and
The control step includes a movement control step of moving a part of the information displayed on the display unit based on the estimated change in position, a rotation control step of rotating a part of the information displayed on the display unit based on the estimated change in posture, and a switching control step of switching the state of a part of the information displayed on the display unit based on the estimated shape.

A method executed by an information terminal device that includes a display unit and a photographing unit for continuously photographing a subject, the method comprising:
A region forming step of extracting a region of the subject from a captured image based on color characteristics;
A first polygon forming step of forming a first circumscribed polygon enclosing the extracted region of the subject;
An internal background extraction step of extracting an internal background region obtained by excluding the region of the subject from the inside of the first circumscribed polygon;
A second polygon forming step of forming a second circumscribed polygon enclosing the internal background region;
A posture estimation step of estimating a position and a posture of the subject with respect to the photographing unit based on the first circumscribed polygon, the internal background region, and the second circumscribed polygon; and
A control step of controlling a part of the information displayed on the display unit based on the estimated position and posture;
The photographed image of the subject is an image of a hand in which the third to fifth fingers are gripped toward the back side and the first and second fingers surround the gripped portion from the near side, the image being taken from the side of the first and second fingers, and the region forming step extracts a skin-color region;
The posture estimation step estimates the position based on a reference point that is extracted from the internal background region based on the shape of that region and that corresponds to the corner formed at the second joints of the gripped third to fifth fingers;
The posture estimation step estimates the posture based on a positional relationship between the reference point and the internal background region, the relationship depending on the existence of the back side and the near side;
The posture estimation step further estimates, as the shape of the subject, whether the first and second fingers are open and apart or closed and in contact, based on whether a side of the first circumscribed polygon and a side of the second circumscribed polygon have an overlapping portion; and
The control step includes a movement control step of moving a part of the information displayed on the display unit based on the estimated change in position, a rotation control step of rotating a part of the information displayed on the display unit based on the estimated change in posture, and a switching control step of switching the state of a part of the information displayed on the display unit based on the estimated shape.

10. A program for causing a computer to function as the information terminal device according to claim 1.