JP7163649B2

JP7163649B2 - GESTURE DETECTION DEVICE, GESTURE DETECTION METHOD, AND GESTURE DETECTION CONTROL PROGRAM

Info

Publication number: JP7163649B2
Application number: JP2018135291A
Authority: JP
Inventors: ヒブランベニテス; 宗伯浮田; 佳行津田
Original assignee: Toyota School Foundation; Denso Corp
Current assignee: Toyota School Foundation; Denso Corp
Priority date: 2018-07-18
Filing date: 2018-07-18
Publication date: 2022-11-01
Anticipated expiration: 2038-07-18
Also published as: JP2020013348A

Description

The present invention relates to a gesture detection device, a gesture detection method, and a gesture detection control program for detecting gestures used for input operations.

As a conventional gesture detection device, the one described in Patent Document 1, for example, is known. The gesture detection device (hand pattern switch device) of Patent Document 1 reliably distinguishes whether an operator's gesture was made intentionally or inadvertently, thereby suppressing erroneous recognition. Specifically, when the operator holds the fingers still for a predetermined time, in a preset finger shape, within the imaging region of the camera, the detection device recognizes that an intentional gesture operation by the operator has started. The detection device then executes gesture operation detection processing based on the shape and movement of the fingers.

Patent Document 1: Japanese Patent Application Laid-Open No. 2004-145723

However, in Patent Document 1, for the detection device to recognize the start of an intentional gesture operation, the operator must hold the fingers still in the preset finger shape for a predetermined time, which makes the gesture operation cumbersome (time-consuming).

In view of the above problem, an object of the present invention is to provide a gesture detection device, a gesture detection method, and a gesture detection control program that can recognize a gesture as one intended by the operator without increasing the operating effort required for gesture detection.

To achieve the above object, the present invention adopts the following technical means.

In a first invention, a gesture detection device for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator comprises:
a gesture segment extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a segment of the gesture by taking, of a pair of mutually similar gestures among the gestures, the earlier one as a gesture immediately before the start and the later one as a gesture immediately after the end; and
a gesture identification unit (130) that identifies the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture,
wherein the gesture segment extraction unit includes:
a similarity calculation unit (122) that calculates a similarity for each of a plurality of pairs of mutually similar gestures; and
a start/end determination unit (123) that extracts, as the pair of gestures, the pair having the highest similarity among the similarities calculated by the similarity calculation unit, and determines the gesture immediately before the start and the gesture immediately after the end.

In a second invention, a gesture detection method for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator comprises:
extracting, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), a segment of the gesture by taking, of a pair of mutually similar gestures among the gestures, the earlier one as a gesture immediately before the start and the later one as a gesture immediately after the end;
identifying the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture;
calculating a similarity for each of a plurality of pairs of mutually similar gestures; and
extracting, as the pair of gestures, the pair having the highest similarity among the calculated similarities, and determining the gesture immediately before the start and the gesture immediately after the end.

In a third invention, a gesture detection control program for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator
causes a computer to function as:
a gesture segment extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a segment of the gesture by taking, of a pair of mutually similar gestures among the gestures, the earlier one as a gesture immediately before the start and the later one as a gesture immediately after the end; and
a gesture identification unit (130) that identifies the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture,
and, within the gesture segment extraction unit, to function as:
a similarity calculation unit (122) that calculates a similarity for each of a plurality of pairs of mutually similar gestures; and
a start/end determination unit (123) that extracts, as the pair of gestures, the pair having the highest similarity among the similarities calculated by the similarity calculation unit, and determines the gesture immediately before the start and the gesture immediately after the end.

In general, an operator takes similar postures at the start and at the end of a gesture. That is, when starting a predetermined gesture, the operator first holds up the hand in a predetermined finger shape (the basic shape of the gesture), and after performing the predetermined gesture, ends the gesture while maintaining the same predetermined finger shape as at the beginning.

Therefore, according to the present invention, capturing a pair of mutually similar gestures makes it possible to extract the gesture immediately before the start and the gesture immediately after the end, and thus to extract the segment of the gesture. Identifying the gesture performed between the gesture immediately before the start and the gesture immediately after the end as the input gesture makes it possible to reliably recognize it as a gesture intended by the operator rather than, for example, one made inadvertently. Consequently, unlike the conventional technology, the input gesture can be recognized without adding extra operating effort for gesture detection, and erroneous recognition can in turn be suppressed.

The reference numerals in parentheses attached to the above means indicate the correspondence with the specific means described in the embodiments below.

FIG. 1 is a block diagram showing the gesture detection device.
FIG. 2 is an explanatory diagram showing how start/end candidates of a gesture are extracted.
FIG. 3 is an explanatory diagram showing how the similarity between start/end candidates of a gesture is calculated.
FIG. 4 is an explanatory diagram showing how the start and end of a gesture are determined.

A plurality of modes for carrying out the present invention are described below with reference to the drawings. In each mode, parts corresponding to matters described in a preceding mode may be given the same reference numerals, and redundant description may be omitted. Where only part of a configuration is described in a mode, the other modes described earlier can be applied to the remaining parts of the configuration. In addition to combinations of parts that are explicitly stated to be combinable in each embodiment, the embodiments may also be partially combined even where this is not explicitly stated, provided the combination causes no particular problem.

(First embodiment)
A gesture detection device 100 according to the first embodiment is described with reference to FIGS. 1 to 4. The gesture detection device 100 of the present embodiment is mounted on a vehicle and detects gestures when input operations are performed on various vehicle devices based on movements (gestures) of a specific part of the body of the driver (operator).

The various vehicle devices include, for example, an air conditioner that air-conditions the vehicle interior, a car navigation device that displays the current position of the vehicle or guidance to a destination, and an audio device that handles television broadcasts, radio broadcasts, CD/DVD playback, and the like. These vehicle devices correspond to the operation target of the present invention. The vehicle devices are not limited to those listed above and also include a head-up display device, a room lamp device, a rear seat sunshade device, an electric seat device, a glove box opening and closing device, and the like.

As shown in FIG. 1, the gesture detection device 100 detects gestures for input operations based on imaging data of a specific part of the driver's body captured by the imaging device 10. The various vehicle devices are operated based on the detected gesture, and the result of the operation (operating state and the like) is displayed on the information device 20.

The specific part of the driver's body can be, for example, the driver's fingers, palm, or arm. In the present embodiment, the fingers are mainly used for input operation gestures (a pointing gesture, a circle gesture, a wave gesture, a flick gesture, and the like).

Input operation gestures are used, for example, to change the set temperature or the air volume of the air conditioner, to zoom the map in and out or set a destination on the car navigation device, and to change the television or radio station, select a song, or change the volume on the audio device.

The imaging device 10 continuously captures (over time) the movement of the specific part of the driver's body and outputs the imaging data to the gesture detection device 100 (hand detection unit 110).

As the imaging device 10, a camera that forms a luminance image of an object, a range image sensor that forms a range image, or a combination of these can be used. Examples of the camera include a near-infrared camera that captures near-infrared light and a visible light camera that captures visible light.

Examples of the range image sensor include a stereo camera, which captures images simultaneously with a plurality of cameras and measures depth information from the parallax, and a ToF (Time of Flight) camera, which measures depth from the time it takes for light from a light source to be reflected by the object and return. In the present embodiment, a camera is used as the imaging device 10.

The information device 20 is a display unit that displays the operating states and the like of the various vehicle devices, and includes an air conditioning display unit, a car navigation display unit, an audio display unit, and the like. The information device 20 is formed by, for example, a liquid crystal display or an organic EL display.

The gesture detection device 100 is, for example, a computer including a CPU, a ROM, a RAM, and the like, and includes a hand detection unit 110, a gesture segment extraction unit 120, a gesture identification unit 130, and the like.

The hand detection unit 110 identifies the region in which the driver's hand is present in the imaging data (image) captured by the imaging device 10, and outputs the data of the identified region to the gesture segment extraction unit 120. Techniques for detecting a hand are described, for example, in the following documents.
・M. Kolsch and M. Turk.: Robust Hand Detection. FGR, 2004
・Wei Fan, Li Chen, Wei Liu, Yuan He, Jun Sun, Shutao Li: Monocular Vision Based Relative Depth Estimation for Hand Gesture Recognition. MVA 2013

The gesture segment extraction unit 120 extracts the segment of a gesture that the driver made intentionally for an input operation, and includes a start/end candidate extraction unit 121, a similarity calculation unit 122, a start/end determination unit 123, and the like (described in detail later).

The gesture identification unit 130 identifies the gesture in the segment extracted by the gesture segment extraction unit 120 as a gesture that the driver made intentionally for an input operation. The gesture identification unit 130 outputs the identified gesture to the various vehicle devices and to the information device 20.

The gesture detection device 100 of the present embodiment is configured as described above. Its operation and effects are described below with additional reference to FIGS. 2 to 4.

The imaging data captured by the imaging device 10 is output to the hand detection unit 110. The hand detection unit 110 identifies the region of the imaging data in which a hand is present, and outputs the data of the identified region to the start/end candidate extraction unit 121 of the gesture segment extraction unit 120.

As shown in FIG. 2, the start/end candidate extraction unit 121 extracts candidate frames for the start and the end of a gesture from the frames of imaging data output from the hand detection unit 110. Here, the frame at the start of a gesture is a frame showing the state in which the driver begins the gesture. For example, if the gesture to be performed is a flick gesture, the start of the gesture corresponds to the state in which the driver first holds up the hand with a finger raised. The end of the gesture corresponds to the state in which, once the flick gesture is finished, the hand is kept in the finger-raised posture, just as at the start.

Using three or more consecutive captured frames (imaging data), the start/end candidate extraction unit 121 extracts a static feature amount (feature amount data) based on a single frame and a dynamic feature amount (feature amount data) based on a plurality of frames, and combines them into one feature amount. Preferably, HOG (Histogram of Oriented Gradients) can be used as the static feature amount and HoOF (Histogram of Optical Flow) as the dynamic feature amount.
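The combination of a static and a dynamic feature amount described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: `static_feature` is a single-cell orientation histogram in the spirit of HOG (without cells, blocks, or block normalization), `dynamic_feature` is a direction histogram in the spirit of HoOF, and the optical flow field `flow` is assumed to have been computed elsewhere from consecutive frames. All function names and bin counts are hypothetical.

```python
import numpy as np


def orientation_histogram(angles, weights, bins, max_angle):
    """Weighted histogram of angles over [0, max_angle], L2-normalized."""
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, max_angle), weights=weights)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def static_feature(frame, bins=9):
    """Simplified HOG-like feature from ONE grayscale frame (2-D array)."""
    gy, gx = np.gradient(frame.astype(float))   # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation
    return orientation_histogram(angle, magnitude, bins, np.pi)


def dynamic_feature(flow, bins=8):
    """Simplified HoOF-like feature from a flow field of shape (H, W, 2)."""
    fx, fy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(fx, fy)
    angle = np.mod(np.arctan2(fy, fx), 2 * np.pi)  # signed flow direction
    return orientation_histogram(angle, magnitude, bins, 2 * np.pi)


def combined_feature(frame, flow):
    """Concatenate the static and dynamic parts into one feature vector."""
    return np.concatenate([static_feature(frame), dynamic_feature(flow)])
```

Concatenation keeps the two parts separable, so a later stage that needs only the static part (such as the similarity calculation) can slice it back out.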

The database of the start/end candidate extraction unit 121 stores, in advance, reference data obtained by learning the criteria that separate feature amounts at the start and end of a gesture from feature amounts at other times. The start/end candidate extraction unit 121 classifies the feature amounts extracted from the actual captured frames using the reference data, and extracts as candidates those that satisfy the criteria for a start or end feature amount. A machine learning method can be used for the learning and classification, and Random Forest is preferable. In FIG. 2, the portions labeled 1, 2, 3, and 4 among the consecutive captured frames are the portions (frames) extracted as candidates for the start and end of a gesture.
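The candidate extraction step can be sketched as a filter over per-frame feature amounts. In this minimal sketch the learned reference data is represented by a hypothetical callable `boundary_score` that returns a score in [0, 1] for "start or end of a gesture"; in practice this could be, for example, the class probability of a trained Random Forest, but the callable, the threshold, and the function name are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def extract_candidates(
    features: Sequence[Sequence[float]],
    boundary_score: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return the indices of frames classified as start/end candidates.

    features[i] is the combined feature amount of frame i.
    A frame is kept when its score meets the (assumed) threshold.
    """
    return [
        index
        for index, feature in enumerate(features)
        if boundary_score(feature) >= threshold
    ]


if __name__ == "__main__":
    # Toy stand-in for a trained classifier: the score is the first component.
    toy_features = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.7], [0.95]]
    print(extract_candidates(toy_features, lambda f: f[0]))
```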

The similarity calculation unit 122 forms every possible pair from the plurality of candidates extracted by the start/end candidate extraction unit 121, extracts a static feature amount (feature amount data) from the captured frames of the candidates in each pair, and uses the extracted feature amounts to calculate the similarity of each pair. FIG. 3 shows that the possible pairs are 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4. The similarity calculation unit 122 calculates the similarity between the two candidates for every one of these pairs.
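The pairing described above can be sketched as follows. The text does not specify the similarity measure, so cosine similarity between static feature vectors is used here purely as an assumption; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(
    candidates: Dict[int, Sequence[float]],
) -> Dict[Tuple[int, int], float]:
    """Similarity for every pair of candidates.

    candidates maps a candidate label (1, 2, 3, ...) to its static feature vector.
    """
    return {
        (a, b): cosine_similarity(candidates[a], candidates[b])
        for a, b in combinations(sorted(candidates), 2)
    }
```

With the four candidates of FIG. 3, `pairwise_similarities` returns six entries, keyed (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), and (3, 4).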

The start/end determination unit 123 extracts the pair with the highest similarity among the similarities calculated by the similarity calculation unit 122, and within that pair determines the earlier frame as the frame of the gesture immediately before the start and the later frame as the frame of the gesture immediately after the end. The start/end determination unit 123 then determines the frames between the frame immediately before the start and the frame immediately after the end as the segment of the gesture that the driver performed for the input operation. FIG. 4 shows that, among the pairs in FIG. 3, the pair of 1 and 2 has the highest similarity, and that the span between 1 and 2 has been determined as the gesture segment.
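The start/end decision can be sketched as a maximum over the pair similarities followed by ordering the two frames in time. The data layout (a dictionary of pair similarities and a mapping from candidate label to frame number) and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def decide_segment(
    similarities: Dict[Tuple[int, int], float],
    frame_of: Dict[int, int],
) -> Tuple[int, int, List[int]]:
    """Pick the most similar candidate pair and derive the gesture segment.

    similarities maps a pair of candidate labels to its similarity.
    frame_of maps a candidate label to its frame number in the sequence.
    Returns (frame just before the start, frame just after the end,
    frame numbers strictly between them).
    """
    (a, b), _ = max(similarities.items(), key=lambda item: item[1])
    before_start, after_end = sorted((frame_of[a], frame_of[b]))
    segment = list(range(before_start + 1, after_end))
    return before_start, after_end, segment


if __name__ == "__main__":
    sims = {(1, 2): 0.97, (1, 3): 0.41, (1, 4): 0.35,
            (2, 3): 0.52, (2, 4): 0.30, (3, 4): 0.66}
    frames = {1: 10, 2: 25, 3: 40, 4: 60}
    print(decide_segment(sims, frames))
```

If two pairs tie for the highest similarity, `max` returns the first one encountered; the text does not say how ties should be handled.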

The gesture identification unit 130 then extracts static and dynamic feature amounts for all captured frames in the gesture segment determined by the start/end determination unit 123. The database of the gesture identification unit 130 stores, in advance, gesture reference data indicating the criteria for classifying the feature amounts of each gesture motion. Based on the gesture reference data, the gesture identification unit 130 identifies (classifies) which gesture motion the feature amounts of the captured frames in the gesture segment belong to.
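The identification step can be illustrated with a deliberately simple stand-in. The text does not state which classifier the gesture identification unit 130 uses, so the sketch below assumes a nearest-template rule: the feature amounts of the segment are averaged and compared, by Euclidean distance, with one hypothetical reference vector per gesture. The names `identify_gesture` and `references` are assumptions.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def identify_gesture(
    segment_features: Sequence[Sequence[float]],
    references: Dict[str, Sequence[float]],
) -> str:
    """Return the name of the reference gesture closest to the segment.

    segment_features holds one feature vector per frame in the gesture segment.
    references maps a gesture name to an (assumed) reference feature vector.
    """
    summary = mean_vector(segment_features)
    return min(references, key=lambda name: math.dist(summary, references[name]))
```

Averaging discards the temporal order of the frames, which a real classifier for motions such as a circle or a wave would likely need to preserve; the sketch only shows where the gesture reference data enters the decision.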

The gesture identification unit 130 outputs the identified gesture to the various vehicle devices and to each information device 20. As a result, an input operation corresponding to the gesture is performed on the vehicle devices, and the resulting operating state and the like are displayed on each information device 20.

In general, a driver takes similar postures at the start and at the end of a gesture. That is, when starting a predetermined gesture, the driver first holds up the hand in a predetermined finger shape (the basic shape of the gesture), and after performing the predetermined gesture, ends the gesture while maintaining the same predetermined finger shape as at the beginning.

Therefore, according to the present embodiment, by capturing a pair of mutually similar gestures, the gesture segment extraction unit 120 can extract the gesture immediately before the start and the gesture immediately after the end, and thus extract the segment of the gesture. By identifying the gesture performed between the gesture immediately before the start and the gesture immediately after the end as the input gesture, the gesture identification unit 130 can reliably recognize it as a gesture intended by the driver rather than, for example, one made inadvertently. Consequently, unlike the conventional technology, the input gesture can be recognized without adding extra operating effort for gesture detection, and erroneous recognition can in turn be suppressed.

In the present embodiment, the start/end candidate extraction unit 121, which extracts candidates for the plurality of pairs of mutually similar gestures, is provided at the stage before the similarity calculation unit 122 performs its processing. Because the candidates are extracted in advance, the similarity calculation unit 122 can calculate the similarity for each pair of candidates in an orderly manner, which reduces the calculation load.
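The effect on the calculation load can be made concrete by counting pairs: n items yield n(n-1)/2 pairs. The short sketch below compares the four candidates of FIG. 3 with comparing every frame against every other frame; the figure of 300 frames is a hypothetical example (for instance 10 seconds at 30 frames per second), not a value from the embodiment.

```python
from math import comb


def pair_count(items: int) -> int:
    """Number of unordered pairs that can be formed from `items` elements."""
    return comb(items, 2)


if __name__ == "__main__":
    print(pair_count(4))    # four candidates, as in FIG. 3 -> 6
    print(pair_count(300))  # hypothetical 300 frames, no pre-extraction -> 44850
```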

(Other embodiments)
Although the first embodiment was described as including the hand detection unit 110, the hand detection unit 110 may be omitted and the imaging data captured by the imaging device 10 may be output directly to the gesture segment extraction unit 120.

Although the first embodiment was described as including the start/end candidate extraction unit 121, the start/end candidate extraction unit 121 may be omitted if the calculation load is not a concern.

The operator is not limited to the driver and may be a passenger in the front passenger seat. In this case, when the passenger makes a gesture, the gesture detection device 100 detects it, so the passenger can also operate the various vehicle devices.

The processing executed by the gesture detection device 100 described in the first embodiment corresponds to the "gesture detection method" of the present invention. Although the first embodiment was described as using the gesture detection device 100 for gesture detection, gesture detection is also possible by, for example, having a computer function as the units 110, 120, and 130 under a control program stored in a storage medium such as a center server or the cloud.

10 Imaging device
100 Gesture detection device
120 Gesture segment extraction unit
121 Start/end candidate extraction unit
122 Similarity calculation unit
123 Start/end determination unit
130 Gesture identification unit

Claims

1. A gesture detection device for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator, the gesture detection device comprising:
a gesture segment extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a segment of the gesture by taking, of a pair of mutually similar gestures among the gestures, the earlier one as a gesture immediately before the start and the later one as a gesture immediately after the end; and
a gesture identification unit (130) that identifies the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture,
wherein the gesture segment extraction unit includes:
a similarity calculation unit (122) that calculates a similarity for each of a plurality of pairs of mutually similar gestures; and
a start/end determination unit (123) that extracts, as the pair of gestures, the pair having the highest similarity among the similarities calculated by the similarity calculation unit, and determines the gesture immediately before the start and the gesture immediately after the end.

2. The gesture detection device according to claim 1, wherein the gesture segment extraction unit includes a candidate extraction unit (121) that extracts in advance candidates for the plurality of pairs of mutually similar gestures.

3. A gesture detection method for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator, the gesture detection method comprising:
extracting, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), a segment of the gesture by taking, of a pair of mutually similar gestures among the gestures, the earlier one as a gesture immediately before the start and the later one as a gesture immediately after the end;
identifying the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture;
calculating a similarity for each of a plurality of pairs of mutually similar gestures; and
extracting, as the pair of gestures, the pair having the highest similarity among the calculated similarities, and determining the gesture immediately before the start and the gesture immediately after the end.

4. A gesture detection control program for detecting a gesture when an input operation is performed on an operation target based on a gesture of an operator,
the program causing a computer to function as:
a gesture segment extraction unit (120) that, based on feature amount data extracted from imaging data of the operator captured by an imaging device (10), extracts a segment of the gesture by taking, of a pair of mutually similar gestures among the gestures, the earlier one as a gesture immediately before the start and the later one as a gesture immediately after the end; and
a gesture identification unit (130) that identifies the gesture performed between the gesture immediately before the start and the gesture immediately after the end as an input gesture,
and, within the gesture segment extraction unit, to function as:
a similarity calculation unit (122) that calculates a similarity for each of a plurality of pairs of mutually similar gestures; and
a start/end determination unit (123) that extracts, as the pair of gestures, the pair having the highest similarity among the similarities calculated by the similarity calculation unit, and determines the gesture immediately before the start and the gesture immediately after the end.