JP2012108849A - Recognition device and recognition method

Info

Publication number: JP2012108849A
Application number: JP2010258925A
Authority: JP
Inventors: Mikiko Nakanishi; 美木子中西; Tsutomu Horikoshi; 力堀越
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2010-11-19
Filing date: 2010-11-19
Publication date: 2012-06-07

Abstract

PROBLEM TO BE SOLVED: To enable a device without a touch panel to recognize input operation on the basis of the motion of a user with lighter processing loads. SOLUTION: A recognition device 1 recognizing input operation on the basis of the motion of a user includes: a motion determining part 23 for determining a motion of the user on the basis of image data obtained by capturing the motion of the user; and an output part 25 for outputting information on the input operation on the basis of the determined motion of the user. A device without a touch panel can recognize input operation on the basis of the motion of the user, accordingly. The recognition device 1 includes an extraction part 21 for extracting image data represented by one-dimensional data from image data composed of one- or higher dimensional data. The motion determining part 23 determines the motion of the user on the basis of the image data represented by the one-dimensional data, thereby reducing processing loads in the determination.

Description

The present invention relates to a recognition device and a recognition method for recognizing an input operation based on a user's motion.

In recent years, some terminals, such as mobile terminals and e-book terminals, have a function of recognizing a page-turning operation in which the user slides a finger left or right on the display to turn the pages of a book. Such a function is realized, for example, by configuring the terminal's display as a touch panel. A technique is also known in which the portion showing a hand is extracted from an image acquired by a camera, and a recognized predetermined hand movement is used as input information to the device (see, for example, Patent Document 1).

JP 2001-56861 A

However, a terminal without a touch panel cannot recognize a page-turning operation or the like based on finger movement on the display. Because the display is built into the terminal in advance, a terminal without a touch panel cannot later be made capable of recognizing page-turning operations and the like. Moreover, a display configured as a touch panel is expensive, so a terminal equipped with one is also expensive. Further, the technique described in Patent Document 1 recognizes the hand by determining the color of each pixel across the entire image acquired by the camera, so the processing load is large.

The present invention has been made in view of the above problems, and its object is to provide a recognition device and a recognition method that can recognize an input operation based on a user's motion with a light processing load, even in a device without a touch panel.

To solve the above problems, a recognition device according to the present invention is a recognition device that recognizes an input operation based on a user's motion, comprising: acquisition means for acquiring two-dimensional data that captures the user's motion and in which a plurality of data elements, each represented by data of one or more dimensions, are arranged in two dimensions; extraction means for extracting, from the two-dimensional data acquired by the acquisition means, two-dimensional data composed of data elements represented by predetermined one-dimensional data among the data of one or more dimensions; analysis means for analyzing a time-series change of the one-dimensional data in the data elements based on the two-dimensional data extracted by the extraction means; motion determination means for determining the user's motion based on the time-series change of the one-dimensional data analyzed by the analysis means; and output means for outputting information on the input operation based on the user's motion determined by the motion determination means.

Likewise, to solve the above problems, a recognition method according to the present invention is a recognition method in a recognition device that recognizes an input operation based on a user's motion, comprising: an acquisition step of acquiring two-dimensional data that captures the user's motion and in which a plurality of data elements, each represented by data of one or more dimensions, are arranged in two dimensions; an extraction step of extracting, from the two-dimensional data acquired in the acquisition step, two-dimensional data composed of data elements represented by predetermined one-dimensional data among the data of one or more dimensions; an analysis step of analyzing a time-series change of the one-dimensional data in the data elements based on the two-dimensional data extracted in the extraction step; a motion determination step of determining the user's motion based on the time-series change of the one-dimensional data analyzed in the analysis step; and an output step of outputting information on the input operation based on the user's motion determined in the motion determination step.

According to the recognition device and recognition method of the present invention, the user's motion is determined based on two-dimensional data that captures the user's motion, and information on the input operation is output based on the determined motion, so even a device without a touch panel can recognize an input operation based on the user's motion. In addition, because the user's motion is determined based on data elements represented by one-dimensional data extracted from two-dimensional data composed of data elements represented by data of one or more dimensions, the processing load of the determination is light.

In the recognition device of the present invention, the acquisition means acquires image data that captures the user's motion and is composed of pixels represented by data of one or more dimensions; the extraction means extracts, from the image data acquired by the acquisition means, image data composed of pixels represented by predetermined one-dimensional data among the data of one or more dimensions; and the analysis means analyzes the time-series change of the one-dimensional data based on the image data extracted by the extraction means.

In this case, the user's motion is determined based on image data that captures the user's motion, and information on the input operation is output based on the determined motion, so even a device without a touch panel can recognize an input operation based on the user's motion from images acquired by a camera or the like. In addition, because the user's motion is determined based on image data represented by one-dimensional data extracted from image data composed of pixels represented by data of one or more dimensions, the processing load of the determination is light.

In the recognition device of the present invention, the acquisition means acquires image data represented by data of a plurality of dimensions including a luminance value, and the one-dimensional data is the luminance value.

In image data, a portion in which a part of the body such as the user's hand is captured is likely to have a luminance value different from that of other portions such as the background. With the above configuration, image data represented by luminance values is extracted from image data represented by data of a plurality of dimensions including a luminance value, so a part of the body such as the user's hand can be recognized appropriately.

In the recognition device of the present invention, the analysis means sets first and second coordinate axes that intersect each other on the image data, integrates the luminance values of the pixels along the direction of the second coordinate axis for each first coordinate value on the first coordinate axis, and analyzes the time-series change of the integrated luminance value for each first coordinate value.

In this case, an integrated luminance value is calculated for each first coordinate value, so the user's motion is determined based on the time-series change of one-dimensional data for each first coordinate value. The user's motion can thus be determined with a light processing load.

In the recognition device of the present invention, the analysis means analyzes the change of a region on the first coordinate axis in which the integrated luminance value for each first coordinate value is less than a predetermined threshold.

In image data, a portion in which a part of the body such as the user's hand is captured is likely to have a lower luminance value than other portions. With the above configuration, the change of the region in which the integrated luminance value for each first coordinate value is less than the predetermined threshold is analyzed, so the movement of a part of the body such as the user's hand can be determined appropriately.

In the recognition device of the present invention, the analysis means determines, based on the number of regions on the first coordinate axis whose integrated luminance value is less than the predetermined threshold, whether those regions reflect the user's hand, and the motion determination means determines the movement of the user's hand based on the movement of the regions on the first coordinate axis whose integrated luminance value is less than the predetermined threshold.

In this case, when the user's hand is captured in the image data, the number of regions on the first coordinate axis whose integrated luminance value is less than the predetermined threshold depends on the shape of the hand. With the above configuration, whether the regions reflect the user's hand is determined based on the number of regions whose luminance value is less than the predetermined threshold, so the user's hand can be recognized appropriately.

The recognition device of the present invention further comprises learning means for learning a correspondence between time-series changes of the one-dimensional data analyzed by the analysis means and user motions, and the motion determination means determines, based on the time-series change of the one-dimensional data analyzed by the analysis means and with reference to the correspondence learned by the learning means, the user motion corresponding to that time-series change.

In this case, the correspondence between time-series changes of the one-dimensional data in acquired image data and user motions is learned in advance, and the user's motion is determined from the analyzed time-series change of the one-dimensional data with reference to that correspondence, so the user's motion can be determined reliably.

The recognition device of the present invention further comprises content processing means for rewriting content to be displayed according to the information on the input operation output from the output means, and display means for displaying the content rewritten by the content processing means.

In this case, the content to be displayed is rewritten according to the information on the input operation, which reflects the user's motion, so content appropriate to the user's motion can be displayed.

In the recognition device of the present invention, the acquisition means is constituted by an imaging device that can be worn on the user's body.

In this case, means for imaging the user's body or the like can be additionally provided to a device that does not have such means in advance. Also, by attaching the acquisition unit near the part of the user's body whose motion is to be determined, that part can be imaged appropriately.

Even a device without a touch panel can recognize an input operation based on a user's motion with a light processing load.

Block diagram showing the functional configuration of the recognition device.
Hardware configuration diagram of the recognition device.
Diagram showing an example of how the camera constituting the acquisition unit is worn.
Diagram showing an example of an image acquired by the acquisition unit.
Diagram showing the direction in which luminance values are integrated in the image data.
Diagram showing an example of an image composed of pixels represented by luminance values, and the change of the integrated luminance value along the horizontal axis of the image.
Diagram showing examples of the change of the integrated luminance value along the horizontal axis of the image.
Diagram showing examples of preferred camera placement.
Flowchart showing the processing of the recognition method in the recognition device.

Embodiments of the recognition device and recognition method according to the present invention will be described with reference to the drawings. Where possible, identical parts are given identical reference numerals, and redundant description is omitted.

FIG. 1 is a block diagram showing the functional configuration of the recognition device 1. The recognition device 1 of this embodiment recognizes an input operation based on a user's motion. As shown in FIG. 1, the recognition device 1 functionally includes an acquisition unit 10 (acquisition means), a determination unit 20, a content processing unit 30 (content processing means), and a display unit 40 (display means). The determination unit 20 includes an extraction unit 21 (extraction means), an analysis unit 22 (analysis means), a motion determination unit 23 (motion determination means), a learning unit 24 (learning means), and an output unit 25 (output means). The content processing unit 30 includes a content rewriting unit 31 and a content storage unit 32.

FIG. 2 is a hardware configuration diagram of the recognition device 1. As shown in FIG. 2, the recognition device 1 is physically configured as a computer system that includes a CPU 101, a RAM 102 and a ROM 103 serving as main storage, a communication module 104 serving as a data transmission and reception device, an auxiliary storage device 105 such as a hard disk or flash memory, an input device 106 such as a keyboard, and an output device 107 such as a display. Each function shown in FIG. 1 is realized by loading predetermined computer software onto hardware such as the CPU 101 and the RAM 102 shown in FIG. 2, thereby operating the communication module 104, the input device 106, and the output device 107 under the control of the CPU 101, and by reading and writing data in the RAM 102 and the auxiliary storage device 105.

The recognition device 1 is constituted by, for example, a personal computer connected by wire to the acquisition unit 10 and the display unit 40, a server connected wirelessly to the acquisition unit 10 and the display unit 40, or a terminal device. Examples of the terminal device include a mobile terminal and an e-book terminal. Referring again to FIG. 1, each functional unit of the recognition device 1 is described in detail below.

The acquisition unit 10 acquires two-dimensional data that captures the user's motion and in which a plurality of data elements, each represented by data of one or more dimensions, are arranged in two dimensions. In this embodiment, the acquisition unit 10 acquires, for example, image data that captures the user's motion and is composed of pixels represented by data of one or more dimensions. Although in this embodiment the acquisition unit 10 acquires image data as the two-dimensional data capturing the user's motion, and that data consists of pixels arranged in two dimensions, the invention is not limited to this form. Any two-dimensional data acquired by a sensor that can capture, on a two-dimensional plane, information on the presence and motion of a part of the user's body in three-dimensional space is applicable to the present invention; for example, it may be two-dimensional data in which only luminance is plotted on a plane.

Specifically, the acquisition unit 10 is a sensor capable of acquiring information of one or more dimensions, and can be, for example, a camera that acquires images using visible or infrared light, or a range image camera. The image data acquired by the acquisition unit 10 can consist of pixels represented by multidimensional data, for example in HSV or RGB format. Configuring the acquisition unit 10 with such a sensor makes it possible to additionally provide means for acquiring the motion of the user's body or the like to a device that does not have in advance an input device such as a touch panel or a device for imaging the user's body or the like.

FIG. 3 shows an example of how the camera constituting the acquisition unit 10 is worn. The acquisition unit 10 is preferably worn at a position from which the user's fingers can be photographed. As shown in FIG. 3, the acquisition unit 10 is worn, for example, on the user's wrist. This makes it possible to appropriately image the fingers, whose motion is to be determined.

FIG. 4 shows an example of image data acquired by the acquisition unit 10. As shown in FIG. 4, the acquisition unit 10 acquires image data that includes the user's fingers. The acquisition unit 10 sends the acquired image data to the extraction unit 21.

The determination unit 20 outputs information on the input operation based on the user's motion, using the image data acquired by the acquisition unit 10. Each functional unit of the determination unit 20 is described in detail below.

The extraction unit 21 extracts, from the image data acquired by the acquisition unit 10, image data composed of pixels represented by predetermined one-dimensional data among the data of one or more dimensions. More specifically, the extraction unit 21 can extract, for example, image data represented by luminance values from the image data acquired by the acquisition unit 10.

In image data, a portion in which a part of the body such as the user's hand is captured is likely to have a luminance value different from that of other portions such as the background, so extracting image data represented by luminance values makes it possible to appropriately recognize a part of the body such as the user's hand in the image data.

Although in this embodiment the extraction unit 21 extracts image data represented by luminance values, image data represented by data other than luminance values may be extracted instead. The extraction unit 21 sends the extracted image data to the analysis unit 22.
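The extraction step can be illustrated with a minimal Python sketch. It assumes the acquired image is a nested list of (R, G, B) tuples and uses the common Rec. 601 luma weights; the source does not specify how luminance is derived, so the weights and the function name `extract_luminance` are illustrative assumptions, not the patented implementation.

```python
def extract_luminance(rgb_image):
    """Reduce each (R, G, B) pixel to one luminance value.

    The Rec. 601 weights used here are an assumption; the source only
    says that luminance is extracted from multidimensional pixel data.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 2x2 frame: white, black, pure red, pure blue.
frame = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
luma = extract_luminance(frame)
```

Each pixel goes from three values to one, which is where the claimed reduction in processing load begins.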

The analysis unit 22 analyzes the time-series change of the one-dimensional data based on the image data extracted by the extraction unit 21. More specifically, the analysis unit 22 sets first and second coordinate axes that intersect each other on the image data, integrates the luminance values of the pixels along the direction of the second coordinate axis for each first coordinate value on the first coordinate axis, and analyzes the time-series change of the integrated luminance value for each first coordinate value. The processing in the analysis unit 22 is described concretely with reference to FIG. 5.

As shown in FIG. 5, the analysis unit 22 sets a first coordinate axis a_H and a second coordinate axis a_V on the image data P. In the example of FIG. 5, for the rectangular image data P, the first coordinate axis a_H is set as the horizontal axis and the second coordinate axis a_V as the vertical axis. The way the first and second coordinate axes are set on the image data P is not limited to this; the axes can be set along any direction. The analysis unit 22 then integrates the luminance values of the pixels along the direction of the second coordinate axis a_V, that is, the direction of arrow R, for each coordinate value on the first coordinate axis a_H.
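As a rough sketch of this integration, the following Python function sums the luminance values down each column, taking a_H as the horizontal axis and a_V as the vertical axis as in the FIG. 5 example. The function name and the sample values are hypothetical.

```python
def integrate_columns(luma):
    """Sum luminance along a_V (rows) for each coordinate on a_H (columns)."""
    if not luma:
        return []
    width = len(luma[0])
    return [sum(row[x] for row in luma) for x in range(width)]


# Hypothetical 3x4 luminance image with a dark vertical stripe (a finger)
# at horizontal coordinate 1.
luma = [
    [200, 50, 200, 200],
    [200, 40, 200, 200],
    [200, 60, 200, 200],
]
profile = integrate_columns(luma)  # one value per a_H coordinate
```

The result is a one-dimensional profile whose low values mark the horizontal position of the dark stripe.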

FIG. 6(a) shows an example of an image composed of pixels represented by luminance values. FIG. 6(b) shows the change V of the integrated luminance value with respect to the coordinate value on the first coordinate axis a_H. In FIG. 6(b), T denotes a predetermined threshold for the integrated luminance value, and C denotes the intersection of that threshold with the line representing the change V of the integrated luminance value.

As shown in FIGS. 6(a) and 6(b), on the first coordinate axis a_H, the luminance value is low in the region corresponding to the position where the finger was photographed. That is, as shown in FIG. 6(a), the portion of the image data in which the finger is captured has a lower luminance value than portions such as the background behind the finger. Accordingly, in FIG. 6(b), the integrated luminance value V is low in the region on the first coordinate axis a_H corresponding to the finger.

Further, for each item of image data acquired in time series via the acquisition unit 10 and the extraction unit 21, the analysis unit 22 integrates the pixel luminance values for each coordinate value on the first coordinate axis a_H, and analyzes the time-series change of the integrated luminance value for each first coordinate value on the first coordinate axis a_H.

More specifically, the analysis unit 22 obtains the movement, in the direction of the first coordinate axis a_H, of a region in which the integrated luminance value V for each first coordinate value on the first coordinate axis a_H is less than the predetermined threshold T. The predetermined threshold T may be a preset value, or may be (average value ± a predetermined value) of the luminance values of all pixels included in the image data P extracted by the extraction unit 21.
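The thresholding described here might be sketched as follows. The source derives T from the average pixel luminance but compares it against column sums, so this sketch assumes the per-pixel threshold is scaled by the image height to make the two comparable; that scaling, the function names, and the sample values are all assumptions.

```python
def threshold_from_mean(luma, offset):
    """T = (mean pixel luminance + offset), scaled by image height.

    Scaling by height so that T is comparable with column sums is an
    assumption; the source only gives "average value +/- predetermined value".
    """
    height = len(luma)
    pixels = [v for row in luma for v in row]
    mean = sum(pixels) / len(pixels)
    return (mean + offset) * height


def regions_below(profile, threshold):
    """Return [start, end) ranges along a_H where profile < threshold."""
    regions, start = [], None
    for x, value in enumerate(profile):
        if value < threshold and start is None:
            start = x  # a dark region begins
        elif value >= threshold and start is not None:
            regions.append((start, x))  # the dark region ends
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions


luma = [
    [200, 50, 200, 200],
    [200, 40, 200, 200],
    [200, 60, 200, 200],
]
profile = [sum(row[x] for row in luma) for x in range(len(luma[0]))]
threshold = threshold_from_mean(luma, offset=-20)
regions = regions_below(profile, threshold)
```

Tracking how these ranges shift from frame to frame gives the movement along a_H that the analysis unit 22 is described as obtaining.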

The analysis unit 22 can obtain the movement, in the direction of the first coordinate axis a_H, of the region in which the integrated luminance value V is less than the predetermined threshold T by various methods, such as the following. For example, as shown in FIG. 6(b), the analysis unit 22 may perform the analysis by obtaining the variation of the point C, which is the outermost point on the first coordinate axis a_H at which the integrated luminance value V is equal to or less than the predetermined threshold T. Alternatively, the analysis unit 22 may perform the analysis by comparing, in time series, the variation of the integrated luminance value V for each first coordinate value on the first coordinate axis a_H. The analysis unit 22 may also analyze the variation of the integrated luminance value V using a wave function. The analysis unit 22 then sends the analysis result to the motion determination unit 23 and the learning unit 24.
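The first of these methods, following the edge point C over time, could look roughly like the Python sketch below. It assumes "outermost" means the smallest a_H coordinate and that each frame has already been reduced to a profile of integrated luminance values; the function names and sample profiles are hypothetical.

```python
def edge_point(profile, threshold):
    """Smallest a_H coordinate with profile <= threshold, or None if absent.

    Taking the smallest coordinate as the "outermost" point is an assumption.
    """
    for x, value in enumerate(profile):
        if value <= threshold:
            return x
    return None


def edge_displacement(profiles, threshold):
    """Net shift of point C between the first and last frames that show it."""
    points = [
        c for c in (edge_point(p, threshold) for p in profiles) if c is not None
    ]
    if len(points) < 2:
        return 0
    return points[-1] - points[0]


# Hypothetical profiles for three consecutive frames: the dark region
# moves toward larger a_H coordinates.
frames = [
    [600, 150, 600, 600, 600],
    [600, 600, 150, 600, 600],
    [600, 600, 600, 150, 600],
]
shift = edge_displacement(frames, threshold=400)
```

The sign of the shift gives the direction of movement along a_H; which physical direction that corresponds to depends on how the camera is oriented.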

The analysis unit 22 can also determine whether a region on the first coordinate axis a_H whose integrated luminance value is less than the predetermined threshold reflects the user's hand, based on the number and width of such regions. This determination allows the user's hand to be recognized appropriately. The determination is described with reference to FIG. 7.

FIG. 7 shows examples of the change of the integrated luminance value V along the first coordinate axis a_H. For example, as shown in FIG. 7(a), when there is a region in which the integrated luminance value V is equal to or less than the predetermined threshold T, the analysis unit 22 determines that the region reflects the user's hand (fingers). On the other hand, as shown in FIG. 7(b), when there is no region in which the integrated luminance value V is equal to or less than the predetermined threshold T, the analysis unit 22 determines that the user's hand (fingers) is not captured in the image data P.

Also, as shown in FIG. 7(c), when the width W_1 of the region where the integrated luminance value V is equal to or less than the predetermined threshold T falls within a predetermined range, the analysis unit 22 determines that the region reflects the user's hand (fingers). On the other hand, as shown in FIG. 7(d), when the widths W_2 and W_3 of the regions where the integrated luminance value V is equal to or less than the predetermined threshold T are larger or smaller than the predetermined range, the analysis unit 22 determines that the regions do not reflect the user's hand (fingers).

Further, since an input operation may be performed by moving any number of fingers from one to five, the analysis unit 22 determines that the regions reflect the user's hand (fingers) when the number of regions where the integrated luminance value V is equal to or less than the predetermined threshold T is between 1 and 5, as shown in FIG. 7(e). On the other hand, as shown in FIG. 7(f), when the number of regions where the integrated luminance value V is equal to or less than the predetermined threshold T is 6 or more, the analysis unit 22 determines that the regions do not reflect the user's hand (fingers).
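The checks illustrated in FIG. 7 might be sketched as follows. This is only an illustration under stated assumptions: the function names, the `min_width` and `max_width` parameters, and the decision to require every region to pass the width check are hypothetical and are not specified in the description.

```python
from typing import List, Sequence, Tuple


def dark_regions(profile: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, width) for each run of coordinates with value <= threshold."""
    regions = []
    start = None
    for x, value in enumerate(profile):
        if value <= threshold:
            if start is None:
                start = x
        elif start is not None:
            regions.append((start, x - start))
            start = None
    if start is not None:
        regions.append((start, len(profile) - start))
    return regions


def looks_like_hand(profile: Sequence[float], threshold: float,
                    min_width: int, max_width: int, max_regions: int = 5) -> bool:
    """Apply the count check (1 to 5 regions) and the width check together.

    Combining both checks with a logical AND is an assumption of this sketch.
    """
    regions = dark_regions(profile, threshold)
    if not 1 <= len(regions) <= max_regions:
        return False
    return all(min_width <= width <= max_width for _, width in regions)
```

With no dark region the function returns False, as in FIG. 7(b); with six or more regions it returns False, as in FIG. 7(f).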

The motion determination unit 23 determines the user's action based on the time-series change of the one-dimensional data analyzed by the analysis unit 22. In the present embodiment, the motion determination unit 23 determines the movement of the user's hand based on the movement of the region on the first coordinate axis a_H where the integrated luminance value V is less than the predetermined threshold T.

For example, when the motion determination unit 23 acquires from the analysis unit 22 an analysis result indicating that the region where the integrated luminance value V is less than the predetermined threshold T has moved in the positive direction on the first coordinate axis a_H, it determines that the user's hand has moved to the right. When it acquires from the analysis unit 22 an analysis result indicating that the region has moved in the negative direction on the first coordinate axis a_H, it determines that the user's hand has moved to the left.
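This rule could be sketched as a mapping from the sign of the region's displacement to a direction. The function name and the `min_shift` dead band are hypothetical additions for illustration; the description states only the positive/right and negative/left correspondence.

```python
from typing import Optional


def judge_direction(displacement: float, min_shift: float = 1.0) -> Optional[str]:
    """Map displacement along the first coordinate axis to a hand direction.

    Positive displacement is treated as rightward and negative as leftward.
    min_shift is an assumed dead band that ignores very small movements.
    """
    if displacement >= min_shift:
        return "right"
    if displacement <= -min_shift:
        return "left"
    return None
```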

The motion determination unit 23 can also determine the user's action corresponding to the time-series change of the one-dimensional data analyzed by the analysis unit 22 by referring to the association learned by the learning unit 24. As described later, the learning unit 24 learns and stores the association between the movement, in the direction of the first coordinate axis a_H, of the region where the integrated luminance value V is less than the predetermined threshold T and the left-right movement of the user's hand. By referring to the association stored in the learning unit 24, the motion determination unit 23 can therefore determine the movement of the user's hand based on the analysis result acquired from the analysis unit 22. The motion determination unit 23 sends the determination result regarding the movement of the user's hand to the output unit 25.

The learning unit 24 learns the association between the time-series change of the one-dimensional data analyzed by the analysis unit 22 and the user's action. Specifically, the learning unit 24 learns the association between the movement, acquired from the analysis unit 22, in the direction of the first coordinate axis a_H of the region where the integrated luminance value V is less than the predetermined threshold T and the left-right movement of the user's hand. For example, the learning unit 24 can carry out the learning process when the user operates the recognition device 1 and explicitly inputs to it whether the hand movement was to the left or to the right. The learning unit 24 can also carry out the learning process using a known learning algorithm such as an SVM (support vector machine) or boosting. The learning unit 24 stores the learned association, for example as a table.
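A table-based association of this kind might look like the sketch below. It deliberately uses a simple frequency table in place of the SVM or boosting algorithms mentioned above, and the class name, method names, and the reduction of the movement to its sign are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


class AssociationTable:
    """Stores how often each movement sign was labeled with each hand motion."""

    def __init__(self) -> None:
        self._counts = defaultdict(Counter)

    @staticmethod
    def _key(displacement: float) -> str:
        # Reduce the movement to its sign; this simplification is an assumption.
        if displacement > 0:
            return "positive"
        if displacement < 0:
            return "negative"
        return "none"

    def learn(self, displacement: float, label: str) -> None:
        """Record one explicitly labeled example (learning mode)."""
        self._counts[self._key(displacement)][label] += 1

    def lookup(self, displacement: float) -> Optional[str]:
        """Return the most frequently seen label for this movement, if any."""
        counts = self._counts.get(self._key(displacement))
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

In use, `learn` would be called with the label the user inputs explicitly, and `lookup` would be called by the motion determination step.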

The output unit 25 outputs information related to the input operation based on the user's action determined by the motion determination unit 23. Specifically, for example, when a rightward movement of the user's hand is acquired from the motion determination unit 23, the output unit 25 outputs information related to an input operation such as "page forward in the content". When a leftward movement of the user's hand is acquired from the motion determination unit 23, the output unit 25 outputs information related to an input operation such as "page back in the content". More generally, by storing in advance the association between user actions and information related to predetermined input operations, the output unit 25 can output information related to the input operation that corresponds to the user's action acquired from the motion determination unit 23.
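The stored association between actions and input operations could be as simple as a lookup table. In the sketch below, the string identifiers `"page_forward"` and `"page_back"` and the function name are hypothetical stand-ins for the information the output unit 25 emits.

```python
from typing import Optional

# Assumed identifiers for the two input operations named in the description.
OPERATION_FOR_MOTION = {
    "right": "page_forward",
    "left": "page_back",
}


def input_operation(motion: Optional[str]) -> Optional[str]:
    """Return the input operation associated with a determined hand motion."""
    return OPERATION_FOR_MOTION.get(motion)
```

Further actions could be supported by adding entries to the table without changing the lookup.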

The content processing unit 30 rewrites the content to be displayed in accordance with the information related to the input operation output from the output unit 25, and includes a content rewriting unit 31 and a content storage unit 32.

The content rewriting unit 31 rewrites the content based on the information related to the input operation acquired from the output unit 25. Specifically, for example, when information such as "page forward in the content" is acquired from the output unit 25, the content rewriting unit 31 extracts from the content storage unit 32 the content of the page following the content currently displayed on the display unit 40, and rewrites the content to be displayed on the display unit 40 with the extracted content. When information such as "page back in the content" is acquired from the output unit 25, the content rewriting unit 31 extracts from the content storage unit 32 the content of the page preceding the content currently displayed on the display unit 40, and rewrites the content to be displayed on the display unit 40 with the extracted content. The content storage unit 32 is a storage means that stores the content to be displayed on the display unit 40.
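The page selection performed by the content rewriting unit 31 might be sketched as follows. The function name and operation identifiers are hypothetical, the stored content is assumed to be a non-empty list of pages, and clamping at the first and last page is an assumption, since the description does not say what happens at the ends of the content.

```python
from typing import Sequence


def next_page_index(pages: Sequence[str], current: int, operation: str) -> int:
    """Return the index of the page to display after an input operation.

    Assumes pages is non-empty. The index is clamped so that paging past
    either end leaves the current page displayed (an assumed behavior).
    """
    if operation == "page_forward":
        target = current + 1
    elif operation == "page_back":
        target = current - 1
    else:
        target = current
    return max(0, min(target, len(pages) - 1))
```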

The display unit 40 displays content and is configured by a device such as a display, a video projection device such as a projector, or a head-mounted display (HMD). The HMD may be of the optical see-through type or the video see-through type. In the present embodiment, the display unit 40 displays the content and, when the content has been rewritten by the content rewriting unit 31, displays the rewritten content.

The preferred placement of a sensor such as a camera in the present embodiment is now described with reference to FIG. 8. In the present embodiment, the acquisition unit 10, which is configured by a camera or the like, is preferably placed at a position A where external light is blocked by the hand H, or on a surface B that the hand H touches when the user performs an input operation. With the acquisition unit 10 placed at such a position, it acquires an image in which the portion where the user's hand is captured has a smaller luminance value than the background of the hand. Placing the acquisition unit 10 in this way makes it possible to appropriately recognize a part of the user's body, such as the hand, from the image acquired by the acquisition unit 10.

Next, the operation of the recognition device 1 in the recognition method of the present embodiment is described with reference to FIG. 9. FIG. 9 is a flowchart showing the processing of the recognition method.

First, the acquisition unit 10 acquires image data that captures the user's action and consists of pixels represented by data of one or more dimensions (S1, acquisition step). Next, the extraction unit 21 extracts only the luminance values from the image data acquired by the acquisition unit 10 and generates image data consisting of pixels represented by luminance values (S2, extraction step).
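Step S2 could be sketched as below for the case where each pixel is an RGB triple. The description does not state how the luminance value is obtained; the ITU-R BT.601 luma weights used here are an assumption, as are the function names and the nested-list image representation.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed (R, G, B) representation


def luminance(pixel: Pixel) -> float:
    """Convert one RGB pixel to a luminance value using BT.601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def extract_luminance(image: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Reduce a multi-dimensional image to one luminance value per pixel (S2)."""
    return [[luminance(p) for p in row] for row in image]
```

If the camera already delivers a format with a separate luminance channel, that channel could be used directly instead.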

Next, the analysis unit 22 sets a first coordinate axis (horizontal axis) and a second coordinate axis (vertical axis) that intersect each other in the image data, and integrates the luminance values of the pixels along the direction of the vertical axis for each coordinate value on the horizontal axis (S3, analysis step). The analysis unit 22 then analyzes the movement, in the direction of the first coordinate axis a_H, of the region where the integrated luminance value V for each first coordinate value on the first coordinate axis a_H is less than the predetermined threshold T (S4, analysis step).
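The integration in step S3 amounts to summing each column of the luminance image. A minimal sketch, assuming the image is a list of equal-length rows of luminance values and using hypothetical function names:

```python
from typing import List, Sequence


def integrate_columns(image: Sequence[Sequence[float]]) -> List[float]:
    """Sum luminance along the vertical axis for each horizontal coordinate (S3)."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def below_threshold(profile: Sequence[float], threshold: float) -> List[int]:
    """Return the horizontal coordinates whose integrated value is < threshold."""
    return [x for x, value in enumerate(profile) if value < threshold]
```

The result is a one-dimensional profile with one value per horizontal coordinate, so the later steps operate on a single row of numbers rather than on the full image.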

Next, the determination unit 20 determines whether the learning mode is active (S5). The learning mode is a mode in which the learning unit 24 learns the association between the movement, in the direction of the first coordinate axis a_H, of the region where the integrated luminance value V is less than the predetermined threshold T and the left-right movement of the user's hand. If the learning mode is determined to be active, the procedure proceeds to step S9. Otherwise, the procedure proceeds to step S6.

In step S6, the motion determination unit 23 determines the movement of the user's hand based on the movement of the region on the first coordinate axis a_H where the integrated luminance value V is less than the predetermined threshold T (S6, motion determination step). Next, the output unit 25 outputs, to the content processing unit 30, information related to the input operation based on the movement of the user's hand determined by the motion determination unit 23 (output step). The content rewriting unit 31 of the content processing unit 30 rewrites the content to be displayed on the display unit 40 based on the information related to the input operation acquired from the output unit 25 (S7). The display unit 40 then displays the content rewritten by the content rewriting unit 31 (S8).

In step S9, on the other hand, the learning unit 24 learns the association between the movement, in the direction of the first coordinate axis a_H, of the region where the integrated luminance value V is less than the predetermined threshold T and the left-right movement of the user's hand (S9). The processing of the present embodiment then ends.
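The branch between the recognition path (S6 and the output step) and the learning path (S9) might be sketched as follows for a single pair of consecutive luminance frames. Everything here is an illustrative assumption: the function names, the use of the mean coordinate of the dark region to measure movement, the plain dictionary used as the learned table, and the operation identifiers.

```python
from typing import Dict, List, Optional, Sequence

Image = Sequence[Sequence[float]]


def column_sums(image: Image) -> List[float]:
    """S3: integrate luminance along the vertical axis."""
    return [sum(col) for col in zip(*image)] if image else []


def dark_center(profile: Sequence[float], threshold: float) -> Optional[float]:
    """S4: mean coordinate of the columns whose integrated value is < threshold."""
    xs = [x for x, v in enumerate(profile) if v < threshold]
    return sum(xs) / len(xs) if xs else None


def process(prev: Image, curr: Image, threshold: float, learning_mode: bool,
            table: Dict[int, str], label: Optional[str] = None) -> Optional[str]:
    """Run S3 to S9 on two frames and return an input operation, if any."""
    a = dark_center(column_sums(prev), threshold)
    b = dark_center(column_sums(curr), threshold)
    if a is None or b is None:
        return None  # no hand-like region in one of the frames
    sign = (b > a) - (b < a)  # +1, 0, or -1
    if learning_mode:  # S5 -> S9: store the labeled association
        if label is not None:
            table[sign] = label
        return None
    motion = table.get(sign)  # S6: look up the learned association
    return {"right": "page_forward", "left": "page_back"}.get(motion)
```

Calling `process` once in learning mode with a label, and then again in recognition mode with similar frames, returns the operation associated with that label.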

In the recognition device 1 and the recognition method of the present embodiment described above, the motion determination unit 23 determines the user's action based on image data that captures that action, and the output unit 25 outputs information related to the input operation based on the determined action. As a result, even a device without a touch panel can recognize an input operation based on the user's action. In addition, the extraction unit 21 extracts image data represented by one-dimensional data from image data consisting of pixels represented by data of one or more dimensions, and the user's action is determined based on the image data represented by the one-dimensional data, so the processing load of the determination is light.

The present invention has been described in detail above based on its embodiment. However, the present invention is not limited to the above embodiment and can be modified in various ways without departing from its gist.

DESCRIPTION OF SYMBOLS 1 ... recognition device, 10 ... acquisition unit, 20 ... determination unit, 21 ... extraction unit, 22 ... analysis unit, 23 ... motion determination unit, 24 ... learning unit, 25 ... output unit, 30 ... content processing unit, 31 ... content rewriting unit, 32 ... content storage unit, 40 ... display unit, a_H ... coordinate axis, a_V ... coordinate axis, P ... image data.

Claims

A recognition device that recognizes an input operation based on a user's action, comprising:
acquisition means for acquiring two-dimensional data that captures a user's action and in which a plurality of data elements, each represented by data of one or more dimensions, are arranged in two dimensions;
extraction means for extracting, from the two-dimensional data acquired by the acquisition means, two-dimensional data composed of the data elements represented by predetermined one-dimensional data among the data of one or more dimensions;
analysis means for analyzing a time-series change of the one-dimensional data in the data elements based on the two-dimensional data extracted by the extraction means;
motion determination means for determining the user's action based on the time-series change of the one-dimensional data analyzed by the analysis means; and
output means for outputting information related to an input operation based on the user's action determined by the motion determination means.

The recognition device according to claim 1, wherein
the acquisition means acquires image data that captures a user's action and consists of pixels represented by data of one or more dimensions,
the extraction means extracts, from the image data acquired by the acquisition means, image data consisting of pixels represented by predetermined one-dimensional data among the data of one or more dimensions, and
the analysis means analyzes a time-series change of the one-dimensional data based on the image data extracted by the extraction means.

The recognition device according to claim 2, wherein
the acquisition means acquires image data represented by data of a plurality of dimensions including luminance values, and
the one-dimensional data is a luminance value.

The recognition device according to claim 3, wherein the analysis means sets first and second coordinate axes that intersect each other in the image data, integrates the luminance values of the pixels along the direction of the second coordinate axis for each first coordinate value on the first coordinate axis, and analyzes a time-series change of the integrated luminance value for each first coordinate value.

5. The recognition device according to claim 4, wherein the analysis means analyzes a change in a region on the first coordinate axis in which the integrated luminance value for each first coordinate value is less than a predetermined threshold value.

The analysis means determines whether or not the region reflects the user's hand based on the number of regions whose integrated luminance values are less than a predetermined threshold on the first coordinate axis.
The motion determination means determines the movement of the user's hand based on the movement of the region where the integrated luminance value on the first coordinate axis is less than the predetermined threshold value. Recognition device.

The recognition device according to any one of claims 1 to 6, further comprising learning means for learning an association between the time-series change of the one-dimensional data analyzed by the analysis means and a user's action, wherein
the motion determination means refers to the association learned by the learning means and determines the user's action corresponding to the time-series change of the one-dimensional data analyzed by the analysis means.

The recognition device according to claim 1, further comprising:
content processing means for rewriting content to be displayed in accordance with the information related to the input operation output from the output means; and
display means for displaying the content rewritten by the content processing means.

The recognition device according to claim 1, wherein the acquisition means includes an imaging device that can be worn on a user's body.

A recognition method in a recognition device that recognizes an input operation based on a user's action, comprising:
an acquisition step of acquiring two-dimensional data that captures a user's action and in which a plurality of data elements, each represented by data of one or more dimensions, are arranged in two dimensions;
an extraction step of extracting, from the two-dimensional data acquired in the acquisition step, two-dimensional data composed of the data elements represented by predetermined one-dimensional data among the data of one or more dimensions;
an analysis step of analyzing a time-series change of the one-dimensional data in the data elements based on the two-dimensional data extracted in the extraction step;
a motion determination step of determining the user's action based on the time-series change of the one-dimensional data analyzed in the analysis step; and
an output step of outputting information related to an input operation based on the user's action determined in the motion determination step.