JP2015075750A

JP2015075750A - Image recognition device and image recognition method

Info

Publication number: JP2015075750A
Application number: JP2013214186A
Authority: JP
Inventors: Makoto Takahashi; Akira Shibuya; Shigeko Kobayashi; Yuta Higuchi
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2013-10-11
Filing date: 2013-10-11
Publication date: 2015-04-20
Anticipated expiration: 2033-10-11
Also published as: JP6177655B2

Abstract

PROBLEM TO BE SOLVED: To correct the translation result of an imaging object's motion, such as sign language or a gesture, with a simple operation. SOLUTION: In an image recognition device 100, a translation control part 103 translates the motion of an imaging object, a parameter generation part 104 generates a first motion parameter, and an image display part 109 outputs the translation result. The translation control part 103 then translates the motion of the imaging object acquired again for correction, and the parameter generation part 104 generates a second motion parameter for that motion. A correction control part 105 compares the second motion parameter with the first motion parameter determined for each motion and, when the comparison result satisfies predetermined conditions, replaces the translation result of the motion corresponding to that first motion parameter with the translation result of the motion of the second motion parameter. The image display part 109 outputs the corrected translation result.

Description

The present invention relates to an image recognition apparatus and an image recognition method for recognizing the motion of a person being photographed.

Apparatuses that translate gestures, sign language, and the like by image recognition are generally known. Because such an apparatus may misrecognize its input, the recognition result needs to be corrected. For example, as described in Patent Document 1, an apparatus is known that displays all candidates obtained by translating the input sign language and lets the person who entered it select among them with a mouse or the like. According to Patent Document 1, a correct translation result can be obtained.

Patent Document 1: JP-A-6-337627

However, the technique described in Patent Document 1 requires a device such as a mouse to make the sign language recognition result correct, and operating that device is laborious.

Therefore, an object of the present invention is to provide an image recognition apparatus and an image recognition method that can correct the translation result for a photographing target, such as sign language or a gesture, with a simple operation.

To solve the above problem, the image recognition apparatus of the present invention comprises: image acquisition means for acquiring a photographing target as image data; image translation means for translating, motion by motion, the motions of the photographing target contained in the image data acquired by the image acquisition means; generation means for generating, for each motion, a first motion parameter indicating the motion of the photographing target contained in that image data; result output means for outputting the translation result of the motions translated by the image translation means; and correction control means for correcting the translation result output by the result output means. After the result output means has output the translation result, the image acquisition means again acquires image data containing a motion of the photographing target made for correction. The image translation means translates the motion acquired again, and the generation means generates a second motion parameter for it. The correction control means compares the second motion parameter with each of the first motion parameters determined for the respective motions, and corrects the translation result by replacing the translation result of the motion corresponding to a first motion parameter for which the comparison satisfies a predetermined condition with the translation result of the motion of the second motion parameter. The result output means outputs the translation result corrected by the correction control means.

The image recognition method of the present invention is performed in an image recognition apparatus that recognizes the motion of a photographing target, and comprises: an image acquisition step of acquiring the photographing target as image data; an image translation step of translating, motion by motion, the motions of the photographing target contained in the image data acquired in the image acquisition step; a generation step of generating, for each motion, a first motion parameter indicating the motion of the photographing target contained in that image data; a result output step of outputting the translation result of the motions translated in the image translation step; and a correction control step of correcting the translation result output in the result output step. After the translation result has been output in the result output step, the image acquisition step again acquires image data containing a motion of the photographing target made for correction. The image translation step translates the motion acquired again, and the generation step generates a second motion parameter for it. The correction control step compares the second motion parameter with each of the first motion parameters determined for the respective motions, and corrects the translation result by replacing the translation result of the motion corresponding to a first motion parameter for which the comparison satisfies a predetermined condition with the translation result of the motion of the second motion parameter. The result output step outputs the translation result corrected in the correction control step.

According to the present invention, the motion of the photographing target is translated, a first motion parameter is generated, and the translation result is output; image data containing a motion of the photographing target made for correction is then acquired again. The motion acquired again is translated, and a second motion parameter is generated for it. The second motion parameter is compared with each of the first motion parameters determined for the respective motions, the translation result of the motion corresponding to a first motion parameter for which the comparison satisfies a predetermined condition is replaced with the translation result of the motion of the second motion parameter, and the corrected translation result is output.

As a result, the correction can be performed without the correction target being specified. In particular, no processing configuration for specifying the correction target is needed, which simplifies the configuration and reduces cost.

In the image recognition apparatus of the present invention, when the image translation means recognizes a motion indicating a correction instruction, the image acquisition means executes the process of acquiring image data for correction.

According to the present invention, when the photographing target makes a motion indicating a correction instruction, the apparatus recognizes it and executes the process of acquiring image data for correction. The correction instruction can therefore be given without any physical component such as a correction button. Furthermore, because the correction is instructed by a motion, the photographing target does not need to be close to the apparatus, which improves usability.

In the image recognition apparatus of the present invention, the correction control means compares the second motion parameter with each of the first motion parameters determined for the respective motions when a predetermined condition is satisfied.

In the image recognition apparatus of the present invention, the predetermined condition may be a time limit: the correction control means performs the motion parameter comparison when, after the result output means has output the translation result, the time from image acquisition by the image acquisition means to generation of the second motion parameter by the image translation means is within a predetermined time.

In the image recognition apparatus of the present invention, the predetermined condition may instead be an explicit instruction: the correction control means performs the motion parameter comparison when it receives an operation for a correction instruction.

According to the present invention, when a predetermined condition is satisfied, the second motion parameter is compared with each of the first motion parameters determined for the respective motions, so the motion to be corrected can be identified. When the condition is not satisfied, the comparison is not performed and normal translation processing is carried out instead.

For example, the predetermined condition may be that, after the translation result is output, the time from image acquisition to generation of the second motion parameter is within a predetermined time. If the processing for a motion finishes within the predetermined time, that is, if the motion is short, it can be judged that the signer is correcting a single motion rather than communicating through a series of motions. In that case, the apparatus determines that a correction has been instructed and compares the motion parameters to find the correction target, so the correction is handled automatically.

As for this predetermined condition, besides handling everything automatically as described above, the apparatus can also accept the correction instruction through a physical button, a touch-panel button, or the like.
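The two trigger conditions described here (a short elapsed time, or an explicit correction request) can be pictured as a small gate in front of the comparison step. The following is only an illustrative sketch: the function name `should_compare`, the 2-second default limit, and the choice to combine the two conditions with a logical OR are assumptions, not details given in this publication.

```python
def should_compare(elapsed_s: float, limit_s: float = 2.0,
                   explicit_request: bool = False) -> bool:
    """Decide whether a newly captured motion is treated as a correction.

    elapsed_s: time from image acquisition to generation of the second
        motion parameter, measured after the translation result was output.
    limit_s: the "predetermined time" (the value is assumed for illustration).
    explicit_request: True if a button or touch-panel instruction was received.
    """
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    return explicit_request or elapsed_s <= limit_s


# A short motion goes to parameter comparison; a long one is translated normally.
for elapsed in (0.8, 6.5):
    mode = "compare parameters" if should_compare(elapsed) else "normal translation"
    print(f"{elapsed:.1f} s -> {mode}")
```

When the gate returns False, the input would simply follow the normal translation path.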

The image recognition apparatus of the present invention further comprises measurement means for measuring, for each motion of the photographing target in the image data acquired by the image acquisition means, the time from the start to the end of that motion, and the generation means generates the time of each motion as a motion parameter in addition to the motion parameters indicating the motion of the photographing target.

According to the present invention, the time from the start to the end of each motion of the photographing target in the image data is measured, and the time of each motion is generated as a motion parameter in addition to the motion parameters indicating the motion itself. The parameters thus reflect not only the motion but also how long it takes, so the degree of coincidence with the motion to be corrected can be judged more accurately.
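One way to picture the duration acting as an additional motion parameter is sketched below. The publication does not say how two durations are compared, so the tolerance-based rule, the 0.3-second default, and the name `similarity_with_duration` are all assumptions made for illustration.

```python
from typing import Sequence


def similarity_with_duration(first: Sequence[int], second: Sequence[int],
                             first_duration_s: float, second_duration_s: float,
                             tolerance_s: float = 0.3) -> float:
    """Degree of coincidence with the motion duration as one extra parameter.

    Discrete parameters match when equal; the two durations are counted as a
    match when they differ by at most tolerance_s (an assumed rule).
    """
    if len(first) != len(second):
        raise ValueError("parameter sets must have the same length")
    matches = sum(a == b for a, b in zip(first, second))
    if abs(first_duration_s - second_duration_s) <= tolerance_s:
        matches += 1
    return matches / (len(first) + 1)


# Hypothetical parameter sets that differ only in the number of fingers.
three = (0, 3, 0, 0, 0, 0, 2)
four = (0, 4, 0, 0, 0, 0, 2)
print(similarity_with_duration(three, four, 0.8, 0.9))  # 7 of 8 parameters match
print(similarity_with_duration(three, four, 0.8, 2.0))  # 6 of 8 parameters match
```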

According to the present invention, the correction can be performed without the correction target being specified. In particular, no processing configuration for specifying the correction target is needed, which simplifies the configuration and reduces cost.

FIG. 1 is a block diagram showing the functions of the image recognition apparatus 100 of this embodiment.
FIG. 2 is a hardware configuration diagram of the image recognition apparatus 100.
FIG. 3 is a diagram for explaining the motion parameters that describe a signer's motion.
FIG. 4 is an explanatory diagram schematically showing a signer's motions and their translation results.
FIG. 5 is an explanatory diagram showing a parameter table that describes motion parameters.
FIG. 6 is a flowchart showing the sign language translation process of the image recognition apparatus 100.
FIG. 7 is a flowchart showing the translation process in a modification that takes motion time into account as a motion parameter.
FIG. 8 is a block diagram showing the functional configuration of an image recognition apparatus 100a in a modification.
FIG. 9 is a flowchart showing the translation process in a modification in which no explicit correction instruction is given.
FIG. 10 is a block diagram showing the functional configuration of an image recognition apparatus 100b, which is configured to access a database over a network, and a server 200.

Embodiments of the present invention will be described with reference to the accompanying drawings. Where possible, identical parts are given identical reference numerals and redundant description is omitted.

FIG. 1 is a block diagram showing the functions of the image recognition apparatus 100 of this embodiment. The image recognition apparatus 100 translates the gestures of a signer or the like into text or images, and includes a gesture input unit 101 (image acquisition means), a translation engine unit 102, an image display unit 109 (result output means), and a communication control unit 110. The translation engine unit 102 in turn includes a translation control unit 103 (image translation means), a parameter generation unit 104 (generation means), a correction control unit 105 (correction control means), a gesture recognition DB 106, an intention interpretation DB 107, and a character translation DB 108. The image recognition apparatus 100 is preferably a mobile terminal such as a smartphone.

FIG. 2 is a hardware configuration diagram of the image recognition apparatus 100. As shown in FIG. 2, the image recognition apparatus 100 of FIG. 1 is physically configured as a computer system that includes one or more CPUs 11, a RAM 12 and a ROM 13 serving as main storage, an input device 14 such as a keyboard and mouse, an output device 15 such as a display, a communication module 16 that is a data transmission and reception device such as a network card, and an auxiliary storage device 17 such as a semiconductor memory. Each function in FIG. 1 is realized by loading predetermined computer software onto hardware such as the CPU 11 and the RAM 12 shown in FIG. 2, operating the input device 14, the output device 15, and the communication module 16 under the control of the CPU 11, and reading and writing data in the RAM 12 and the auxiliary storage device 17. Each functional block shown in FIG. 1 is described below.

The gesture input unit 101 acquires, as images, the motion of a photographed person such as a signer; it is, for example, a camera.

The translation engine unit 102 translates the motion of the photographed person acquired by the gesture input unit 101 into text or images that are easy to understand for people who do not understand sign language. As described above, the translation engine unit 102 includes the translation control unit 103, the parameter generation unit 104, the correction control unit 105, the gesture recognition DB 106, the intention interpretation DB 107, and the character translation DB 108. These components are described in more detail below.

The translation control unit 103 translates the image data containing the signer's motion, acquired by the gesture input unit 101, into text or images by referring to the gesture recognition DB 106, the intention interpretation DB 107, and the character translation DB 108. Besides translating into text and the like, the translation control unit 103 can also recognize whether a signer's motion is a control motion such as a correction instruction.

The parameter generation unit 104 generates one or more motion parameters describing the signer's motion from the image data containing that motion, acquired by the gesture input unit 101. Examples of the motion parameters are shown in FIG. 3.

FIG. 3 is a diagram for explaining the motion parameters that describe a signer's motion. FIG. 3(a) shows the motion parameters and specific examples. As FIG. 3(a) shows, the motion parameters consist of finger motion, number of fingers, palm motion, horizontal hand motion, vertical hand motion, presence or absence of hand rotation, and hand position. The content of each motion parameter is as follows.
Finger motion: whether the number of fingers changed during one motion (change or no change, 0 or 1)
Number of fingers: how many fingers were shown during one motion (the count that occurred most often, 0 to 5)
Palm motion: whether the palm was flipped during one motion (change or no change, 0 or 1)
Horizontal hand motion: whether the hand moved horizontally during one motion (right to left, left to right, back and forth, or none, coded 0 to 3)
Vertical hand motion: whether the hand moved vertically during one motion (top to bottom, bottom to top, both up and down, or none, coded 0 to 3)
Hand rotation: whether the hand rotated during one motion (change or no change, 0 or 1)
Hand position: where the hand mainly was during one motion (in front of the face, the neck, or the body, coded 0 to 2)
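The seven parameters listed above can be held in a small record type, sketched here for illustration. The class and field names are hypothetical, and the publication does not state which code number corresponds to which direction or position, so the sketch only checks the value ranges.

```python
from dataclasses import dataclass, fields

# Upper bound of each parameter in the order listed above (lower bound is 0).
_UPPER_BOUNDS = (1, 5, 1, 3, 3, 1, 2)


@dataclass(frozen=True)
class MotionParams:
    """One parameter set describing a single sign language motion."""
    finger_motion: int      # did the number of fingers change? 0 or 1
    finger_count: int       # most frequent number of fingers, 0 to 5
    palm_motion: int        # was the palm flipped? 0 or 1
    horizontal_motion: int  # horizontal movement code, 0 to 3
    vertical_motion: int    # vertical movement code, 0 to 3
    rotation: int           # did the hand rotate? 0 or 1
    hand_position: int      # face, neck, or body, 0 to 2

    def __post_init__(self):
        # Reject values outside the ranges given in the parameter list.
        for field, upper in zip(fields(self), _UPPER_BOUNDS):
            value = getattr(self, field.name)
            if not 0 <= value <= upper:
                raise ValueError(f"{field.name}={value} is outside 0..{upper}")


# Example values for a motion with vertical movement code 3 and hand
# position 2; all other parameters are assumed to be 0.
example = MotionParams(0, 0, 0, 0, 3, 0, 2)
print(example)
```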

FIG. 3(b) shows the motion that expresses "obedient" in sign language: both hands form the fingerspelled character "mo" (も), the fingertips are placed on the chest, and the hands are simultaneously drawn apart upward and downward. In this case, the finger motion, the number of fingers, and so on are 0. Because the hands move vertically, the vertical hand motion is represented by 3. Because the hands are in front of the body, the hand position is represented by 2.

Similarly, FIG. 3(c) shows the motion that expresses "anxiety" in sign language: the fingertips of both hands tap the chest twice. In this case, five fingers are recognized, so the number of fingers is represented by 5. Because the hands are in front of the body, the hand position is represented by 2.

In this way, the translation control unit 103 recognizes the motion and position of the fingers and hands, and the parameter generation unit 104 generates each motion parameter based on that recognition.

FIG. 4 is an explanatory diagram schematically showing a signer's motions and their translation results. FIGS. 4(a) to 4(g) show a series of sign language motions that express "I'm envious; I can only take three days off." This is the translation result obtained through the translation control performed by the translation control unit 103 described above.

The parameter generation unit 104 can generate motion parameters for each motion from such a series of sign language motions. FIG. 5 is an explanatory diagram showing a parameter table that describes the motion parameters generated by the parameter generation unit 104. As shown in FIG. 5, a parameter set consisting of motion parameters P1 to P7 is associated with each sign language motion. For example, "enviable" has the value 1 for motion parameters P2 and P5. When the signer performs a series of sign language motions, the parameter generation unit 104 generates a parameter set of motion parameters P1 to P7 for each motion recognized by the translation control unit 103. In FIG. 5, the sign language motion time t may optionally be associated with each motion as a further motion parameter. This is the duration of each motion in the series, measured by the parameter generation unit 104 (measurement means), and it is used when calculating the degree of coincidence from the motion parameters.
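A parameter table of this kind can be pictured as a list of rows, one per motion, each holding the translation, the parameter set P1 to P7, and an optional motion time t. The sketch below is illustrative only: `TableRow` and `make_row` are hypothetical names, and it assumes that every parameter not mentioned for a motion is 0.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class TableRow(NamedTuple):
    translation: str                    # translation result for this motion
    params: Tuple[int, ...]             # values of P1..P7
    duration_s: Optional[float] = None  # optional motion time t


def make_row(translation: str, nonzero: Dict[int, int],
             duration_s: Optional[float] = None, size: int = 7) -> TableRow:
    """Build a row from the nonzero parameters, given by 1-based index."""
    params = [0] * size
    for index, value in nonzero.items():
        if not 1 <= index <= size:
            raise ValueError(f"P{index} is outside P1..P{size}")
        params[index - 1] = value
    return TableRow(translation, tuple(params), duration_s)


# "enviable" has the value 1 for P2 and P5; the other values are assumed to be 0.
enviable = make_row("enviable", {2: 1, 5: 1})
print(enviable.params)  # (0, 1, 0, 0, 1, 0, 0)
```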

The correction control unit 105 corrects the translation result produced by the translation control unit 103. More specifically, when a predetermined condition is satisfied, the correction control unit 105 compares the motion parameters generated by the parameter generation unit 104 before the correction instruction (first motion parameters) with the motion parameter generated by the parameter generation unit 104 after the correction instruction (second motion parameter). It then replaces the translation result of the motion whose first motion parameter satisfies a predetermined condition in that comparison with the translation result of the motion of the second motion parameter. For example, the correction control unit 105 can perform the correction when the degree of coincidence is equal to or greater than a predetermined value.

This is explained with reference to FIG. 3. The signer's motions in FIG. 3(b) and FIG. 3(c) are very similar. As shown in FIG. 3(a), parameters such as finger motion match, while the number of fingers and the vertical hand motion differ. From these motion parameters the correction control unit 105 can calculate a similarity of 71.4% (5/7), and it can judge the translation result of the motion with the highest similarity to be the correction target.

An application of this method of determining the correction target is described with reference to FIG. 4, which shows the sign language motions and the translation result for each motion. As shown in FIGS. 4(a) to 4(g), the signer performs seven motions. FIGS. 4(c) and 4(d) express "3 days". To correct this "3" to, for example, "4", the user gives a correction instruction and then makes a motion showing four fingers. The translation control unit 103 recognizes this motion, and the parameter generation unit 104 generates the motion parameter indicating "4" (second motion parameter). The correction control unit 105 searches the series of motions shown in FIG. 4 for a motion parameter (first motion parameter) with a high degree of coincidence with the motion parameter indicating "4", and replaces the translation result of the motion whose first motion parameter has the predetermined similarity with the corrected translation result. Here, the "3" shown in FIG. 4(c) has a high degree of coincidence with "4" (only the number of fingers differs), so the translation result "3" is replaced with the translation result "4".
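The similarity calculation and the replacement step can be sketched as follows. This is a minimal illustration, not the implementation of the apparatus: the function names, the 0.8 threshold, and the example parameter values for the FIG. 4 motions are assumptions. Only the 5/7 similarity between two sets that differ in the number of fingers and the vertical hand motion comes from the description.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[Tuple[int, ...], str]  # (first motion parameters, translation)


def similarity(first: Sequence[int], second: Sequence[int]) -> float:
    """Fraction of parameter positions whose values are equal."""
    if len(first) != len(second):
        raise ValueError("parameter sets must have the same length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def apply_correction(history: List[Entry], second_params: Tuple[int, ...],
                     second_text: str, threshold: float = 0.8) -> List[Entry]:
    """Replace the entry most similar to the re-signed motion.

    Returns a new list; history is left unchanged. If no entry reaches the
    threshold (the "predetermined value", assumed here), nothing is replaced.
    """
    best_index, best_score = -1, 0.0
    for index, (first_params, _text) in enumerate(history):
        score = similarity(first_params, second_params)
        if score > best_score:
            best_index, best_score = index, score
    corrected = list(history)
    if best_index >= 0 and best_score >= threshold:
        # Store the new parameters along with the new translation (a design
        # choice of this sketch) so a later correction compares against them.
        corrected[best_index] = (second_params, second_text)
    return corrected


# Two sets differing in finger count and vertical motion, as in FIG. 3.
print(round(similarity((0, 0, 0, 0, 3, 0, 2), (0, 5, 0, 0, 0, 0, 2)) * 100, 1))  # 71.4

# Hypothetical parameter values for part of the FIG. 4 sequence.
history = [((0, 1, 0, 0, 1, 0, 0), "enviable"),
           ((0, 1, 0, 0, 0, 0, 1), "I"),
           ((0, 3, 0, 0, 0, 0, 2), "3"),
           ((0, 2, 0, 1, 0, 0, 2), "days")]
corrected = apply_correction(history, (0, 4, 0, 0, 0, 0, 2), "4")
print([text for _params, text in corrected])  # ['enviable', 'I', '4', 'days']
```

Because the match is found by similarity alone, the signer never has to point out which word is being corrected, which is the benefit the description claims for this approach.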

Returning to FIG. 1, the description of the block diagram continues. The gesture recognition DB 106 is a database describing the motion patterns that allow the translation control unit 103 to recognize a signer's motion. It is an ordinary database for gesture recognition.

The intention interpretation DB 107 is a database that the translation control unit 103 uses to interpret the meaning of a motion once it has recognized the motion to some extent by referring to the gesture recognition DB 106. It describes motion patterns in association with their meanings (intentions). It, too, is an ordinary database for gesture analysis.

The character translation DB 108 is a database for translating the meaning interpreted with the intention interpretation DB 107 into ordinary sentences. For example, when a motion pointing at oneself is interpreted, using the intention interpretation DB 107, as the subject of a sentence, this database is used to translate it as "I". It, too, is an ordinary database for gesture translation.
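The way the three databases are consulted in sequence can be sketched with plain dictionaries. Everything in this sketch is hypothetical: the dictionary contents, the string keys standing in for image-derived features, and the function name `translate_motion` are invented for illustration, since the publication describes the databases only as ordinary ones.

```python
from typing import Optional, Tuple

# Stand-ins for the three databases (contents are invented examples).
GESTURE_RECOGNITION_DB = {"index_finger_to_own_chest": "point_at_self"}
INTENTION_INTERPRETATION_DB = {"point_at_self": ("self", "subject")}
CHARACTER_TRANSLATION_DB = {("self", "subject"): "I"}


def translate_motion(observed_features: str) -> Optional[str]:
    """Recognize a motion pattern, interpret its meaning, then render text.

    Returns None as soon as any of the three lookups fails.
    """
    pattern = GESTURE_RECOGNITION_DB.get(observed_features)
    if pattern is None:
        return None
    meaning: Optional[Tuple[str, str]] = INTENTION_INTERPRETATION_DB.get(pattern)
    if meaning is None:
        return None
    return CHARACTER_TRANSLATION_DB.get(meaning)


print(translate_motion("index_finger_to_own_chest"))
print(translate_motion("unknown_motion"))
```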

The image display unit 109 displays the translation result produced by the translation control unit 103 and the translation result corrected by the correction control unit 105.

When the image recognition apparatus 100 configured in this way is a tablet-type mobile terminal, the gesture input unit 101, which is a camera, is disposed on the back side (the surface opposite the image display unit 109). The person to be photographed is captured by the gesture input unit 101, the captured motion is translated by the translation engine unit 102, and the translation result is displayed on the image display unit 109.

The user holding the image recognition apparatus 100 (for example, a person without a hearing impairment who does not understand sign language) turns the translation result toward the person being photographed (the signer) so that the signer can check it. When the signer wants to make a correction, the signer has the user point the gesture input unit 101 side toward the signer again and can then give a predetermined correction instruction.

Note that arranging image display units 109 on both faces (front and back) of the image recognition apparatus 100, with the gesture input unit 101 on the back side, eliminates the need for the user to change the orientation of the image recognition apparatus 100.

Next, the sign language translation processing performed by the image recognition apparatus 100 configured in this way is described. FIG. 6 is a flowchart showing the sign language translation processing of the image recognition apparatus 100.

The sign language motion of the signer to be photographed is captured and input by the gesture input unit 101 (S101). The captured sign language motion is translated by the translation control unit 103 (S102), and the parameter generation unit 104 generates an operation parameter (first operation parameter) based on the sign language motion recognized by the translation control unit 103 and stores it temporarily (S103). These steps are repeated until the series of sign language motions, that is, one sentence, is completed (S104). When the translation control unit 103 determines that the sentence has ended, that is, that no sign language motion has occurred for a predetermined time after the series of sign language motions, the translation result is displayed on the image display unit 109 (S105).

When a correction instruction is accepted, correction processing starts (S106). Possible correction instructions include, for example, a control unit (not shown) detecting that the signer or another user has pressed a predetermined correction button (not shown), or the translation control unit 103 recognizing a predetermined gesture by the signer (one that indicates a correction instruction).

When a correction instruction is given by the signer or another user, the gesture input unit 101 starts capturing the signer's motion (S107). As in the capture and translation processing before the correction instruction, the translation control unit 103 performs translation processing (S108), and the parameter generation unit 104 generates and stores an operation parameter (second operation parameter) (S109). The correction control unit 105 then determines whether the parameter table stores a motion (sign language phrase) whose operation parameter from before the correction instruction matches the operation parameter from after the correction instruction, or has a predetermined degree of coincidence with it (S110).

When the parameter table stores a motion whose operation parameter from before the correction instruction matches the operation parameter from after the correction instruction, or has a predetermined degree of coincidence with it, the correction control unit 105 replaces the translation result of the motion from before the correction instruction with the translation result of the motion from after the correction instruction (S111). The entire translation result after the replacement is then displayed on the image display unit 109 (S111). When the translation control unit 103 determines that there is further input, the capture processing, translation processing, and so on are repeated (S112). In the flowchart, the process returns from S110 to S107 when no matching phrase is found; however, the processing is not limited to this, and an error may be reported instead when there is no matching phrase.

As a result, a person who does not understand sign language can correctly understand what the signer is communicating. When there is further input, the process returns to S101, and the sign language translation processing and its correction processing are repeated.

In S110, the correction control unit 105 identifies the motion to be corrected by determining whether the respective operation parameters match or whether their degree of coincidence is at or above a predetermined value, but the determination is not limited to this. The operation parameters may be prioritized or weighted so that some parameters count more than others when the degree of coincidence is judged. For example, when operation parameter P1 is important, the predetermined coincidence condition may be judged not to be satisfied if P1 does not match, even if the other operation parameters match.

Alternatively, where a match between an operation parameter from before the correction instruction and the corresponding one from after the correction instruction normally adds 1 to the score, weighting may be performed by instead adding, for an important operation parameter, 1 multiplied by a predetermined coefficient.
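The two variations described above, a mandatory parameter such as P1 and a coefficient applied to important parameters, can be sketched as follows. The function names, the dictionary representation of operation parameters, and the idea of comparing the summed score against a `threshold` are assumptions made for illustration; the specification only states that each match normally adds 1 and that important parameters may be multiplied by a predetermined coefficient.

```python
def weighted_score(first, second, weights=None):
    """Add 1 for every matching parameter, multiplied by that parameter's
    coefficient when one is given in `weights` (default coefficient 1)."""
    weights = weights or {}
    score = 0.0
    for name in first.keys() & second.keys():
        if first[name] == second[name]:
            score += 1.0 * weights.get(name, 1.0)
    return score


def satisfies_condition(first, second, threshold, weights=None, required=()):
    """Return False outright if any `required` parameter fails to match,
    even when all other parameters agree; otherwise compare the weighted
    score with the threshold."""
    for name in required:
        if name not in first or name not in second or first[name] != second[name]:
            return False
    return weighted_score(first, second, weights) >= threshold
```

For example, with parameters P1 to P3 where only P2 differs, the unweighted score is 2.0, and giving P1 a coefficient of 3 raises it to 4.0.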

Next, a modification of the translation processing is described with reference to FIG. 7. FIG. 7 is a flowchart showing the translation processing in the modification. This modification differs from FIG. 6 in that the operation parameters include the duration of each motion.

As shown in FIG. 7, when a sign language motion is captured by the gesture input unit 101 and translated by the translation control unit 103 (S101, S102), the parameter generation unit 104 generates an operation parameter (first operation parameter) and also measures the time taken by each sign language motion (that is, each sign language phrase) and stores it in the parameter table (S103a). The translation result is then displayed on the image display unit 109 (S105), and when a correction instruction is given (S106), a sign language motion is captured and translated again (S107, S108). Here too, the parameter generation unit 104 generates the operation parameter after the correction instruction (second operation parameter), measures the time taken by each motion (each sign language phrase), and stores it in the parameter table (S109a). When there is a motion (sign language phrase) whose operation parameters from before and after the correction instruction match or have a predetermined degree of coincidence (S110), the translation result corresponding to that motion is replaced with the translation result of the motion made after the correction instruction and displayed on the image display unit 109 (S111). These steps are repeated as long as there is sign language input (S112). In the flowchart, the process returns from S110 to S107 when no matching phrase is found; however, the processing is not limited to this, and an error may be reported instead when there is no matching phrase.

Using the duration of the sign language motion as one of the operation parameters in this way makes it possible to judge the degree of coincidence more accurately.
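One way the motion duration might enter the comparison is sketched below. The specification does not say how two durations are compared, so the relative tolerance (`rel_tol`), the `duration` key, and the equal weighting of duration against the other features are assumptions for illustration only.

```python
def duration_matches(first_seconds, second_seconds, rel_tol=0.3):
    """Treat two durations as matching when they differ by at most
    `rel_tol` times the longer one (assumed rule)."""
    longer = max(first_seconds, second_seconds)
    return abs(first_seconds - second_seconds) <= rel_tol * longer


def coincidence_with_duration(first, second, rel_tol=0.3):
    """Fraction of matching parameters, counting 'duration' (seconds) as
    one more parameter compared with a tolerance instead of equality."""
    names = (first.keys() & second.keys()) - {"duration"}
    matched = sum(first[n] == second[n] for n in names)
    total = len(names)
    if "duration" in first and "duration" in second:
        total += 1
        matched += duration_matches(first["duration"], second["duration"], rel_tol)
    return matched / total if total else 0.0
```

A tolerance is used because the same sign is unlikely to take exactly the same time twice; an exact comparison would almost never match.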

Next, another modification is described. FIG. 8 is a block diagram showing the functional configuration of the image recognition apparatus 100a in this modification. In this modification, the image recognition apparatus 100a can determine automatically whether a correction has been instructed, without the signer giving a correction instruction in advance. To realize this processing, the image recognition apparatus 100a includes, in place of the correction control unit 105, a correction control unit 105a and a timer 105b built into it.

After the image display unit 109 has displayed the translation result, the correction control unit 105a controls the gesture input unit 101 to capture the signer and starts the timer 105b. When the translation control unit 103 has translated the signer's motion acquired by the gesture input unit 101 and the parameter generation unit 104 has generated the operation parameter for that motion, the correction control unit 105a stops the timer 105b. When the correction control unit 105a determines that the translation processing and parameter generation timed by the timer 105b finished within a predetermined time, it can execute correction processing according to the degree of coincidence of the operation parameter of the re-acquired motion. As described above, the operation parameters may be weighted or prioritized in the comparison processing.

As in the above embodiment, each time a correction is made, the side of the image recognition apparatus 100a on which the image display unit 109 is arranged may be turned toward the signer so that the signer can check the translation result, and after the check the gesture input unit 101 side may be turned toward the signer again so that the apparatus can determine whether the next input is a continuation of the sign language to be translated or a correction. Image display units 109 may also be arranged on both faces of the image recognition apparatus 100a.

FIG. 9 is a flowchart showing the specific processing. The sign language motion of the signer to be photographed is captured and input by the gesture input unit 101 (S201). The captured sign language motion is translated by the translation control unit 103 (S202), and the parameter generation unit 104 generates an operation parameter (first operation parameter) based on the sign language motion recognized by the translation control unit 103 and stores it temporarily (S203). These steps are repeated until the series of sign language motions, that is, one sentence, is completed (S204). When the translation control unit 103 determines that the sentence has ended, that is, that no sign language motion has occurred for a predetermined time, the translation result is displayed on the image display unit 109 (S205).

The gesture input unit 101 then resumes capturing the signer (S206), and the timer 105b starts measuring the sign language motion time (S207). Here the timer 105b is assumed to be built into the correction control unit 105a, but it is not limited to this arrangement. The translation control unit 103 translates the signer's motion input through the gesture input unit 101 (S208). Along with this translation, the parameter generation unit 104 generates an operation parameter (second operation parameter) based on the signer's motion (S209). When the operation parameter has been generated, the timer 105b stops measuring, and the correction control unit 105a determines whether the measured time is within a predetermined time (S210).

When the correction control unit 105 determines that the operation parameter generated from the re-acquired motion (sign language phrase) matches an operation parameter generated from a previously acquired motion, or that their degree of coincidence satisfies the predetermined condition (S211), the re-acquired motion can be judged to be a motion for correction rather than a motion for the next sign language sentence. In the flowchart, the process returns to S206 when nothing matches in S211; however, the processing is not limited to this, and an error may be reported instead when there is no matching phrase.

The correction control unit 105 then replaces the translation result of the motion corresponding to the operation parameter that matches, or whose degree of coincidence satisfies the predetermined condition, and the result is displayed on the image display unit 109 (S212). These steps are repeated as long as there is gesture input (S213).

In this way, corrections can be made automatically without the signer or another user giving a specific instruction such as a correction instruction, which provides an easy-to-use image recognition apparatus for sign language translation.
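The decision made in S210 and S211 can be sketched as a small pure function plus a timing helper. This is an illustration under assumptions: the function names, the 3-second `time_limit`, the 0.7 `threshold`, the dictionary form of the operation parameters, and the three result labels are hypothetical, and the coincidence measure (fraction of equal features) is a stand-in for whatever comparison the apparatus actually uses.

```python
import time


def coincidence(first, second):
    """Fraction of shared features with equal values (assumed measure)."""
    names = first.keys() & second.keys()
    if not names:
        return 0.0
    return sum(first[n] == second[n] for n in names) / len(names)


def classify_input(elapsed_seconds, second, firsts, time_limit=3.0, threshold=0.7):
    """Decide how to treat a re-acquired motion.

    Returns ("new_input", None) when processing took longer than the limit,
    ("correction", index) when a stored first parameter matches well enough,
    and ("no_match", None) otherwise.
    """
    if elapsed_seconds > time_limit:
        return ("new_input", None)
    best_index, best_score = None, 0.0
    for i, first in enumerate(firsts):
        score = coincidence(first, second)
        if score > best_score:
            best_index, best_score = i, score
    if best_index is None or best_score < threshold:
        return ("no_match", None)
    return ("correction", best_index)


def measure(process, frames):
    """Run `process(frames)` and return (result, elapsed seconds), playing
    the role of the timer started at S207 and stopped after S209."""
    start = time.monotonic()
    result = process(frames)
    return result, time.monotonic() - start
```

Passing the elapsed time in as a plain number keeps the decision logic easy to test; `time.monotonic()` is used for the measurement because it is not affected by system clock adjustments.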

In the above embodiment and modifications, the image recognition apparatus 100 contains the gesture recognition DB 106, the intention interpretation DB 107, and the character translation DB 108 and functions as a stand-alone device, but the configuration is not limited to this. These DBs may be provided on a server connected via a network, and an image recognition apparatus 100b may perform translation processing by accessing the DBs on that server.

FIG. 10 is a block diagram showing this configuration. As shown in FIG. 10, the image recognition apparatus 100b includes a gesture input unit 101, a translation engine unit 102, an image display unit 109, and a communication control unit 110. As in the above embodiment, the translation engine unit 102 includes a translation control unit 103, a parameter generation unit 104, and a correction control unit 105.

When image data containing the signer's motion is input from the gesture input unit 101, the translation engine unit 102 accesses the gesture recognition DB 206, the intention interpretation DB 207, and the character recognition DB 208 of the server 200 via the communication control unit 110 and performs translation processing.

The image display unit 109 displays the translation result produced by the translation engine unit 102.

When correction processing is performed, the gesture input unit 101 again inputs image data containing the signer's motion, and the translation engine unit 102 performs translation processing on that image data.

The server 200 has the gesture recognition DB 206, the intention interpretation DB 207, and the character recognition DB 208, and is configured to permit reference to each DB when it receives a translation request from the image recognition apparatus 100b or from other communication terminals 100x to 100z.

Next, the operation and effects of the image recognition apparatuses 100, 100a, and 100b in the present embodiment and the modifications are described.

According to the image recognition apparatus 100 of the present embodiment, the translation control unit 103 translates the motion of the photographed subject input through the gesture input unit 101, the parameter generation unit 104 generates a first operation parameter, which is the operation parameter from before the correction instruction, and the image display unit 109 outputs the translation result. The gesture input unit 101 then acquires again image data containing a motion of the photographed subject made for correction. The translation control unit 103 translates the re-acquired motion, and the parameter generation unit 104 generates a second operation parameter, which is the operation parameter of the re-acquired motion from after the correction instruction.

The correction control unit 105 compares this second operation parameter with each of the first operation parameters determined for each motion, and corrects the translation by replacing the translation result of the motion corresponding to the first operation parameter for which the comparison satisfies a predetermined condition with the translation result of the motion of the second operation parameter. The image display unit 109 then outputs the corrected translation result.

The image recognition apparatus 100a of the modification provides the same operation and effects.

Further, according to the image recognition apparatus 100 of the present embodiment or the image recognition apparatus 100a of the modification, when the signer being photographed makes a motion indicating a correction instruction and the translation control unit 103 recognizes it, the gesture input unit 101 executes acquisition of image data for correction. A correction instruction can therefore be given without a physical component such as a correction button. Furthermore, because the correction is instructed by a motion, the photographed subject does not need to be close to the apparatus, which improves usability.

Further, according to the image recognition apparatus 100a of the modification, when a predetermined condition is satisfied, the correction control unit 105 compares the second operation parameter with each of the first operation parameters determined for each motion and can thereby identify the motion to be corrected. When the condition is not satisfied, the comparison is not performed, and ordinary translation processing is carried out instead.

One conceivable predetermined condition is that, after the translation result is output, the time from image acquisition to generation of the second operation parameter is within a predetermined time. In the image recognition apparatus 100a of the modification, the timer 105b measures this time, and the correction control unit 105a executes correction processing when it determines that the time measured by the timer 105b is within the predetermined time.

In other words, if the processing based on a motion finishes within the predetermined time, that is, if the motion is short, the motion can be judged to be a correction of a single motion rather than communication through a series of motions. In such a case, the apparatus judges that a correction has been instructed and executes the comparison of operation parameters to identify the correction target, so that the correction is performed automatically.

Further, according to the image recognition apparatus 100 of the present embodiment or the image recognition apparatus 100a of the modification, the parameter generation unit 104 measures, for each motion of the photographed subject in the image data, the time from the start to the end of the motion, and generates that per-motion time as an operation parameter in addition to the operation parameter indicating the motion itself. The operation parameters thus reflect not only the motion but also its duration, so the degree of coincidence with the motion to be corrected can be judged more accurately.

The operation and effects described above for the image recognition apparatuses 100 and 100a are also provided by the image recognition apparatus 100b. In addition, because the image recognition apparatus 100b places the translation databases on the network side, the burden on the apparatus side can be reduced.

100, 100a, 100b … image recognition apparatus; 101 … gesture input unit; 102 … translation engine unit; 103 … translation control unit; 104 … parameter generation unit; 105 … correction control unit; 105a … correction control unit; 105b … timer; 109 … image display unit; 110 … communication control unit; 106 … gesture recognition DB; 107 … intention interpretation DB; 108 … character translation DB; 206 … gesture recognition DB; 207 … intention interpretation DB; 208 … character recognition DB.

Claims

An image acquisition means for acquiring a shooting target as image data;
Image translation means for translating for each operation for the operation of the imaging target included in the image data acquired by the image acquisition means;
Generating means for generating, for each operation, a first operation parameter indicating an operation of an imaging target included in the image data acquired by the image acquisition means;
A result output means for outputting a translation result of the movement of the photographing object translated by the image translation means;
Correction control means for correcting the translation result output by the result output means;
With
The image acquisition means, after the translation result is output by the result output means, acquires again the image data including the operation of the photographing target for correction,
The image translation means performs translation for the operation of the photographing target acquired again by the image acquisition means,
The generating means generates a second operation parameter of the operation of the photographing target acquired again by the image acquisition means,
The correction control means compares the second operation parameter with each of the first operation parameters determined for each operation, and the result of the comparison corresponds to the first operation parameter satisfying a predetermined condition. The translation result of the motion to be performed is replaced with the translation result of the motion of the second motion parameter, and corrected.
The result output means outputs the translation result corrected by the correction control means;
Image recognition device.

When the image translation means recognizes an operation indicating a correction instruction, the image acquisition means executes an image data acquisition process for correction,
The image recognition apparatus according to claim 1.

The image recognition device according to claim 1, wherein the correction control means compares the second operation parameter with each of the first operation parameters determined for each operation when a predetermined condition is satisfied.

The correction control means compares the first operation parameter with the second operation parameter when, as the predetermined condition, the time from image acquisition by the image acquisition means after the translation result is output by the result output means until generation of the second operation parameter is within a predetermined time,
The image recognition apparatus according to claim 3.

The correction control means performs a comparison process between the first operation parameter and the second operation parameter when an operation for a correction instruction is received as the predetermined condition.
The image recognition apparatus according to claim 3.

Further comprising measuring means for measuring, for each operation of the photographing target in the image data acquired by the image acquisition means, the time from the start to the end of the operation,
The image recognition apparatus according to claim 1, wherein the generating means generates the time for each operation as an operation parameter in addition to the operation parameter indicating the operation of the photographing target.

In an image recognition device that recognizes the operation of a shooting target,
An image acquisition step of acquiring a shooting target as image data;
An image translation step for performing translation for each operation with respect to the operation of the imaging target included in the image data acquired by the image acquisition step;
A generation step of generating, for each operation, a first operation parameter indicating an operation of a photographing target included in the image data acquired by the image acquisition step;
A result output step for outputting the translation result of the motion of the object to be photographed translated by the image translation step;
A correction control step of correcting the translation result output by the result output step;
With
In the image acquisition step, after the translation result is output by the result output step, the image data including the operation of the photographing target for correction is acquired again,
The image translation step performs translation for the operation of the photographing target acquired again by the image acquisition step,
The generation step generates a second operation parameter of the operation of the imaging target acquired again by the image acquisition step,
The correction control step compares the second operation parameter with each of the first operation parameters determined for each operation, and the result of the comparison corresponds to the first operation parameter satisfying a predetermined condition. The translation result of the motion to be performed is replaced with the translation result of the motion of the second motion parameter, and corrected.
The result output step outputs the translation result corrected by the correction control step.
Image recognition method.