JP2016224608A

JP2016224608A - Character string input device

Info

Publication number: JP2016224608A
Application number: JP2015108708A
Authority: JP
Inventors: Takashi Saito (齊藤剛史); Maki Iinuma (飯沼眞紀)
Original assignee: Kyushu Institute of Technology NUC
Current assignee: Kyushu Institute of Technology NUC
Priority date: 2015-05-28
Filing date: 2015-05-28
Publication date: 2016-12-28
Anticipated expiration: 2035-05-28
Also published as: JP6562542B2

Abstract

PROBLEM TO BE SOLVED: To provide a character string input device that enters character strings by detecting the vowel component of a character from the mouth shape, while keeping the burden on the user low.
SOLUTION: A character string input device comprises: imaging means 11 for capturing the mouth shape of a user P; vowel detecting means 12 for deriving, as the detected vowel component, the vowel component corresponding to the mouth shape determined from a captured image 23; character string selecting means 14 for selecting, from a character string database 13 in which a plurality of character strings are registered in advance, the character strings that begin with a character whose vowel component is the detected vowel component; automatic scanning means 15 for sequentially designating one of the selected character strings at a time as the inputtable item; display means 20 for displaying the inputtable item; input means 18 operated externally; and determining means 16 for confirming the character string designated as the inputtable item as input information when the input means 18 is operated.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a character string input device that detects characters from the shape of a user's mouth and enters character strings.

For computers and portable terminals, the keyboard, the mouse and buttons are the mainstream interfaces for entering characters.
On the other hand, advances in technology have recently made other interfaces available: interfaces based on handwriting recognition, which let users enter text as naturally as writing by hand in everyday life, and speech recognition interfaces, which enter characters by voice, the most natural means of communication for humans.

However, people with hand impairments such as tremor (involuntary rhythmic movement caused by repeated contraction and relaxation of muscles) find it difficult to use a keyboard, a mouse or a handwriting recognition interface. Speech recognition interfaces, for their part, require the user to speak aloud, so the content being entered can be heard by people nearby and is not kept confidential, and recognition accuracy degrades with ambient noise.

Attention has therefore turned to techniques that detect, from a mouth shape, the character corresponding to that mouth shape; specific examples of devices using such techniques are described in Patent Documents 1 to 4.
Because the mouth shape changes naturally during speech, the intended words can be entered as naturally as with speech recognition, and since no sound needs to be produced, the problems of speech recognition are avoided.
The mouth shape used to utter a character is the same as the mouth shape used to utter that character's vowel component (for example, the vowel component of 「カ」 (ka) is 「ア」 (a)). Thus 「カ」 and 「ア」 share the same mouth shape, and the mouth shape for 「ありがとう」 (arigatou, "thank you") can be regarded as the mouth shape for 「アイアオウ」 (a-i-a-o-u).
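The vowel-sharing property can be illustrated with a small lookup table. The following Python sketch is an illustration only, not an implementation described in this document: it maps each hiragana character to the vowel whose mouth shape it shares, treats 「ん」 as a sixth class (as the embodiment does later), and folds the small ゃ/ゅ/ょ of contracted sounds into the preceding mora. The name `to_vowels` and the choice to skip every other character are assumptions.

```python
# Illustrative sketch: map hiragana to the vowel whose mouth shape it shares.
# The table covers the basic, voiced (dakuten) and semi-voiced (handakuten) kana.
VOWEL_ROWS = {
    "あ": "あかがさざただなはばぱまやらわ",
    "い": "いきぎしじちぢにひびぴみり",
    "う": "うくぐすずつづぬふぶぷむゆる",
    "え": "えけげせぜてでねへべぺめれ",
    "お": "おこごそぞとどのほぼぽもよろを",
}
VOWEL_OF = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}
VOWEL_OF["ん"] = "ん"  # treated as a sixth mouth-shape class

# Small ya/yu/yo fuse with the preceding kana into one mora (き + ょ -> kyo),
# so they overwrite that mora's vowel instead of adding a new one.
SMALL_Y = {"ゃ": "あ", "ゅ": "う", "ょ": "お"}


def to_vowels(word: str) -> str:
    """Return the vowel (mouth-shape) sequence of a hiragana word."""
    out: list[str] = []
    for ch in word:
        if ch in SMALL_Y and out:
            out[-1] = SMALL_Y[ch]
        elif ch in VOWEL_OF:
            out.append(VOWEL_OF[ch])
        # Other characters (っ, ー, katakana, kanji) are skipped in this sketch.
    return "".join(out)


print(to_vowels("ありがとう"))  # あいあおう
print(to_vowels("かめら"))      # あえあ
```

Many different words collapse onto the same vowel sequence, which is why the device described below still needs a way to choose among candidates after the vowels have been detected.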

The devices of Patent Documents 1 to 4 exploit this property (a character and its vowel component are uttered with the same mouth shape) and detect, from a mouth shape, the vowel component of the corresponding character. Detecting a character's vowel component from the mouth shape has been reported to be more accurate than detecting the character itself (for example, detecting 「カ」 as 「カ」), so the devices of Patent Documents 1 to 4 can detect characters from mouth shapes stably.

Patent Document 1: JP 2011-186994 A
Patent Document 2: JP 2005-309952 A
Patent Document 3: JP 2005-108079 A
Patent Document 4: JP 2009-169464 A

However, the device of Patent Document 1 is intended for mobile phones: consonant characters must be entered with the numeric keypad (ten keys), and choosing a word from the displayed list of candidate words requires operating a selection key and a confirmation key. The device of Patent Document 1 therefore requires several keys to be operated, which places a heavy burden on people with hand impairments.

The devices of Patent Documents 2 and 3 require a key operation for every character when consonant characters are entered; entering 「カメラ」 (kamera, "camera"), for example, takes at least three key operations. The large number of operations places a heavy burden on the user.
The device of Patent Document 4, after detecting from the mouth shape the vowel component of the corresponding character, detects movements of the user's head to determine the intended consonant character. In principle, the user must therefore move their head for every character, which is also a heavy burden.
The present invention has been made in view of these circumstances, and its object is to provide a character string input device that detects the vowel component of a character from the mouth shape and enters character strings while keeping the burden on the user low.

A character string input device according to the present invention, which achieves the above object, comprises: imaging means for capturing the user's mouth shape; vowel detection means for deriving, as the detected vowel component, the vowel component corresponding to the mouth shape determined from the captured image; character string selection means for selecting, from a character string database in which a plurality of character strings are registered in advance, the character strings that begin with a character whose vowel component is the detected vowel component; auto-scan means for sequentially designating one of the selected character strings at a time as the inputtable item; display means for displaying the inputtable item; input means operated externally; and confirmation means for confirming, as input information, the character string designated as the inputtable item when the input means is operated externally.
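The interplay of selection, auto-scan and single-button confirmation can be sketched in a few lines of Python. This is a hypothetical illustration, not the patented implementation: it assumes that each database entry stores a vowel reading computed at registration time (the `Entry.vowels` field), that the auto-scan advances to the next candidate every `dwell_s` seconds and wraps around, and that a button press is represented by the time elapsed since scanning started.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    text: str    # character string to enter (a word or a sentence)
    vowels: str  # hypothetical: vowel reading computed when the entry is registered


def select(db: list[Entry], detected: str) -> list[Entry]:
    """Selection: entries whose first character has the detected vowel component."""
    return [e for e in db if e.vowels.startswith(detected)]


def highlighted(candidates: list[Entry], elapsed_s: float, dwell_s: float) -> Entry:
    """Auto-scan: the candidate shown as the inputtable item after elapsed_s seconds,
    assuming each candidate stays highlighted for dwell_s seconds and the scan wraps."""
    return candidates[int(elapsed_s // dwell_s) % len(candidates)]


def confirm(db: list[Entry], detected: str, press_s: float,
            dwell_s: float = 1.0) -> Optional[str]:
    """Confirmation: one button press fixes whichever candidate is highlighted."""
    candidates = select(db, detected)
    if not candidates:
        return None
    return highlighted(candidates, press_s, dwell_s).text


DB = [
    Entry("ありがとう", "あいあおう"),
    Entry("カメラ", "あえあ"),
    Entry("おはよう", "おあおう"),
    Entry("さくら", "あうあ"),
]
print(confirm(DB, "あ", press_s=1.5))  # カメラ
```

With a one-second dwell, a press 1.5 s after scanning starts confirms the second of the three candidates beginning with 「あ」; no key other than the single button is involved.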

In the character string input device according to the present invention, it is preferable that the imaging means capture images intermittently until the detected vowel component is derived, and that the vowel detection means derive, for each captured image, a feature amount of the mouth shape captured in that image, and derive the detected vowel component upon determining that the feature amounts of the images captured within a predetermined time fall within a predetermined range.

In the character string input device according to the present invention, it is preferable that, each time an image is captured intermittently, the vowel detection means determine the vowel component corresponding to the mouth shape from the newly captured image, and that the display means display the newly determined vowel component each time one is determined.

In the character string input device according to the present invention, it is preferable that the auto-scan means also sequentially designate, as the inputtable item, a mode switching item that puts the vowel detection means into a state in which it starts deriving a new detected vowel component; that the vowel detection means, having derived a first detected vowel component, derive a second detected vowel component when the input means is operated externally while the mode switching item is designated as the inputtable item; and that the character string selection means select the character strings whose first and second characters have as their vowel components the first and second detected vowel components, respectively.
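Narrowing by a second detected vowel amounts to a prefix match on each string's vowel reading. In the sketch below (hypothetical: the `(text, vowel reading)` tuple layout, the `MODE_SWITCH` label, and placing the mode switching item at the end of the scan order are all assumptions), each additional detected vowel shortens the candidate list, and the length of that list is the total that the display means may show.

```python
# Hypothetical database layout: (character string, vowel reading) pairs.
DB = [
    ("ありがとう", "あいあおう"),
    ("あさ", "ああ"),
    ("カメラ", "あえあ"),
    ("さくら", "あうあ"),
    ("おはよう", "おあおう"),
]
MODE_SWITCH = "<next vowel>"  # hypothetical label of the mode switching item


def select_by_prefix(db: list[tuple[str, str]], detected: str) -> list[str]:
    """Strings whose first len(detected) characters have exactly these vowels."""
    return [text for text, vowels in db if vowels.startswith(detected)]


def scan_items(db: list[tuple[str, str]], detected: str) -> list[str]:
    """Items the auto-scan cycles through: the candidates plus the mode switching item."""
    return select_by_prefix(db, detected) + [MODE_SWITCH]


first = select_by_prefix(DB, "あ")
print(len(first), first)            # 4 ['ありがとう', 'あさ', 'カメラ', 'さくら']
print(select_by_prefix(DB, "あえ"))  # ['カメラ']
```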

In the character string input device according to the present invention, it is preferable that the display means also display the total number of character strings selected by the character string selection means.

In the character string input device according to the present invention, it is preferable that the character strings selected from the character string database be defined for each user.

In the character string input device according to the present invention, it is preferable that the device further comprise a plurality of mouth shape maps, on each of which regions corresponding to the respective vowel components are mapped and which serve as the reference by which the vowel detection means derives the detected vowel component, and that the regions corresponding to the vowel components on each mouth shape map be adjusted for each user.

The character string input device according to the present invention (1) derives, as the detected vowel component, the vowel component corresponding to the mouth shape determined from a captured image; (2) selects, from a character string database in which a plurality of character strings are registered in advance, the character strings beginning with a character whose vowel component is the detected vowel component; (3) sequentially designates one of the selected character strings at a time as the inputtable item; and (4) confirms the character string designated as the inputtable item as input information when the input means is operated externally. Character strings can therefore be entered by providing the input means with a single button (key) for confirming a character string as input information and having the user operate only that button. The user does not need to operate multiple buttons to enter a character string, and the burden of entering character strings is reduced.

FIG. 1 is a block diagram of a character string input device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the content displayed by the display means of the character string input device.
FIGS. 3(A) to 3(D) are explanatory diagrams of mouth shape maps.
FIG. 4 is an explanatory diagram of the lip region extracted from a captured image.
FIG. 5 is a flowchart showing the flow of creating a mouth shape map.
FIG. 6 is a flowchart showing part of the flow of confirming input character string information.
FIG. 7 is a flowchart showing another part of the flow of confirming input character string information.
FIG. 8 is a graph showing measurement results of input speed.

Next, an embodiment of the present invention will be described with reference to the accompanying drawings to aid understanding of the invention.
As shown in FIG. 1, a character string input device 10 according to an embodiment of the present invention enters character strings using: imaging means 11 for capturing the mouth shape of a user P; vowel detection means 12 for determining, from a captured image 23 shown in FIG. 2, the vowel component corresponding to the mouth shape and deriving it as the detected vowel component; character string selection means 14 for selecting, from a character string database 13, the character strings beginning with a character whose vowel component is the detected vowel component; auto-scan means 15 for sequentially designating one of the selected character strings at a time as the inputtable item; and confirmation means 16 for confirming the character string designated as the inputtable item as input information. Each of these is described in detail below.

As shown in FIG. 1, the character string input device 10 comprises a computer 17 and various hardware connected to the computer 17.
The hardware connected to the computer 17 consists of the imaging means 11, which is a camera; the input means 18, which is an input device; a keyboard 19; the display means 20, which is a display; and output means 21, which is a printer.
In the present embodiment, the input means 18 is an input device with a single operation button, but any externally operable input device may be used, including one with multiple operation buttons. A single key on the keyboard may also serve as the input means.

The computer 17 includes a CPU, a hard disk, memory and connection ports, to which the hardware described above is connected. The hardware may be connected to the computer 17 by wire or wirelessly.
The vowel detection means 12, the character string selection means 14, the auto-scan means 15 and the confirmation means 16 are software stored on the hard disk of the computer 17.

In the present embodiment, the imaging means 11, the input means 18, the keyboard 19 and the display means 20 are each hardware separate from the computer 17, but they may instead be integrated with the computer.
The character string input device 10 is designed on the assumption that the user P, whose mouth is being imaged by the imaging means 11, operates the input means 18 and the keyboard 19 while looking at the screen of the display means 20. The imaging means 11 is therefore positioned so that it can capture the mouth of the user P (in the present embodiment, the entire face including the mouth) while the user is looking at the screen of the display means 20.

As shown in FIGS. 1 and 2, the vowel detection means 12 mainly consists of: a region extraction unit 22 that extracts the lip region 24 of the user P (the region enclosed by the lip contour) from an image 23 of the face of the user P captured by the imaging means 11; a feature amount measurement unit 25 that determines feature amounts of the extracted lip region 24; and a vowel discrimination unit 26 that determines, from the feature amounts of the lip region 24, the vowel component corresponding to the mouth shape (the shape of the lip contour) of the user P.

As shown in FIG. 1, the hard disk of the computer 17 also stores a mouth shape map database 27 that is accessible to the feature amount measurement unit 25 and the vowel discrimination unit 26. The mouth shape map database 27 stores a plurality of mouth shape maps 28, shown in FIGS. 3(A) to 3(D), each of which defines the relationship between a lip region 24 and the vowel component corresponding to it. New mouth shape maps 28 to be stored in the mouth shape map database 27 are created by the vowel detection means 12, and the created maps serve as the reference by which the vowel detection means 12 derives the detected vowel component.

The processing that the vowel detection means 12 performs on the images 23 captured by the imaging means 11 and on the mouth shape maps 28 includes steps common to creating a mouth shape map 28 and to deriving the detected vowel component, as well as steps that differ between the two. The common steps are described first.
As shown in FIG. 2, the image 23 captured by the imaging means 11 contains, in addition to the lip region 24 of the user P, other parts of the user P such as the eyes, eyebrows and neck.
The region extraction unit 22 acquires the image 23 captured by the imaging means 11 and extracts the lip region 24 of the user P from it. In the present embodiment, the region extraction unit 22 extracts the lip region 24 in two stages.

The region extraction unit 22 first extracts the region of the entire face of the user P from the image 23 using the algorithm devised by Viola and Jones (first stage), and then applies a Constrained Local Model (CLM) to the extracted face region to extract the lip region 24 of the user P together with the nose of the user P, as shown in FIG. 4 (second stage). CLM is a statistical method for detecting feature points. In the present embodiment, the CLM processing assigns one feature point 24a to each of the left and right nostrils and eight feature points 24a to the contour of the lip region 24, but the assignment is not limited to this.

The feature amount measurement unit 25 derives feature amounts of the lip region 24 extracted by the region extraction unit 22. In the present embodiment, it derives the area of the lip region 24 as the first feature amount and the aspect ratio of the lip region 24 as the second feature amount, where the aspect ratio is the height of the lip region 24 divided by its width.
Although two feature amounts are derived in the present embodiment, the invention is not limited to this.
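Both feature amounts can be computed directly from the eight lip-contour feature points 24a. The following sketch assumes the points are given in order around the contour, computes the enclosed area with the shoelace formula, and takes height and width from the points' bounding box. How height and width are measured is not stated here, so the bounding-box choice is an assumption, as are the function and variable names.

```python
Point = tuple[float, float]


def lip_features(contour: list[Point]) -> tuple[float, float]:
    """Return (area, aspect_ratio) of a closed lip contour.

    area: shoelace formula over the ordered contour points; abs() makes it
          independent of point orientation (image y axes usually point down).
    aspect_ratio: bounding-box height divided by width (an assumption).
    """
    if len(contour) < 3:
        raise ValueError("need at least three contour points")
    twice_area = 0.0
    for i, (x0, y0) in enumerate(contour):
        x1, y1 = contour[(i + 1) % len(contour)]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width == 0:
        raise ValueError("degenerate contour with zero width")
    return abs(twice_area) / 2.0, height / width


# Eight points around an octagon-shaped "mouth" 4 units wide and 3 units tall.
octagon = [(0, 1), (1, 0), (3, 0), (4, 1), (4, 2), (3, 3), (1, 3), (0, 2)]
print(lip_features(octagon))  # (10.0, 0.75)
```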

As shown in FIGS. 3(A) to 3(D), the mouth shape map 28 is a two-dimensional map on which six regions are mapped. Five of the regions correspond to the vowel components 「あ」 (a), 「い」 (i), 「う」 (u), 「え」 (e) and 「お」 (o), and the remaining region corresponds to the character 「ん」 (n) (for convenience, 「ん」 is also treated as a vowel component in the following description). In other words, a region corresponding to each vowel component is mapped on the mouth shape map 28.
The vowel discrimination unit 26 determines the coordinates (position) of the lip region 24 on the mouth shape map 28 by taking the first and second feature amounts as the horizontal-axis and vertical-axis coordinates, respectively, and plots those coordinates on the mouth shape map 28.
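How the six regions are stored is not specified. One simple assumption is to describe each region by a prototype point and let every location on the map belong to the nearest prototype, so the regions are Voronoi cells. The sketch below classifies a (first feature, second feature) coordinate that way; the prototype values in the hypothetical `DEFAULT_MAP` are made-up illustrative numbers, and both features are assumed to be pre-scaled to comparable ranges.

```python
from math import dist

# Hypothetical default map: one prototype (x = first feature, y = second feature)
# per class. Values are illustrative, assuming both features are pre-scaled to 0..1.
DEFAULT_MAP = {
    "あ": (0.80, 0.60),
    "い": (0.40, 0.20),
    "う": (0.20, 0.50),
    "え": (0.60, 0.35),
    "お": (0.50, 0.70),
    "ん": (0.10, 0.10),
}


def classify(point, mouth_map=DEFAULT_MAP):
    """Return the vowel whose region (nearest prototype) contains the point."""
    return min(mouth_map, key=lambda vowel: dist(point, mouth_map[vowel]))


print(classify((0.78, 0.62)))  # あ
print(classify((0.12, 0.08)))  # ん
```

Nearest-prototype regions are only one option; arbitrary polygons with a point-in-polygon test would follow the hand-adjusted region shapes of FIGS. 3(A) to 3(D) more closely.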

Next, the processing for creating a mouth shape map 28 will be described.
As shown in FIG. 5, while the character string input device 10, to which the user P has logged in, is in a standby state (step S1), the imaging means 11 captures the face of the user P (step S2); the region extraction unit 22 acquires the captured image 23 from the imaging means 11 (step S3) and extracts the lip region 24 from the image 23 (step S4). As shown in FIG. 2, the extracted lip region 24 is displayed on the screen of the display means 20 outside the display area of the image 23. Then, as shown in FIG. 5, the feature amount measurement unit 25 measures the first and second feature amounts of the extracted lip region 24 (step S5).

When the first and second feature amounts have been measured, the display means 20 displays the mouth shape map 28 with the coordinates of the lip region 24 plotted according to those feature amounts. At this point, the mouth shape map 28 shown on the display means 20 is the standard (default) one.
In this state, when the user P enters the vowel component corresponding to their own mouth shape by a key operation on the keyboard 19 or the input means 18 (step S6), the shapes of the six regions corresponding to the vowel components on the mouth shape map 28 are adjusted so that the coordinates of the lip region 24 fall inside the region of the entered vowel component (step S7).

By repeating the cycle of steps S1 to S7 several times, a mouth shape map 28 tailored to the user P can be created; that is, the region corresponding to each vowel component on each mouth shape map 28 can be adjusted for each user P. The mouth shape maps 28 shown in FIGS. 3(A) to 3(D) were created for four different users P, and the shapes of the six regions can be seen to differ from map to map.
When an operation on the keyboard 19 or the input means 18 indicating that the adjustment of the regions of the mouth shape map 28 is complete is detected (step S8), the mouth shape map 28 is stored in the mouth shape map database 27 together with the identification information of the user P (step S9).
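Under the same nearest-prototype assumption, step S7 could be realized by moving the prototype of the vowel the user entered toward the plotted coordinate until that coordinate falls inside the vowel's region. The sketch below does exactly that; `adjust`, its step `rate` and the prototype values are hypothetical, and the region-shape adjustment of the embodiment may work differently.

```python
from math import dist

# Hypothetical nearest-prototype map with illustrative default values.
DEFAULT_MAP = {
    "あ": (0.80, 0.60), "い": (0.40, 0.20), "う": (0.20, 0.50),
    "え": (0.60, 0.35), "お": (0.50, 0.70), "ん": (0.10, 0.10),
}


def classify(point, mouth_map):
    """Return the vowel whose prototype is nearest to the point."""
    return min(mouth_map, key=lambda vowel: dist(point, mouth_map[vowel]))


def adjust(mouth_map, point, vowel, rate=0.5, max_steps=50):
    """One calibration cycle (cf. steps S6 and S7): return a copy of the map in which
    the user's labelled point lies in the region of `vowel`.

    Only the labelled vowel's prototype moves, by `rate` of the remaining gap per step.
    """
    adjusted = dict(mouth_map)
    for _ in range(max_steps):
        if classify(point, adjusted) == vowel:
            break
        px, py = adjusted[vowel]
        adjusted[vowel] = (px + rate * (point[0] - px), py + rate * (point[1] - py))
    return adjusted


sample = (0.55, 0.45)                 # the user says this mouth shape is お
print(classify(sample, DEFAULT_MAP))  # え (the default map misreads it)
user_map = adjust(DEFAULT_MAP, sample, "お")
print(classify(sample, user_map))     # お
```

Repeating `adjust` over several labelled samples per vowel gives a per-user map, which would then be stored with the user's identification information as in step S9.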

Next, the processing for determining the vowel component corresponding to the mouth shape of the user P and deriving the detected vowel component will be described.
As shown in FIG. 6, while the character string input device 10, to which the user P has logged in, is in a standby state (step S1'), the imaging means 11 captures the face of the user P (step S2'); the region extraction unit 22 acquires the captured image 23 from the imaging means 11 (step S3') and extracts the lip region 24 from the image 23 (step S4'). The feature amount measurement unit 25 then derives the first and second feature amounts of the extracted lip region 24, that is, the feature amounts of the mouth shape captured in the image 23 (step S5').

The vowel discrimination unit 26 selects the mouth shape map 28 corresponding to the user P from the mouth shape map database 27 on the basis of the identification information of the user P, and determines the coordinates of the lip region 24 on that map from the measured first and second feature amounts, thereby determining the vowel component corresponding to the mouth shape of the user P in the image 23 (step S6').
The display means 20 then displays the vowel component determined by the vowel discrimination unit 26 in the vowel display section 29 shown in FIG. 2 (step S7'). The vowel discrimination unit 26 also plots the coordinates of the lip region 24 determined in step S6' on the mouth shape map 28, and, as shown in FIG. 2, the display means 20 also displays the mouth shape map 28 with these coordinates plotted.

In the present embodiment, the imaging means 11 captures the face of the user P intermittently (for example, 1 to 30 times per second), and each time an image 23 is generated, the feature amount measurement unit 25 derives the first and second feature amounts of the lip region 24 for that image and stores them in the memory of the computer 17. Each time new first and second feature amounts are derived, the vowel discrimination unit 26 determines the vowel component corresponding to the mouth shape of the user P. In other words, each time an image is captured, the vowel detection means 12 determines a vowel component from the newly captured image 23.

The vowel discrimination unit 26 then determines whether the first and second feature amounts of each of the images 23 captured within a predetermined time (for example, 1 to 3 seconds) fall within a predetermined range (step S8'). Specifically, for the images 23 captured within the predetermined time, the feature amounts are judged to fall within the predetermined range if (1) the coordinates of the lip region 24 on the mouth shape map 28 all lie within the region of the same vowel component, and (2) the distance between the two most distant lip region 24 coordinates on the mouth shape map 28 is within a predetermined range; otherwise, they are judged not to fall within the predetermined range.
Alternatively, whether the coordinates of the lip region 24 on the mouth shape map 28 all lie within the region of the same vowel component may be used as the sole criterion for whether the first and second feature amounts fall within the predetermined range.

If it is determined in step S8' that the first and second feature amounts of each of the images 23 captured within the predetermined time fall within the predetermined range, the vowel discrimination unit 26 fixes (derives) the vowel component it has determined as the detected vowel component (step S9').
If, on the other hand, the vowel discrimination unit 26 determines that they do not fall within the predetermined range, the processing from step S2' to step S8' is performed again. The detected vowel component is not fixed, and steps S2' to S8' are repeated, until the intermittent imaging by the imaging means 11 has continued for the predetermined time.
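The two-part criterion of step S8' and the repeat-until-confirmed loop can be sketched together. In this Python sketch each frame is a hypothetical `(time_s, map_coordinate, per_frame_vowel)` tuple; frames older than the window are discarded, a vowel is confirmed only once imaging has run for at least `window_s` seconds, and the window counts as stable when all frames share one vowel and the largest pairwise distance on the map is at most `max_spread`. The window length and spread threshold are illustrative parameters, not values from the embodiment.

```python
from collections import deque
from math import dist
from typing import Iterable, Optional

# Hypothetical frame record: (time_s, (x, y) on the mouth shape map, per-frame vowel).
Frame = tuple[float, tuple[float, float], str]


def is_stable(window: Iterable[Frame], max_spread: float) -> bool:
    """Step S8' criterion: (1) every frame falls in the same vowel region and
    (2) the two most distant map coordinates are at most max_spread apart."""
    frames = list(window)
    if len({vowel for _, _, vowel in frames}) != 1:
        return False
    points = [p for _, p, _ in frames]
    spread = max((dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]),
                 default=0.0)
    return spread <= max_spread


def detect_vowel(frames: Iterable[Frame], window_s: float = 1.0,
                 max_spread: float = 0.1) -> Optional[str]:
    """Consume frames until the last window_s seconds are stable (step S9')."""
    window: deque = deque()
    start: Optional[float] = None
    for frame in frames:
        t = frame[0]
        start = t if start is None else start
        window.append(frame)
        while window and window[0][0] < t - window_s:  # drop frames older than the window
            window.popleft()
        if t - start >= window_s and is_stable(window, max_spread):
            return frame[2]
    return None  # frames ran out before a vowel was confirmed


steady = [(i / 10, (0.80, 0.60), "あ") for i in range(11)]
print(detect_vowel(steady))  # あ
```

Passing `max_spread=float("inf")` reduces the check to the simpler same-region-only criterion described as an alternative above.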

Until the detected vowel component is fixed, each time the imaging means 11 intermittently captures an image 23, the region extraction unit 22 extracts the lip region 24 from the image 23. Each time a lip region 24 is extracted, the display means 20 displays it, as shown in FIG. 2, and each time the vowel discrimination unit 26 obtains a new vowel component, the display means 20 shows that vowel component on the vowel display section 29.
By viewing the lip region 24 displayed by the display means 20, the user P can check whether the lip region 24 has been extracted properly; if it has not, the user P can, for example, adjust the orientation of his or her face relative to the imaging means 11. Likewise, by viewing the newly obtained vowel component on the display means 20, the user P can judge whether the intended vowel component has been obtained; if an unintended one has been obtained, the user P can, for example, change the mouth shape so that the intended vowel component is obtained.

Next, the character string input process based on the detected vowel component, performed mainly by the character string database 13, the character string selection means 14, the auto-scan means 15, and the determination means 16, will be described.
Each user P can register in advance, in the character string database 13, a plurality of character strings that he or she wishes to input. In the present embodiment, a character string means a single word or a sentence composed of a plurality of words.

As shown in FIG. 1, the character string selection means 14 is configured to access the character string database 13 and selects character strings from it based on the detected vowel component fixed in step S9' of FIG. 6.
After the detected vowel component is fixed in step S9', as shown in FIG. 6, the character string selection means 14 selects from the character string database 13, as input candidate character strings, the character strings whose first character has the detected vowel component (step S10').
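Step S10' for the first detected vowel can be sketched as a lookup from each kana to the vowel of its row in the kana table. This assumes, hypothetically, that each database entry stores a hiragana reading next to the string and the owning user's ID (per-user registration is described above); the table layout and the names `first_vowel` and `select_candidates` are illustrative only. Katakana readings are not handled.

```python
# Hypothetical table: each hiragana mapped to the vowel of its row (voiced forms included).
VOWEL_ROWS = {
    "あ": "あかさたなはまやらわがざだばぱ",
    "い": "いきしちにひみりぎじぢびぴ",
    "う": "うくすつぬふむゆるぐずづぶぷ",
    "え": "えけせてねへめれげぜでべぺ",
    "お": "おこそとのほもよろをごぞどぼぽ",
}
KANA_TO_VOWEL = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}

def first_vowel(reading):
    """Vowel component of the first kana of a hiragana reading, or None."""
    return KANA_TO_VOWEL.get(reading[0]) if reading else None

def select_candidates(database, user_id, detected_vowel):
    """database: (user_id, text, reading) rows. Returns this user's strings
    whose first kana carries the detected vowel, in database order."""
    return [text for uid, text, reading in database
            if uid == user_id and first_vowel(reading) == detected_vowel]
```

With the readings あいだ, あした, かいしゃ and たちば, the detected vowel あ selects 間, 明日, 会社 and 立場, because か and た also belong to the あ row.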

For example, if the detected vowel component is “a” (あ), words such as 間 (aida, “interval”), 明日 (ashita, “tomorrow”), 会社 (kaisha, “company”), and 立場 (tachiba, “position”), as well as sentences consisting of multiple words such as 明日の会議に参加します (“I will attend tomorrow's meeting”), are selected.
Here, each character string is stored in the character string database 13 together with the identification information of the user P, so the character strings that can be selected from the database as input candidates are defined for each user P. The user P can therefore avoid having character strings selected that he or she does not intend to input.
Then, as shown in FIG. 7, the display means 20 displays the input candidate character strings selected by the character string selection means 14 on the character string display section 30 shown in FIG. 2 (step S11').

The auto-scan means 15 designates the input candidate character strings selected by the character string selection means 14 as the inputtable item one at a time, moving to the next one every predetermined time (1 to 3 seconds in the present embodiment). The order in which the candidates are designated is determined by the auto-scan means 15 based on which strings have previously been confirmed as input character string information. In other words, the auto-scan means 15 has a learning function that determines the designation order from past results.
The display means 20 highlights, on the character string display section 30, the input candidate character string currently designated as the inputtable item (that is, it displays the inputtable item), so the user P can see which candidate is currently inputtable.
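The patent says only that the designation order is learned from past confirmations; it does not give the rule. The sketch below assumes the simplest such rule, ordering by how often each string was confirmed, and uses a fixed switching interval. The class `AutoScanner` and its methods are hypothetical, and `highlighted_at` deliberately ignores the other menu items that also take part in the scan.

```python
from collections import Counter

class AutoScanner:
    """Hypothetical auto-scan helper: frequency-learned order plus a fixed interval."""

    def __init__(self, interval_s=1.0):
        self.interval_s = interval_s   # 1 to 3 s in the embodiment, 1 s in the experiment
        self.history = Counter()       # times each string was confirmed as input

    def record_confirmation(self, text):
        self.history[text] += 1

    def scan_order(self, candidates):
        # Most frequently confirmed first; sorted() is stable, so ties keep database order.
        return sorted(candidates, key=lambda s: -self.history[s])

    def highlighted_at(self, candidates, elapsed_s):
        """Candidate designated as the inputtable item after elapsed_s seconds.
        Simplification: wraps around the candidates only."""
        order = self.scan_order(candidates)
        if not order:
            return None
        return order[int(elapsed_s // self.interval_s) % len(order)]
```

A recency-weighted score would be an equally plausible reading of "past results". Frequency is used here only because it is the simplest rule.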

As shown in FIG. 2, the number of input candidate character strings that the character string display section 30 can show at one time is limited (10 in the present embodiment). When more candidates are selected than this limit, the display means 20 can show all of them by repeatedly updating the character string display section 30. In that case, the display means 20 also displays a character string display update item 31 (the item labeled “next candidate” in FIG. 2) for updating the character string display section 30.

As the auto-scan means 15 designates the input candidate character strings in the character string display section 30 as the inputtable item in order from top to bottom, the display means 20 highlights the candidate currently designated. After the candidate in the bottom row has been highlighted, the auto-scan means 15 designates the character string display update item 31 as the inputtable item, and the display means 20 highlights it.
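The paging and per-page highlight order described above can be sketched as follows, assuming the 10-row page size of the embodiment. `NEXT_ITEM` is a hypothetical stand-in for the character string display update item 31, and the function names are illustrative.

```python
NEXT_ITEM = "[next candidate]"   # stands in for the character string display update item 31

def paginate(candidates, per_page=10):
    """Split candidates into display pages of at most per_page rows."""
    return [candidates[i:i + per_page] for i in range(0, len(candidates), per_page)]

def page_scan_order(pages, page_index):
    """Highlight order on one page: rows top to bottom, then item 31,
    which is shown only when the candidates do not fit on a single page."""
    order = list(pages[page_index])
    if len(pages) > 1:
        order.append(NEXT_ITEM)
    return order
```

For 23 candidates this yields pages of 10, 10 and 3 rows, each followed in the scan by the update item.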

As shown in FIG. 7, when the user P presses the button of the input means 18 (that is, operates the input means 18) while an input candidate character string in the character string display section 30 is highlighted (step S12'), the determination means 16 confirms the highlighted candidate as input information (hereinafter also referred to as “input character string information”) and records it in the memory of the computer 17 (step S13'). If further input character string information is to be confirmed (step S14'), the process returns to step S2'. If, instead, confirmation of input character string information ends at step S14', the input character string information confirmed so far is arranged in the order of confirmation and output from the output means 21 (step S15'), and the process returns to step S1'.

When the user P presses the button of the input means 18 while the character string display update item 31 is highlighted (step S16'), the determination means 16 causes the display means 20 to update the character string display section 30 and display new input candidate character strings (step S17'). After the update, highlighting starts from the newly displayed candidates, and the process proceeds to step S12'.

In addition, as shown in FIG. 2, the display means 20 displays a deletion item 32 (labeled “delete word” in FIG. 2) for deleting input character string information that has already been confirmed, a sound output item 33 (labeled “output speech” in FIG. 2) for outputting the confirmed input character string information as sound from the speaker of the computer 17, and a mode switching item 34 (labeled “input next sound” in FIG. 2) for putting the vowel detection means 12 into a state in which it starts deriving a new detected vowel component.

If the button of the input means 18 is not pressed while the character string display update item 31 is highlighted in step S16', the auto-scan means 15 designates the mode switching item 34 as the inputtable item (that is, the mode switching item 34 is also included in the sequence of inputtable items), and the display means 20 highlights the mode switching item 34.
As shown in FIG. 7, when the user P presses the button of the input means 18 while the mode switching item 34 is highlighted (that is, while it is designated as the inputtable item) (step S18'), the process returns to step S2', and the vowel detection means 12 derives a new detected vowel component from newly captured images 23 through steps S3' to S9', as shown in FIG. 6. (If the first detected vowel component has already been derived, the second is derived; if the N-th has already been derived, the (N+1)-th is derived.)

When the vowel detection means 12, having already derived the first detected vowel component, derives the second one in step S9', the character string selection means 14 selects from the character string database 13 in step S10', as input candidate character strings, the character strings whose first and second characters have the first and second derived detected vowel components, respectively. For example, if the first detected vowel component is “a” (あ) and the second is “i” (い), character strings such as 間 (aida), 会社 (kaisha), and 立場 (tachiba), whose first and second characters have the vowel components “a” and “i”, are selected.

Likewise, when the N-th detected vowel component is derived in step S9', the character string selection means 14 selects from the character string database 13 in step S10', as input candidate character strings, the character strings whose first, second, ..., N-th characters have the first, second, ..., N-th derived detected vowel components, respectively. Thus, as more detected vowel components are derived, the set of selected input candidates shrinks (is narrowed down).
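The N-vowel narrowing is a prefix query, so it can be served from a sorted index with binary search. The sketch assumes, hypothetically, that each string's vowel sequence (one vowel per kana of its reading) has been precomputed and stored as a key. The names `build_index` and `narrow` are illustrative. The length of the result is also the total candidate count that the display means shows after step S10'.

```python
from bisect import bisect_left

def build_index(entries):
    """entries: (vowel_key, text) pairs, e.g. ("あいあ", "間") for あいだ."""
    return sorted(entries)

def narrow(index, detected_vowels):
    """Strings whose first N kana carry the N detected vowels, found as the
    contiguous range of keys that start with the detected-vowel prefix."""
    lo = bisect_left(index, (detected_vowels,))
    # "\uffff" sorts after every kana, so this bound closes the prefix range.
    hi = bisect_left(index, (detected_vowels + "\uffff",))
    return [text for _, text in index[lo:hi]]
```

Each added vowel can only shrink the range. For example, あ → あい → あいあ narrows 間, 明日, 会社, 立場, 愛, 買う, カメラ down to the four strings read あいだ, あした, かいしゃ and たちば.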

On the other hand, if the button of the input means 18 is not pressed while the mode switching item 34 is highlighted in step S18', the auto-scan means 15 designates the deletion item 32 and then the sound output item 33 as the inputtable item, and the display means 20 highlights each of them in turn as it becomes inputtable.
As shown in FIG. 7, when the button of the input means 18 is pressed while the deletion item 32 is highlighted (step S19'), the input candidate character string already confirmed as input character string information is deleted (step S20'), and the process returns to step S2'.

If the button of the input means 18 is not pressed while the deletion item 32 is highlighted in step S19', the sound output item 33 is highlighted next. When the button is pressed while the sound output item 33 is highlighted (step S21'), the input character string information confirmed so far is output as sound (step S22'), and the process proceeds to step S14'. If, instead, the button is not pressed while the sound output item 33 is highlighted in step S21', highlighting resumes from the input candidate character string in the top row of the character string display section 30, and the process returns to step S12'.
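The full single-button scan cycle described in steps S12' to S22' can be summarized as a fixed item sequence plus a press-to-action table. This is a simplified sketch: the bracketed item labels and action names are hypothetical, and deleting the most recently confirmed string is an assumption, since the text does not say which confirmed string is removed.

```python
NEXT = "[next candidate]"      # item 31
MODE = "[input next sound]"    # item 34
DELETE = "[delete word]"       # item 32
SOUND = "[output speech]"      # item 33
ACTIONS = {NEXT: "next_page", MODE: "next_vowel", DELETE: "delete", SOUND: "speak"}

def cycle_items(page, has_more_pages):
    """One full auto-scan cycle: candidates top to bottom (S12'), item 31 if
    shown (S16'), then items 34, 32 and 33 (S18', S19', S21')."""
    items = list(page)
    if has_more_pages:
        items.append(NEXT)
    return items + [MODE, DELETE, SOUND]

def on_press(page, has_more_pages, step):
    """Item highlighted at scan step `step` and the action a button press triggers."""
    items = cycle_items(page, has_more_pages)
    item = items[step % len(items)]    # after SOUND, scanning restarts at the top row
    return item, ACTIONS.get(item, "confirm")

def apply_action(confirmed, item, action):
    """Update the confirmed strings (S13', S20'); paging, vowel input and
    speech output are left to the caller."""
    if action == "confirm":
        confirmed.append(item)
    elif action == "delete" and confirmed:
        confirmed.pop()                # assumption: remove the most recent string
    return confirmed
```

Because the scan position depends only on the step count, a single switch is enough to reach every function. The cost is that items late in the cycle, such as speech output, take the longest to reach.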

When input candidate character strings are selected in step S10', the display means 20 also displays the total number of candidates selected by the character string selection means 14. The user P can thus see how many candidates were selected and, depending on that number, decide at step S18' whether to return to step S2' and derive another detected vowel component to reduce the number of candidates.

Next, an experiment conducted to confirm the effects of the present invention will be described.
In the experiment, the 400 words covered by Grade 5 of the National Sign Language Proficiency Test (全国手話検定試験) were registered as character strings in the character string database. The vowel discrimination unit was set to fix the detected vowel component on the condition that the first and second feature values corresponding to each of the plurality of images captured within 2 seconds fell within the predetermined range, and the auto-scan means was set to switch the inputtable item every 1 second.

One set consisted of a two-word sentence, a three-word sentence, and a four-word sentence, shown in Table 1, each built from character strings registered in the character string database. Each of ten subjects (users) input three sets of sentences, and the input speed per set was measured. Input speed is expressed in KPM (kana per minute); for example, the sentence カメラ買う (“buy a camera”) was counted as containing five kana: か (ka), め (me), ら (ra), か (ka), and う (u).
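The KPM metric can be written down directly. The sketch assumes one count per kana character of the reading, which matches the カメラ買う example (かめらかう, five kana); how small kana such as ゃ were counted in the experiment is not stated, so `count_kana` is a hypothetical simplification and the timings below are illustrative, not measured data.

```python
def count_kana(reading):
    """Assumption: each kana character of the hiragana reading counts once."""
    return len(reading)

def kpm(kana_count, seconds):
    """Input speed in kana per minute."""
    return kana_count * 60.0 / seconds

def set_kpm(readings, seconds):
    """Speed for one set: total kana of its sentences over the total input time."""
    return kpm(sum(count_kana(r) for r in readings), seconds)
```

For instance, entering かめらかう in 50 seconds would correspond to 5 × 60 / 50 = 6.0 KPM.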

FIG. 8 is a graph of the average input speed measured over the ten subjects. On the horizontal axis of FIG. 8, trial numbers 1, 2, and 3 denote the measurements for the first, second, and third sets, respectively.
As FIG. 8 shows, the average input speed rose from 5.6 KPM for the first set to 6.8 KPM for the third set, confirming that input speed improves as users input more sentences. The interval at which the auto-scan means switches the inputtable item can be adjusted for each subject, and shortening it could reduce input time.
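As a worked check of the reported averages, the relative gain from the first to the third set and the corresponding average time per kana are:

```latex
\frac{6.8 - 5.6}{5.6} = \frac{1.2}{5.6} \approx 0.214
\qquad
\frac{60\ \mathrm{s}}{5.6} \approx 10.7\ \mathrm{s/kana},
\quad
\frac{60\ \mathrm{s}}{6.8} \approx 8.8\ \mathrm{s/kana}
```

That is, about a 21% increase in speed, or roughly 1.9 s saved per kana on average.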

Although an embodiment of the present invention has been described above, the present invention is not limited to it, and all changes in conditions and the like that do not depart from the gist of the invention fall within its scope.
For example, the imaging means need not capture images intermittently until the detected vowel component is fixed; the vowel component may instead be obtained from a single image and fixed as the detected vowel component.
Also, instead of displaying the mode switching item on the display means, a button for putting the vowel detection means into a state in which it starts deriving a new detected vowel component may be provided on the input means.
The display means also need not display the number of character strings selected by the character string selection means.
Furthermore, the function of creating a mouth shape map in which the region corresponding to each vowel component is adjusted for each user, and the function of defining for each user the character strings selectable from the character string database, are not essential.
The display means may also display the input candidate character string designated as the inputtable item in a color different from the other candidates, or may display only that candidate.

10: Character string input device, 11: Imaging means, 12: Vowel detection means, 13: Character string database, 14: Character string selection means, 15: Auto-scan means, 16: Determination means, 17: Computer, 18: Input means, 19: Keyboard, 20: Display means, 21: Output means, 22: Region extraction unit, 23: Image, 24: Lip region, 24a: Feature point, 25: Feature measurement unit, 26: Vowel discrimination unit, 27: Mouth shape map database, 28: Mouth shape map, 29: Vowel display section, 30: Character string display section, 31: Character string display update item, 32: Deletion item, 33: Sound output item, 34: Mode switching item, P: User

Claims

1. A character string input device comprising:
imaging means for imaging the mouth shape of a user;
vowel detection means for deriving, as a detected vowel component, a vowel component corresponding to the mouth shape obtained from the captured image;
character string selection means for selecting, from a character string database in which a plurality of character strings are registered in advance, the character strings beginning with a character whose vowel component is the detected vowel component;
auto-scan means for sequentially designating one of the selected character strings as an inputtable item;
display means for displaying the inputtable item;
input means operated from outside; and
determination means for confirming, as input information, the character string designated as the inputtable item when the input means is operated from outside.

2. The character string input device according to claim 1, wherein the imaging means captures images intermittently until the detected vowel component is derived, and the vowel detection means derives, for each of the plurality of captured images, feature values of the mouth shape captured in that image, and derives the detected vowel component upon determining that the feature values corresponding to the plurality of images captured within a predetermined time fall within a predetermined range.

3. The character string input device according to claim 2, wherein the vowel detection means obtains, each time an image is captured intermittently, a vowel component corresponding to the mouth shape based on the newly captured image, and the display means displays the newly obtained vowel component each time a vowel component corresponding to the mouth shape is obtained.

4. The character string input device according to any one of claims 1 to 3, wherein the auto-scan means also sequentially designates, as the inputtable item, a mode switching item for causing the vowel detection means to start deriving a new detected vowel component; the vowel detection means, having derived a first detected vowel component, derives a second detected vowel component when the input means is operated from outside while the mode switching item is designated as the inputtable item; and the character string selection means selects the character strings whose first and second characters have, as vowel components, the first and second derived detected vowel components, respectively.

5. The character string input device according to claim 1, wherein the display means also displays the total number of character strings selected by the character string selection means.

6. The character string input device according to any one of claims 1 to 5, wherein the character strings selectable from the character string database are defined for each user.

7. The character string input device according to any one of claims 1 to 6, further comprising a plurality of mouth shape maps, each of which maps a region corresponding to each vowel component and serves as a reference for the vowel detection means in deriving the detected vowel component, wherein in each mouth shape map the region corresponding to each vowel component is adjusted for each user.