JP2002304194A

JP2002304194A - System, method and program for inputting voice and/or mouth shape information

Info

Publication number: JP2002304194A
Application number: JP2001317031A
Authority: JP
Inventors: Masanobu Kujirada; 雅信鯨田
Original assignee: Individual
Current assignee: Individual
Priority date: 2001-02-05
Filing date: 2001-10-15
Publication date: 2002-10-18

Abstract

PROBLEM TO BE SOLVED: To provide a system, method, and program for mixed voice/mouth shape input that prevent the leakage of secret information (including privacy information) that can occur when information is input by voice alone, while also enabling highly accurate information input. SOLUTION: The system, method, and program include voice input means for inputting the voice uttered by a user, mouth shape input means for inputting an image of the user's mouth shape, and input content determining means for determining the content input by the user based on information from the voice input means and information from the mouth shape input means. When the strength of the voice input to the voice input means is equal to or less than a prescribed level, the input content determining means determines the user's input based on information from the mouth shape input means.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of Industrial Application
The present invention relates to a system, method, and program that allow a user to input information into a mobile phone, a PDA (personal digital assistant), a personal computer, or the like.

[0002]

Prior Art
Methods by which a user inputs information by voice into a mobile phone, PDA, personal computer, or the like have long been known. Methods by which a user inputs information into such devices by mouth shape (lip shape), that is, a lip-reading function, have also been proposed, and several patents and patent applications on them exist.

[0003]

Problem to Be Solved by the Invention
However, previously proposed methods derive the input either from the voice alone (voice input) or from lip reading alone (lip-reading input); no input method that combines the two has yet been proposed. In general, when information is input by voice alone, secret information such as a password can be overheard by people nearby and thereby leaked. When only the lip-reading function is used, the accuracy of the input content is reduced. The present invention solves these problems of the prior art. Its object is to provide a system, method, and program for voice and/or mouth shape input that prevent the leakage of secret information (including privacy information) that occurs with voice-only input, while enabling highly accurate information input.

[0004]

Means for Solving the Problem
(Definitions of terms) In the present invention, "secret information" includes privacy information, and "mouth shape" includes "lip shape".

[0005]

(Content of the present invention) The content of the present invention is as set forth in the claims of this specification, which are incorporated here by reference (their text is not repeated here). Every system (apparatus) idea described in the claims can equally be understood as a "method" or a "program". That is, every invention described in the claims in the form of a "system (apparatus)" can also be described and realized in the form of a "method" or a "program", and inventions so configured as "method inventions" or "program inventions" also fall within the scope of the present invention. Embodiments of the "program invention" include, for example, a program that an ASP (application service provider) transmits to a user over a network such as the Internet.

[0006]

Embodiments of the Invention
Embodiment 1. FIG. 1 shows Embodiment 1 of the present invention. It is a schematic block diagram of a mixed voice/mouth shape input system according to Embodiment 1, installed in a mobile phone. In FIG. 1, reference numeral 1 denotes a voice input unit, for example a microphone, for inputting the user's voice. Reference numeral 2 denotes a mouth shape input unit, for example a CCD camera, for inputting an image of the shape of the user's mouth (lips). Reference numeral 3 denotes a voice-to-character conversion unit that converts the voice input from the voice input unit 1 into the corresponding characters by referring to a voice database 3a in which voices are recorded in association with characters. Reference numeral 4 denotes a mouth-shape-to-character conversion unit that converts the mouth (lip) shape input from the mouth shape input unit 2 into the corresponding characters by referring to a mouth shape database 4a in which mouth (lip) shapes are recorded in association with characters. Reference numeral 5 denotes an input content determining unit that determines the content input by the user based on the output of the voice-to-character conversion unit 3 and the output of the mouth-shape-to-character conversion unit 4. Reference numeral 6 denotes a transmission unit (such as a modem) that transmits the output of the input content determining unit 5 over a network to another information device (including a recording device), and reference numeral 7 denotes a recording unit (memory), built into the mobile phone of FIG. 1, that records the output of the input content determining unit 5.
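A minimal Python sketch of the FIG. 1 data flow through units 3, 4, and 5, under stated assumptions: the databases 3a and 4a are modeled as hypothetical dictionaries that map an observation label to candidate characters with made-up scores, and the input content determining unit 5 is assumed to combine the two converters by adding their scores. The source does not say how the two outputs are combined; score summing is one simple late-fusion choice, not the patent's stated method.

```python
from collections import Counter

# Hypothetical contents of voice database 3a: observed sound -> character scores.
VOICE_DB_3A = {
    "snd_b": {"d": 0.55, "b": 0.45},  # made-up scores: /b/ and /d/ confusable in noise
    "snd_a": {"a": 0.9, "e": 0.1},
}

# Hypothetical contents of mouth shape database 4a: observed lip shape -> scores.
MOUTH_DB_4A = {
    "lips_closed": {"m": 0.34, "b": 0.33, "p": 0.33},  # b, m, p look alike on the lips
    "lips_open_wide": {"a": 0.8, "e": 0.2},
}


def voice_to_chars(observation: str) -> dict[str, float]:
    """Unit 3: look up a voice observation in database 3a."""
    return VOICE_DB_3A.get(observation, {})


def mouth_to_chars(observation: str) -> dict[str, float]:
    """Unit 4: look up a mouth shape observation in database 4a."""
    return MOUTH_DB_4A.get(observation, {})


def determine_input(voice_scores: dict[str, float],
                    mouth_scores: dict[str, float]) -> str:
    """Unit 5 (assumed fusion): add both converters' scores and pick the best."""
    combined = Counter(voice_scores)
    combined.update(mouth_scores)  # adds scores for characters present in both
    if not combined:
        return ""
    return combined.most_common(1)[0][0]
```

With these made-up scores, the voice alone picks "d" and the lip image alone picks "m", while the summed scores pick "b". This illustrates one way combining both sources could be more accurate than lip reading alone.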

[0007]

In FIG. 1, reference numeral 8 denotes a voice level determination unit. It determines the loudness (level) of the voice input to the voice input unit 1 and, when that level is lower than a predetermined level, outputs a signal indicating this to the input content determining unit 5. When the input content determining unit 5 receives from the voice level determination unit 8 a signal indicating that the user's voice input to the voice input unit 1 is lower than the predetermined level (for example, so quiet that it is barely audible to the human ear), it automatically determines the content input by the user based only on the mouth shape input from the mouth shape input unit 2 (more specifically, based only on the output of the mouth-shape-to-character conversion unit 4). That is, in Embodiment 1, whenever the user's voice from the voice input unit 1 is below the predetermined level, the input content determining unit 5, on receiving the signal from the voice level determination unit 8, automatically determines the user's input based only on the mouth shape information input from the mouth shape input unit 2 (that is, the character information corresponding to it). Thus, in Embodiment 1, when the user speaks at or above the predetermined level (for example, at least as loud as a normal human ear can hear), the user's input is determined from both the voice and the mouth shape; when the user does not speak at or above that level, the input is determined from the user's mouth shape information alone.
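The behavior of the voice level determination unit 8 and the resulting switch in the input content determining unit 5 can be sketched as follows. The source specifies neither how loudness is measured nor the value of the predetermined level. This sketch assumes an RMS level in dBFS over normalized samples in [-1, 1], a hypothetical threshold of -40 dBFS, and, above the threshold, a combination of both sources by summing per-character scores.

```python
import math

# Hypothetical "predetermined level"; the source gives no number.
PREDETERMINED_LEVEL_DBFS = -40.0


def voice_level_dbfs(samples: list[float]) -> float:
    """Unit 8 (assumed measure): RMS level of normalized samples, in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def is_below_level(samples: list[float],
                   threshold: float = PREDETERMINED_LEVEL_DBFS) -> bool:
    """Signal sent to unit 5: True when the voice is quieter than the threshold."""
    return voice_level_dbfs(samples) < threshold


def determine_input(samples: list[float],
                    voice_scores: dict[str, float],
                    mouth_scores: dict[str, float]) -> tuple[str, str]:
    """Unit 5: mouth shape only when quiet (or silent); otherwise both sources."""
    if is_below_level(samples):
        mode, sources = "mouth_only", [mouth_scores]
    else:
        mode, sources = "voice+mouth", [voice_scores, mouth_scores]
    combined: dict[str, float] = {}
    for scores in sources:
        for char, score in scores.items():
            combined[char] = combined.get(char, 0.0) + score
    best = max(combined, key=combined.get) if combined else ""
    return best, mode
```

Complete silence yields a level of negative infinity, so mouthing words without any sound falls into the mouth-only case, matching the text's note that a level below the predetermined one includes making no sound at all.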

[0008]

Accordingly, with Embodiment 1, the user can ordinarily input information by speaking at the same volume as when talking to another person (in this case, the input is determined from both the user's voice and mouth shape). Only in special cases, namely when entering secret information such as a password or credit card number, and only for that moment, does the user "utter voice at a level lower than the predetermined level", for example a voice quieter than usual or one that is barely (or not at all) audible to those nearby; this includes merely moving the mouth without making any sound. In that case, the input is determined from the user's mouth shape information alone. In this way, Embodiment 1 lets the user seamlessly and continuously combine the input of ordinary non-secret information (which may be overheard, so it can be entered by voice) with the input of secret information such as a password (which must not be overheard), simply by lowering the voice below the predetermined level for the secret portion only (speaking too quietly for those nearby to hear, or moving the mouth as if speaking without making any sound). For example, when a user is ordering a product from a virtual shop on the Internet by voice, non-secret information such as the product order or a price confirmation can be entered at or above the predetermined level, while secret information (including privacy information) such as the user's password, date of birth, or credit card number can be entered, only at that moment, below the predetermined level (in a voice too quiet for those nearby to hear, or with no voice at all). In short, with Embodiment 1 the user normally inputs by speaking; when secret information such as private details must be entered partway through, the user drops below the predetermined level (including making no sound at all) just for that portion, so that nearby people cannot overhear it; once the secret information has been entered, the user resumes speaking normally and continues the input.
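Because the mode is chosen from the measured level alone, switching needs no explicit command, which is what makes the mixed input seamless. A minimal sketch, assuming per-frame levels in dBFS have already been measured and using a hypothetical -40 dBFS threshold (the source gives no value), splits one continuous utterance into runs handled by each mode:

```python
from itertools import groupby

PREDETERMINED_LEVEL_DBFS = -40.0  # hypothetical threshold; the source gives no value


def segment_by_mode(frame_levels_dbfs: list[float],
                    threshold: float = PREDETERMINED_LEVEL_DBFS) -> list[tuple[str, int, int]]:
    """Split per-frame levels into (mode, start, end) runs; end is exclusive."""
    segments = []
    index = 0
    # groupby collects consecutive frames that are on the same side of the threshold.
    for quiet, run in groupby(frame_levels_dbfs, key=lambda level: level < threshold):
        length = sum(1 for _ in run)
        mode = "mouth_only" if quiet else "voice+mouth"
        segments.append((mode, index, index + length))
        index += length
    return segments
```

For levels `[-20, -22, -55, -60, -58, -21]`, the three quiet frames in the middle (for instance, a mouthed password) form one mouth-only run, while the surrounding normal speech is handled with both sources.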

[0009]

Embodiment 2. In Embodiment 1 above, the voice-to-character conversion unit 3 (and the voice input unit 1) and the mouth-shape-to-character conversion unit 4 (and the mouth shape input unit 2) of FIG. 1 operate simultaneously, in parallel. In Embodiment 2, by contrast, the system may be configured so that normally (that is, while the user speaks at or above the predetermined level) only the voice-to-character conversion unit 3 (and the voice input unit 1) operates and data is input by the voice input method, and only when the user's voice falls below the predetermined level is the mouth-shape-to-character conversion unit 4 (and the mouth shape input unit 2) automatically activated so that data is input by the mouth shape input method. That is, unlike Embodiment 1, Embodiment 2 places the voice-to-character conversion unit 3 (and the voice input unit 1) and the mouth-shape-to-character conversion unit 4 (and the mouth shape input unit 2) of FIG. 1 in a "trade-off relationship": only one of them operates at a time, never both simultaneously.
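A minimal sketch of Embodiment 2's mutually exclusive operation. The class name, field names, and -40 dBFS threshold are hypothetical. One point the source leaves open is treated here as an assumption: the level check of unit 8 keeps monitoring the microphone even while the voice-to-character conversion unit 3 is idle, since otherwise the system could not detect when the user resumes normal speech.

```python
from dataclasses import dataclass

PREDETERMINED_LEVEL_DBFS = -40.0  # hypothetical; the source gives no value


@dataclass
class ExclusiveInputController:
    """Embodiment 2 sketch: exactly one conversion path is active at a time."""
    threshold: float = PREDETERMINED_LEVEL_DBFS
    voice_active: bool = True    # voice path (units 1 and 3) runs by default
    mouth_active: bool = False   # mouth path (units 2 and 4) is off by default
    camera_starts: int = 0       # how many times the mouth path was switched on

    def update(self, level_dbfs: float) -> str:
        """Apply one level reading (assumed to come from unit 8); return the active path."""
        quiet = level_dbfs < self.threshold
        if quiet and not self.mouth_active:
            self.camera_starts += 1  # mouth path is activated only on this transition
        self.mouth_active = quiet
        self.voice_active = not quiet
        return "mouth" if quiet else "voice"
```

Feeding the levels `[-20, -50, -55, -18, -60]` yields `voice, mouth, mouth, voice, mouth` and switches the mouth path on twice, with the two flags never true at the same time.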

[0010]

Effects of the Invention
According to the present invention, the user normally inputs by speaking (in Embodiment 1, the input content is determined by combining voice input and mouth shape input). When secret information such as private details must be entered partway through, the user lowers the voice below the predetermined level (including making no sound at all) just for that portion, so that data is input from the mouth shape alone and nearby people cannot overhear the secret information. After the secret information has been entered, the user resumes speaking normally and continues with voice input (in Embodiment 1, the input content again continues to be determined by combining voice input and mouth shape input). The present invention therefore lets the user seamlessly and continuously combine the input of ordinary non-secret information (which may be overheard, so it can be entered by voice) with the input of secret information such as a password (which must not be overheard), simply by lowering the voice below the predetermined level for the secret portion only (speaking too quietly for those nearby to hear, or moving the mouth as if speaking without making any sound).

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram showing Embodiment 1 of the present invention.

─────────────────────────────────────────────────────

[Procedure Amendment]

[Submission date] April 8, 2002

[Procedure amendment 1]

[Document to be amended] Drawings

[Item to be amended] All figures

[Amendment method] Change

[Amendment contents]

FIG. 1 (image not reproduced)

Claims


1. A system for voice and/or mouth shape input, comprising: voice input means for inputting voice from a user; mouth shape input means for inputting an image of the user's mouth shape; and input content determining means for determining the content input by the user based on information from the voice input means and/or information from the mouth shape input means.

2. The system for voice and/or mouth shape input according to claim 1, wherein, when the volume of the voice input to the voice input means is smaller than a predetermined level, the input content determining means automatically determines the content input by the user based on information from the mouth shape input means.

3. A method for voice and/or mouth shape input, comprising: a step of inputting voice from a user; a step of inputting an image of the user's mouth shape; and an input content determining step of determining the content input by the user based on the input voice information and/or the input mouth shape information.

4. The method for mixed voice and mouth shape input according to claim 3, wherein, in the input content determining step, when the volume of the voice input from the user is smaller than a predetermined level, the content input by the user is automatically determined based on the input mouth shape information.

5. A computer program for voice and/or mouth shape input that causes a computer to realize: a function of inputting voice from a user; a function of inputting an image of the user's mouth shape; and an input content determining function of determining the content input by the user based on the input voice information and/or the input mouth shape information.

6. The computer program for voice and/or mouth shape input according to claim 5, wherein, when the volume of the voice input from the user is smaller than a predetermined level, the input content determining function determines the content input by the user based on the input mouth shape information.