JP3728921B2 - Voice command terminal device

Info

Publication number: JP3728921B2
Application number: JP10504698A
Authority: JP
Inventors: 裕幸横川
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 1998-04-15
Filing date: 1998-04-15
Publication date: 2005-12-21
Anticipated expiration: 2018-04-15
Also published as: JPH11296187A

Description

【０００１】
【発明の属する技術分野】
本発明は例えば音声認識カードなどの音声指令端末装置に関する。
【０００２】
【従来の技術】
近年、音声認識の技術が実用化されつつある。音声認識技術には不特定話者を対象にしたものと特定話者を対象にしたものとの２種類ある。例えば、音声認識機能を有する公共の施設などの前で、不特定話者がある名前を発音すると、この名前が認識されて対応する処理を実行するものが知られている。
【０００３】
これに対して、特定話者を対象にしたものは、特定の人間の音声を登録しておき、登録した人間のみを対象に音声認識を行って認識された音声に対応する処理を実行するものである。
【０００４】
【発明が解決しようとする課題】
しかしながら、上記した不特定話者を対象にした従来の音声認識装置は、不特定多数のユーザを対象としているので認識可能な語彙数が限られていたり、語彙数を増やそうとすると認識率が良好でなく実用の域までには到達していない。また、上記した特定話者を対象にした音声認識装置においては、認識率は高いが限られた特定の人間の間でしか使用することができないという問題がある。
【０００５】
さらに、音声認識を行なう場所が固定されていたので周囲の雑音などに影響されたり、他人が存在し得る環境で発音する必要があるので秘匿性を保持する必要がある場合には適していないという問題があった。
【０００６】
本発明の課題は、認識率の高い特定話者による音声認識機能を各ユーザが所持する端末に持たせてユーザが所望の場所で音声入力を行なえるようにすることで、秘匿性を保ちつつ不特定多数の音声を極めて高い認識率で認識できるようにすることである。
【０００７】
【課題を解決するための手段】
本発明の音声指令端末装置は、入力された音声を認識する音声認識手段と、この音声認識手段により認識された音声に対応する指令情報を音声とは異なる形態で他装置の機種別に登録する登録手段と、前記指令情報を受信して対応する処理を実行する他装置への接近を示す情報を当該他装置から取得したときに、前記登録手段に登録された当該他装置の指令情報を当該他装置に送信する送信手段とを具備することを特徴とする。
【００１０】
【発明の実施の形態】
以下、図面を参照して本発明の一実施形態を詳細に説明する。図１は、本発明の音声指令端末装置を適用した音声認識カードの構成を示す図であり、図中の参照番号１０は当該音声認識カード全体を制御する制御部としてのＣＰＵである。１１は音声入力部としてのマイクであり、１２はキーボードであり、音声指令キーや送信キーを含む種々の指令を入力するためのキーを備えている。１３はユーザにより入力された音声に対応する指令情報を自動販売機などの他の装置に送信するときの送信手段としての通信インタフェースであり、１４はユーザが選択可能な複数種類の対象機種を示すメニューを表示するための表示部１４であり、１５はユーザにより入力された音声を認識する音声認識部（音声認識手段）であり、１６は話者登録辞書として用いられるフラッシュＲＯＭであり、１７は入力された音声からＣＰＵ１０による検索により指令データを得るのに用いられる指令テーブルである。
【００１１】
図２は図１に示す指令テーブルの構成を示す図であり、ユーザが選択可能な複数の機種コードと、各機種において使用される指令データ（指令情報）と、各指令データに対応する音声データとが対応付けて記憶されている。図２では、例えば切符販売用の自動販売機Ａの機種コードに対して３つの指令データと音声データとが例として示されている。
【００１２】
図３は表示部１４に表示されるメニューの一例を示す図であり、ユーザが選択可能な機種として、１．自動販売機Ａ、２．自動販売機Ｂ、３．ドアＡ、４．ドアＢ、５．ＴＶが例として示されている。
【００１３】
以下に、音声認識カードに対する音声登録処理の詳細を図４に示すフローチャートを参照して説明する。
まず、音声指令キーを押すと、図３に示すような対象機種の候補が表示部１４に表示される（ステップＳ１）。ユーザはこのメニューを見て、例えば、切符購入用の自動販売機Ａを選択した後（ステップＳ２）、マイク１１を介して例えば、“東京”、“大阪”などの行き先を音声で入力する（ステップＳ３）。すると音声認識部１５において入力された音声の認識が行われる（ステップＳ４）。次に、指令テーブル１７内を走査して認識された音声データに対応する指令データを検索して（ステップＳ５）、検索された対応指令データを、選択した機種（ここでは自動販売機Ａ）の機種コードと共にテキストデータの形でフラッシュＲＯＭ１６に記憶する（ステップＳ６）。次に、登録すべき音声データが他にあるか否かを判断し（ステップＳ７）、他にある場合にはステップＳ３に戻って音声の入力を行なう。例えば、ユーザは、実行してほしい指令に対応する音声データを複数、対象機種ごとにあらかじめまとめて（例えば１日分）入力しておくことが可能である。これによって選択した機種に応じて音声データに対応する指令データを選択することができる。
【００１４】
また、音声データと、この音声データを変換した形態の音声データ（例えば、“パソコン”を“パーソナルコンピュータ”に変換する）とを対応付けてフラッシュＲＯＭ１６に登録しておき、“パソコン”という音声が入力されたときに“パーソナルコンピュータ”が登録されるようにしてもよい。
【００１５】
図５は上記の方法で登録した指令データを送信するときの処理の詳細を示すフローチャートである。まず、図７に示すように、ユーザ９９が音声認識カード１００を、切符購入用の自動販売機１０１に例えば３ｍ以内にまで近づけることにより自動販売機１０１から機種接近情報を受信するか、あるいはユーザ９９が送信キーを押したか否かを判定し（ステップＳ８、Ｓ９）、いずれかの判定がＹＥＳになった場合には、音声認識カード１００はあらかじめ登録されている音声データに対応する指令データと機種コードとを自動販売機に赤外線などの非接触手段により送信する（ステップＳ１０）。当該指令データが自動販売機１０１によって正常に認識されたときには応答発信が音声認識カード９９側に返されるのでステップＳ１１がＹＥＳとなって、送信すべき指令データが他に有るか否かを判断する（ステップＳ１２）。ここでＮＯの場合は処理を終了するが、ＹＥＳの場合にはステップＳ８に戻って機種接近情報があるか（ステップＳ８）あるいは送信キーが押されたか否か（ステップＳ９）の判定を行なう。一方、当該指令データが正常に認識されなかった場合には応答発信が無いのでエラー表示を行なう（ステップＳ１３）。
【００１６】
次に、図６のフローチャートを参照して自動販売機１０１によるコマンド受信処理の詳細を説明する。自動販売機１０１は機種接近情報を常に発信しており（ステップＳ２０）、これが音声認識カード１００により認識されると、音声認識カード１００側から指令データと機種コードとが送信されてくるので、その機種コードが自身の機種コードと一致する指令データを受信したか否かを判定する（ステップＳ２１）。機種コードが一致する指令データを受信したときには、この指令データを解析して（ステップＳ２２）、指令データに対応する物品（切符）をユーザ９９に提供することが可能かどうかを判定する（ステップＳ２３）。ここでユーザが所定額の金銭を投入し、かつ、販売可能な行き先であった場合にはＹＥＳとなり、次に、音声認識カード１００に対して応答発信を行ない（ステップＳ２４）、続いて当該物品（切符）をユーザ９９に提供する対応処理（ステップＳ２５）を行なった後、ステップＳ２０に戻る。また、ステップＳ２３においてＮＯの場合にはエラー表示（ステップＳ２６）を行ってステップＳ２０に戻る。
【００１７】
上記した実施形態によれば、ユーザは、音声認識カードの携帯性を利用して、他人の介在しない所望の場所であらかじめ音声登録を行ない、その後、自動販売機の設置してある場所に音声認識カードを運んで指令データを送信することができる。この場合の音声認識方法は、音声認識カードを所持するユーザのみの特定話者を対象としたものになるので、秘匿性を保ちつつ不特定多数の音声を極めて高い認識率で認識できる。これによって、切符の販売機の場合には、料金表を見なくともかつキー操作を行なうことなしに切符を購入する作業、及び視力の弱い人や、機械走査に不慣れな人が助けを必要とせずに切符を購入する作業が簡単かつ確実に行なえるようになる。
【００１８】
なお、コマンドの送信方法は非接触手段として赤外線の他に、光や電磁結合による方法を用いてもよい。また、自動販売機に専用のカードリーダを設けて電気的接触による方法を用いてもよい。また、切符購入用の自動販売機に限らず、飲料購入用など他の任意の自動販売機であってもよい。
【００１９】
以下に上記した音声認識カードをオートドアロックに適用した変形例を図８を参照して説明する。
まず、ユーザ１９９は、所望の場所で音声認識カード２００に所定の暗証番号を予め登録する。例えば、「すずきいちろう」と入力すると、暗証番号「５４１８４９７３」に変換されて登録される。音声の登録方法は上記した方法に準じて行なうことができる。
【００２０】
次に、ユーザ１９９は、ドアロックを通って入室／入館する場合、音声認識カード２００に「すずきいちろう」と話す。
次に、音声認識カード２００をオートロック装置２０１に近づける。受信範囲内に入れば、暗証番号「５４１８４９７３」がオートロック装置２０１に入力される。
【００２１】
次に、オートロック装置２０１はこの暗号番号を認識することにより、ドアロックが解除されてドア２０２からの入室／入館が可能になる。
上記した変形例によれば、秘匿性を保ちつつ不特定多数の音声を極めて高い認識率で認識できるようになる。また、長い暗証番号を覚える必要がなくなるので、入室／入館動作が簡略化される。また、オートロック装置などのキー操作が不要になり、利便性が大幅に向上するとともに、身体障害者やお年寄りなどにも使用できる。また、特定話者認識なので正当な使用者以外のものが音声認識カードを取得して、「すずきいちろう」と話しても認識できないため不正な入室／入館ができないので、電子ロックとしての安全性が格段に向上する。
【００２２】
さらなる変形例として、音声認識カードに辞書機能を持たせることで、家電製品のリモコンや電話帳などにも利用できる。例えば、テレビリモコンなどで、「わうわう」を音声入力し、これを「ＢＳ５」に変換した上でテレビに送信したり、電話帳において、「たまくやくしょ」を音声入力し、これを「０４４９３５３１１１」に変換した上で電話機に送信することも可能である。
【００２３】
【発明の効果】
本発明によれば、認識率の高い特定話者による音声認識機能を各ユーザが所持する端末装置に持たせ、この端末装置を他装置へ接近することによって、接近した当該他装置に対応して登録されている指令情報を当該他装置に送信することができ、他装置に対応して的確に指令情報を送信して他装置による処理を実行させることができる。
【図面の簡単な説明】
【図１】本発明の音声指令端末装置が適用される音声認識カードの構成を示す図である。
【図２】音声認識カードの指令テーブルの構成を示す図である。
【図３】音声認識カードの表示部に表示されるメニューの一例を示す図である。
【図４】音声認識カードに対する音声登録処理の詳細を説明するためのフローチャートである。
【図５】登録した指令データを送信するときの処理の詳細を示すフローチャートである。
【図６】自動販売機によるコマンド受信処理の詳細を説明するためのフローチャートである。
【図７】音声認識カードを自動販売機に適用したときの作用を説明するための図である。
【図８】音声認識カードをオートロック装置に適用したときの作用を説明するための図である。
【符号の説明】
１０…ＣＰＵ、
１１…マイク、
１２…キーボード、
１３…通信インタフェース、
１４…表示部、
１５…音声認識部、
１６…フラッシュＲＯＭ、
１７…指令テーブル。
[0001]
[Technical field of the invention]
The present invention relates to a voice command terminal device such as a voice recognition card.
[0002]
[Prior art]
In recent years, voice recognition technology has been coming into practical use. There are two types of voice recognition technology: speaker-independent recognition, which targets unspecified speakers, and speaker-dependent recognition, which targets specific speakers. For example, systems are known in which, when an unspecified speaker says a name in front of a public facility or the like that has a voice recognition function, the name is recognized and corresponding processing is executed.
[0003]
In contrast, recognition targeting specific speakers registers the voice of a specific person, performs voice recognition only for the registered person, and executes processing corresponding to the recognized voice.
[0004]
[Problems to be solved by the invention]
However, because the conventional voice recognition apparatus for unspecified speakers described above targets a large number of unspecified users, the number of recognizable words is limited, and when the vocabulary is increased the recognition rate becomes poor, so such apparatus has not reached a practical level. The voice recognition apparatus for specific speakers described above has a high recognition rate, but has the problem that it can be used only among a limited number of specific persons.
[0005]
Furthermore, because the place where voice recognition is performed has been fixed, recognition is affected by surrounding noise and the like, and because the user must speak in an environment where other people may be present, such apparatus is unsuitable when confidentiality must be maintained.
[0006]
An object of the present invention is to give the terminal carried by each user a speaker-dependent voice recognition function with a high recognition rate, so that the user can perform voice input at any desired place, and thereby to make it possible to recognize the voices of an unspecified number of users with a very high recognition rate while maintaining confidentiality.
[0007]
[Means for Solving the Problems]
The voice command terminal device of the present invention is characterized by comprising: voice recognition means for recognizing input voice; registration means for registering command information corresponding to the voice recognized by the voice recognition means, for each type of other device, in a form different from voice; and transmission means for transmitting, when information indicating approach to another device that receives the command information and executes corresponding processing is acquired from that other device, the command information for that other device registered in the registration means to that other device.
[0010]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a diagram showing the configuration of a voice recognition card to which the voice command terminal device of the present invention is applied. Reference numeral 10 in the figure denotes a CPU serving as a control unit that controls the entire voice recognition card. Reference numeral 11 denotes a microphone serving as a voice input unit, and 12 denotes a keyboard having keys for inputting various commands, including a voice command key and a transmission key. Reference numeral 13 denotes a communication interface serving as transmission means for transmitting command information corresponding to voice input by the user to another device such as a vending machine; 14 denotes a display unit for displaying a menu showing the plural types of target models selectable by the user; 15 denotes a voice recognition unit (voice recognition means) for recognizing voice input by the user; 16 denotes a flash ROM used as a speaker registration dictionary; and 17 denotes a command table that the CPU 10 searches to obtain command data from the input voice.
[0011]
FIG. 2 is a diagram showing the configuration of the command table shown in FIG. 1. A plurality of model codes selectable by the user, the command data (command information) used by each model, and the voice data corresponding to each item of command data are stored in association with one another. In FIG. 2, three items of command data and voice data are shown as an example for the model code of vending machine A, which sells tickets.
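The table layout described above can be pictured with a minimal Python sketch. The model codes, words, and command strings below are invented for illustration, and the function name `lookup_command` is hypothetical; the patent does not give the actual contents of FIG. 2.

```python
# Hypothetical command table modelled on FIG. 2:
# model code -> {recognized word -> command data}.
# All codes and command strings are invented for illustration.
COMMAND_TABLE = {
    "VEND_A": {
        "tokyo": "CMD_TICKET_TOKYO",
        "osaka": "CMD_TICKET_OSAKA",
        "nagoya": "CMD_TICKET_NAGOYA",
    },
    "DOOR_A": {"open": "CMD_UNLOCK"},
}


def lookup_command(model_code, recognized_word):
    """Return the command data for a word on the given model, or None if absent."""
    return COMMAND_TABLE.get(model_code, {}).get(recognized_word)
```

Keying the table by model code first means the same spoken word can map to different command data on different models.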
[0012]
FIG. 3 is a diagram showing an example of the menu displayed on the display unit 14. As models selectable by the user, 1. Vending machine A, 2. Vending machine B, 3. Door A, 4. Door B, and 5. TV are shown as examples.
[0013]
Details of the voice registration process for the voice recognition card will be described below with reference to the flowchart shown in FIG.
First, when the voice command key is pressed, the candidate target models shown in FIG. 3 are displayed on the display unit 14 (step S1). The user looks at this menu and selects, for example, vending machine A for ticket purchase (step S2), and then speaks a destination such as “Tokyo” or “Osaka” into the microphone 11 (step S3). The voice recognition unit 15 then recognizes the input voice (step S4). Next, the command table 17 is scanned to retrieve the command data corresponding to the recognized voice data (step S5), and the retrieved command data is stored in the flash ROM 16 in the form of text data together with the model code of the selected model (here, vending machine A) (step S6). It is then determined whether there is any other voice data to be registered (step S7); if there is, the process returns to step S3 and voice input is performed again. For example, the user can input in advance, for each target model, a batch of voice data (for example, one day's worth) corresponding to the commands to be executed. In this way, the command data corresponding to the voice data can be selected according to the selected model.
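The registration loop of steps S3 to S7 can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: recognition (step S4) is assumed to have already produced a list of words, the flash ROM is stood in for by a plain list, and words with no table entry are simply skipped, a case the description does not cover. The name `register_commands` and the table contents are hypothetical.

```python
# Hypothetical table: model code -> {recognized word -> command data}.
TABLE = {"VEND_A": {"tokyo": "CMD_TICKET_TOKYO", "osaka": "CMD_TICKET_OSAKA"}}


def register_commands(model_code, recognized_words, command_table, storage):
    """Store (model code, command data) pairs as text for each recognized word."""
    for word in recognized_words:  # S3/S4: one recognized utterance per pass
        command = command_table.get(model_code, {}).get(word)  # S5: table search
        if command is not None:  # assumption: unknown words are skipped
            storage.append((model_code, command))  # S6: store with model code
    return storage  # S7: loop ends when no utterances remain


flash_rom = register_commands("VEND_A", ["tokyo", "kyoto", "osaka"], TABLE, [])
```

Here `flash_rom` ends up holding the two entries for Tokyo and Osaka, both tagged with the model code of vending machine A.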
[0014]
Alternatively, voice data and a converted form of that voice data (for example, the abbreviation “PC” converted to “personal computer”) may be registered in the flash ROM 16 in association with each other, so that “personal computer” is registered when the voice “PC” is input.
[0015]
FIG. 5 is a flowchart showing details of the processing for transmitting command data registered by the above method. First, as shown in FIG. 7, it is determined whether model approach information has been received from the vending machine 101 as a result of the user 99 bringing the voice recognition card 100 close to the ticket vending machine 101, for example to within 3 m, or whether the user 99 has pressed the transmission key (steps S8 and S9). If either determination is YES, the voice recognition card 100 transmits the command data corresponding to the previously registered voice data, together with the model code, to the vending machine by non-contact means such as infrared rays (step S10). When the command data is recognized normally by the vending machine 101, a response is returned to the voice recognition card 100, so step S11 becomes YES, and it is determined whether there is other command data to be transmitted (step S12). If NO, the process ends; if YES, the process returns to step S8 and the determinations of whether there is model approach information (step S8) and whether the transmission key has been pressed (step S9) are made again. If the command data is not recognized normally, no response is returned, and an error is displayed (step S13).
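The card-side transmission flow of steps S8 to S13 might be sketched as below. This is a simplified Python illustration with several assumptions: the function name `transmit_registered`, its parameters, and the return strings are hypothetical; the infrared link is replaced by a `send` callback that returns True when a response arrives; and where a real card would keep waiting for a trigger, the sketch returns "idle".

```python
def transmit_registered(storage, near_model, send_key_pressed, send):
    """Send stored (model code, command) pairs; return 'done', 'idle' or 'error'."""
    pending = list(storage)
    while pending:  # S12: is there more command data to send?
        # S8/S9: proceed only on approach information or a press of the send key.
        if near_model is None and not send_key_pressed:
            return "idle"
        model_code, command = pending[0]
        # S10: transmit command data with its model code; S11: wait for response.
        if not send(model_code, command):
            return "error"  # S13: no response, so an error is displayed
        pending.pop(0)
    return "done"
```

The model code travels with every command so that the receiving device can decide for itself whether the command is addressed to it.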
[0016]
Next, details of the command reception processing by the vending machine 101 will be described with reference to the flowchart of FIG. 6. The vending machine 101 constantly transmits model approach information (step S20). When the voice recognition card 100 recognizes this information, it transmits command data and a model code, so the vending machine determines whether it has received command data whose model code matches its own (step S21). When command data with a matching model code is received, the command data is analyzed (step S22), and it is determined whether the article (ticket) corresponding to the command data can be provided to the user 99 (step S23). The determination is YES if the user has inserted the required amount of money and the destination is one that can be sold; in that case a response is sent to the voice recognition card 100 (step S24), the article (ticket) is provided to the user 99 (step S25), and the process returns to step S20. If the determination in step S23 is NO, an error is displayed (step S26) and the process returns to step S20.
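The decision logic of steps S21 to S26 on the vending machine side can be illustrated with a short Python sketch. The function name `handle_command`, the fare table, and the result labels are hypothetical, and the continuous beacon of step S20 and the physical ticket issue are left out.

```python
def handle_command(own_model_code, received_model_code, command, inserted_yen, fares):
    """Return (result, command) where result is 'ignored', 'error' or 'ack'."""
    if received_model_code != own_model_code:  # S21: model code must match
        return ("ignored", None)
    fare = fares.get(command)  # S22: analyze the command data
    # S23: the ticket can be provided only if the destination is sold here
    # and enough money has been inserted.
    if fare is None or inserted_yen < fare:
        return ("error", None)  # S26: error display
    return ("ack", command)  # S24: respond to the card, S25: issue the ticket


FARES = {"CMD_TICKET_TOKYO": 500}  # invented fare for illustration
```

For example, `handle_command("VEND_A", "VEND_A", "CMD_TICKET_TOKYO", 500, FARES)` yields an acknowledgement, while the same command with only 100 yen inserted yields an error.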
[0017]
According to the above embodiment, the user can take advantage of the portability of the voice recognition card to register voice commands in advance at a desired place where no one else is present, and then carry the card to the place where the vending machine is installed and transmit the command data. Because the voice recognition in this case is speaker-dependent, targeting only the user who carries the card, the voices of an unspecified number of users can be recognized with a very high recognition rate while confidentiality is maintained. As a result, in the case of a ticket vending machine, tickets can be purchased without looking at the fare table and without key operations, and people with poor eyesight or people unfamiliar with operating machines can purchase tickets easily and reliably without needing assistance.
[0018]
As the command transmission method, non-contact means other than infrared rays, such as light or electromagnetic coupling, may be used. Alternatively, a dedicated card reader may be provided in the vending machine and a method based on electrical contact may be used. Moreover, the vending machine is not limited to one for purchasing tickets and may be any other vending machine, such as one for purchasing beverages.
[0019]
A modification in which the above-described voice recognition card is applied to an automatic door lock will be described below with reference to FIG.
First, the user 199 registers a predetermined password in the voice recognition card 200 in advance at a desired place. For example, when “Suzuki Ichiro” is input, it is converted into the password “54184973” and registered. Voice registration can be performed in the same manner as the method described above.
[0020]
Next, when entering a room or building through the locked door, the user 199 says “Suzuki Ichiro” to the voice recognition card 200.
Next, the user brings the voice recognition card 200 close to the auto-lock device 201. Once the card is within reception range, the password “54184973” is input to the auto-lock device 201.
[0021]
Next, the auto-lock device 201 recognizes this password, whereupon the door lock is released and entry through the door 202 becomes possible.
According to the above modification, the voices of an unspecified number of users can be recognized with a very high recognition rate while confidentiality is maintained. In addition, because there is no need to memorize a long password, entry is simplified. Key operations on the auto-lock device and the like also become unnecessary, which greatly improves convenience and makes the device usable by people with physical disabilities and the elderly. Furthermore, because this is speaker-dependent recognition, even if someone other than the legitimate user obtains the voice recognition card and says “Suzuki Ichiro”, the voice is not recognized and unauthorized entry is not possible, so security as an electronic lock is greatly improved.
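The security property described above, where the card releases the password only when the enrolled speaker says the registered phrase, can be sketched in Python as follows. This is a deliberately simplified illustration: real speaker-dependent recognition compares acoustic features, whereas here a hypothetical speaker label stands in for the voice match, and the names `REGISTERED_SPEAKER`, `PASSWORDS`, and `password_for` are invented. The phrase and password values are the ones used in the example above.

```python
REGISTERED_SPEAKER = "user-199"  # hypothetical stand-in for the enrolled voice
PASSWORDS = {"suzuki ichiro": "54184973"}  # spoken phrase -> registered password


def password_for(speaker, phrase):
    """Return the password to transmit, or None if speaker or phrase is not recognized."""
    if speaker != REGISTERED_SPEAKER:  # another person's voice is not recognized
        return None
    return PASSWORDS.get(phrase)
```

A person who merely obtains the card and repeats the phrase gets `None`, so nothing is transmitted to the auto-lock device.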
[0022]
As a further modification, giving the voice recognition card a dictionary function allows it to be used as a remote control for home appliances, as a telephone directory, and so on. For example, with a TV remote control, “wauwau” can be input by voice, converted to “BS5”, and transmitted to the TV; or, with a telephone directory, “tamakuyakusho” (Tama Ward Office) can be input by voice, converted to “0449353111”, and transmitted to a telephone.
[0023]
[Effects of the invention]
According to the present invention, the terminal device carried by each user is given a speaker-dependent voice recognition function with a high recognition rate, and when this terminal device approaches another device, the command information registered for that other device can be transmitted to it. Command information can thus be transmitted accurately according to the other device, causing the other device to execute the processing.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of a voice recognition card to which a voice command terminal device of the present invention is applied.
FIG. 2 is a diagram showing a configuration of a command table of a voice recognition card.
FIG. 3 is a diagram showing an example of a menu displayed on the display unit of the voice recognition card.
FIG. 4 is a flowchart for explaining details of a voice registration process for a voice recognition card.
FIG. 5 is a flowchart showing details of processing when transmitting registered command data.
FIG. 6 is a flowchart for explaining details of command reception processing by the vending machine.
FIG. 7 is a diagram for explaining the operation when a voice recognition card is applied to a vending machine.
FIG. 8 is a diagram for explaining the operation when a voice recognition card is applied to an auto-lock device.
[Explanation of symbols]
10 ... CPU,
11 ... Microphone,
12 ... Keyboard,
13 ... Communication interface,
14 ... display part,
15 ... voice recognition unit,
16 ... Flash ROM,
17 ... Command table.

Claims

1. A voice command terminal device comprising:
voice recognition means for recognizing input voice;
registration means for registering command information corresponding to the voice recognized by the voice recognition means, for each type of other device, in a form different from voice; and
transmission means for transmitting, when information indicating approach to another device that receives the command information and executes corresponding processing is acquired from that other device, the command information for that other device registered in the registration means to that other device.

2. The voice command terminal device, wherein the command information corresponding to the recognized voice is selected according to another device selected from the plurality of other devices.