JPH07219583A - Method and device for speech processing

Info

Publication number: JPH07219583A
Application number: JP6008497A
Authority: JP
Inventors: Tsuyoshi Yagisawa; 津義八木沢
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1994-01-28
Filing date: 1994-01-28
Publication date: 1995-08-18

Abstract

PURPOSE: To enable smooth operation that matches human intuition by treating a displayed speech recognition result as confirmed, and outputting the corresponding action information, when no voice is input within a fixed grace period after the result is displayed.

CONSTITUTION: A speech recognition processing unit 2 uses a word dictionary 3 to recognize the voice 'henshuu' ('edit' in English) input from a microphone 1, holds the result, 'henshuu' in kanji, in a speech recognition result holding unit 4, and displays it on a status presentation unit 6. A recognition result control unit 5 checks whether the time preset on a timer 7 has elapsed. If some voice, e.g. 'chigau' ('wrong'), is input within that time, the result displayed on the status presentation unit 6 is treated as cancelled and the apparatus enters the input wait state. If the preset grace period elapses with no other voice input, the displayed result is taken to be confirmed as correct, and the action information corresponding to 'henshuu' in kanji held in the speech recognition result holding unit 4 is output to an information output unit 8.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech processing method and apparatus that allow the operations used to control an application on a computer or the like, generally called commands, to be performed by voice as well.

[0002]

[Prior Art] In applications on computers and the like, operations are generally specified by keyboard input, a mouse, or similar means. More recently, applications operated by voice input have also begun to appear.

[0003]

[Problems to Be Solved by the Invention] However, neither past nor current speech recognition technology achieves correct recognition with 100% accuracy, and this is unlikely to be achieved in the near future. Consequently, when an utterance is misrecognized, the application executes the operation corresponding to the misrecognized result, and the user must restore the original state and then repeat the operation. This can require more work than keyboard input or other input methods. One proposed remedy is to have the user confirm each speech recognition result with a non-voice input device such as a keyboard before the operation is executed. With that method, however, confirmation is required every time, even when recognition is correct, which is itself a nuisance. The result can frustrate the user, hinder smooth operation, and needlessly increase distrust of the computer.

[0004]

[Means for Solving the Problems] The present invention eliminates the above drawbacks by providing recognition result control means. After the speech recognition result is presented on status presentation means, if no other control information (voice, keyboard input, or the like) is input within a fixed grace period measured by time measuring means, the control means judges that the user has confirmed the result and outputs the action information corresponding to the speech recognition result. If, on the other hand, some other control information (voice, keyboard input, or the like) is input, the control means invalidates the preceding speech recognition result. This enables smooth operation that matches human intuition.
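The confirm-or-invalidate rule of this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ControlInput`, `resolve`, and the modelling of inputs as timestamped events are hypothetical names and choices introduced here, and the 2-second default is borrowed from the embodiment in paragraph [0014].

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ControlInput:
    """Hypothetical control event (voice, keyboard, ...) with its arrival
    time in seconds, measured from the moment the result was presented."""
    time_s: float
    source: str


def resolve(result: str,
            inputs: Sequence[ControlInput],
            grace_s: float = 2.0) -> Optional[str]:
    """Return `result` if it counts as confirmed, or None if invalidated.

    Any control input that arrives inside the grace period invalidates
    the presented result; silence until the period ends confirms it.
    """
    for event in inputs:
        if 0.0 <= event.time_s < grace_s:
            return None  # input within the grace period: invalidate
    return result        # no input in time: treated as confirmed


# Silence confirms; a voice input after 1 second invalidates.
confirmed = resolve("編集", [])
cancelled = resolve("編集", [ControlInput(1.0, "voice")])
```

Inputs that arrive after the grace period are ignored by this sketch, on the assumption that the result has already been confirmed by then.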

[0005]

[Embodiment] The present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an apparatus according to one embodiment of the present invention. In the figure, 1 is a microphone for inputting voice; 2 is a speech recognition processing unit that recognizes the input voice; 3 is a word dictionary used by the speech recognition processing unit 2; 4 is a speech recognition result holding unit that holds the result recognized by the speech recognition processing unit 2; 5 is a recognition result control unit that, while referring to information from the timer 7, performs control such as presenting the result held in the speech recognition result holding unit 4 on the status presentation unit 6 with reference to the word dictionary 3, invalidating that result, or finalizing it; 6 is a status presentation unit that, under the control of the recognition result control unit 5, presents status such as the result held in the speech recognition result holding unit 4; 7 is a timer that measures elapsed time; and 8 is an information output unit that refers to the word dictionary 3 and outputs the action information corresponding to the recognition result finally determined by the recognition result control unit 5.

[0006] FIG. 2 is a flowchart showing the processing procedure of the apparatus shown in FIG. 1. The operation of the present invention is described with reference to this figure, taking as an example an operation by the input voice "henshuu" ("edit") shown in FIGS. 4(a) to 4(d).

[0007] In FIG. 2, initial settings such as a timer reset are performed in step S0, and the process proceeds to step S1.

[0008] In step S1, the apparatus waits for voice input from the microphone 1 while displaying the voice input wait screen shown in FIG. 4(a). When a voice such as "henshuu" is input, the process proceeds to step S2.

[0009] In step S2, the speech recognition processing unit 2 performs speech recognition on the input voice "henshuu" using the word dictionary 3. While recognition is in progress, a message to that effect is displayed as shown in FIG. 4(b). The result, 編集 ("edit"), is held in the speech recognition result holding unit 4, and the process proceeds to step S3.

[0010] Next, in step S3, the recognition result control unit 5 presents the result 編集 stored in the speech recognition result holding unit 4 on the status presentation unit 6 as shown in FIG. 4(c), and the process proceeds to step S4.

[0011] In step S4, the recognition result control unit 5 resets the timer 7 and enters a confirmation mode that lasts for a preset grace period. The time being measured by the timer 7 may be indicated to the user, as shown at 41 in FIG. 4(c). In this example, a "・" mark is displayed every 0.5 seconds.
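The elapsed-time indicator of step S4 could be rendered along the following lines. This is only a sketch: the function name `progress_marks` and its parameters are assumptions made for illustration, while the 0.5-second interval and the "・" mark come from the example in the text.

```python
def progress_marks(elapsed_s: float,
                   interval_s: float = 0.5,
                   mark: str = "・") -> str:
    """Return one mark for every full interval that has elapsed."""
    if elapsed_s < 0 or interval_s <= 0:
        raise ValueError("elapsed_s must be >= 0 and interval_s must be > 0")
    return mark * int(elapsed_s // interval_s)


# After 1.2 seconds, two full 0.5-second intervals have elapsed.
indicator = progress_marks(1.2)
```

With the 2-second grace period of the embodiment, the indicator in this sketch would reach four marks at the moment the result is confirmed.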

[0012] In step S5, the recognition result control unit 5 checks at fixed intervals whether the preset time has elapsed. If it has not elapsed, the process proceeds to step S6; if it has elapsed, the process proceeds to step S7.

[0013] In step S6, the apparatus checks whether any voice, for example "chigau" ("wrong"), has been input. If a voice has been input, the result presented on the status presentation unit 6 is treated as cancelled, a message to that effect (that is, the input wait message) is presented on the status presentation unit 6 as shown in FIG. 4(a), and the process returns to step S0. If no voice has been input, the process returns to step S5.

[0014] On the other hand, if the preset grace period of 2 seconds elapses without any other voice being input, then in step S7 the result presented on the status presentation unit 6 in FIG. 4(c) is interpreted as accepted by the user (confirmed as correct). From the information stored in the word dictionary 3, the action information "command-e" corresponding to 編集, which is held in the speech recognition result holding unit 4, is output to the information output unit 8. The application then executes the action, processing of the current input voice ends, and the display returns to the same state as FIG. 4(a), as shown in FIG. 4(d).

[0015] FIG. 3 shows an example of the contents of the word dictionary 3.
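Steps S4 to S7 of the flowchart, together with a dictionary lookup of the kind FIG. 3 implies, can be sketched as a polling loop. This is a hedged illustration rather than the actual implementation: the function `confirm_by_timeout`, the `voice_detected` callback, the simulated clock, and the 0.5-second polling interval are all assumptions, and the dictionary is reduced to the single 編集 to "command-e" pair mentioned in the text.

```python
from typing import Callable, Dict, Optional


def confirm_by_timeout(recognized: str,
                       dictionary: Dict[str, str],
                       voice_detected: Callable[[float], bool],
                       grace_s: float = 2.0,
                       poll_s: float = 0.5) -> Optional[str]:
    """Run steps S4 to S7 against a simulated clock.

    Returns the action information for `recognized`, or None when a
    voice input during the grace period cancels the result.
    """
    elapsed = 0.0                     # S4: reset the timer
    while elapsed < grace_s:          # S5: has the grace period elapsed?
        if voice_detected(elapsed):   # S6: any voice input cancels
            return None               # caller goes back to S0 (input wait)
        elapsed += poll_s             # advance the simulated clock
    return dictionary[recognized]     # S7: output the action information


# Only this pair is taken from the text; a real dictionary would hold more.
word_dictionary = {"編集": "command-e"}

silent = confirm_by_timeout("編集", word_dictionary, lambda t: False)
interrupted = confirm_by_timeout("編集", word_dictionary, lambda t: t >= 1.0)
```

A real system would poll a microphone and a hardware or system timer; passing the detector in as a callback keeps the sketch deterministic and testable.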

[0016] In this embodiment, the recognition result control means cancels the result when another voice is input within the preset grace period. The same handling may also be applied to control information input from other input devices, such as a keyboard or mouse, within the preset grace period.

[0017] In this embodiment, the words stored in the word dictionary are the commands of an application. The invention is not limited to this, however, and arbitrary words and their associated information may be stored.

[0018] In this embodiment, the recognition result control means applies recognition result control to all words stored in the word dictionary. The invention is not limited to this, however. Each word stored in the word dictionary may be given a mark indicating whether it is subject to recognition result control, and recognition result control may be performed only for marked words.
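The per-word mark described in this paragraph could be modelled as a flag on each dictionary entry. The sketch below is an assumption about one possible layout: `WordEntry`, the `controlled` field, and the ヘルプ entry are invented for illustration, and only the 編集 to "command-e" pair comes from the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WordEntry:
    action: str        # action information output for the word
    controlled: bool   # mark: True if the grace-period check applies


# Hypothetical contents; the second entry is invented for the example.
WORD_DICTIONARY: Dict[str, WordEntry] = {
    "編集": WordEntry(action="command-e", controlled=True),
    "ヘルプ": WordEntry(action="command-h", controlled=False),
}


def requires_grace_period(word: str,
                          dictionary: Dict[str, WordEntry]) -> bool:
    """Return True if the word is marked for recognition result control."""
    return dictionary[word].controlled
```

Under this layout, an unmarked word would have its action output immediately, while a marked word would first go through the grace-period confirmation.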

[0019] In this embodiment, the preset grace period used by the recognition result control means is 2 seconds. The invention is not limited to this, however, and the user may be allowed to set the grace period at any time and to any value.

[0020] In this embodiment, the recognition result control means cancels the result when another voice is input within the preset grace period. Instead of cancelling, the apparatus may present the next candidates after the first recognition result one after another.
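The next-candidate variant can be illustrated by stepping through a ranked candidate list each time the user rejects the presented result. This is a sketch only: `candidate_after_rejections` is a hypothetical helper, the recognizer is assumed to return candidates in ranked order, and the second candidate 返信 is invented for the example.

```python
from typing import Optional, Sequence


def candidate_after_rejections(candidates: Sequence[str],
                               rejections: int) -> Optional[str]:
    """Return the candidate to present after `rejections` rejecting inputs.

    Returns None once every candidate has been rejected, at which point
    the apparatus would presumably return to the input wait state.
    """
    if rejections < 0:
        raise ValueError("rejections must be >= 0")
    if rejections < len(candidates):
        return candidates[rejections]
    return None


# Hypothetical ranked output of the recognizer for one utterance.
ranked = ["編集", "返信"]
first = candidate_after_rejections(ranked, 0)
second = candidate_after_rejections(ranked, 1)
```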

[0021] In this embodiment, a single word dictionary is used. The invention is not limited to this, however. For example, a plurality of dictionaries may be provided and used selectively, such as a word dictionary in which the commands of an application are registered and a control dictionary in which the control words that can be input within the grace period preset in the recognition result control means are registered.
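One way to picture the selective use of several dictionaries is to switch the active dictionary depending on whether the apparatus is inside the grace period. The names below (`COMMAND_DICTIONARY`, `CONTROL_DICTIONARY`, `select_dictionary`) and the "cancel" action label are assumptions for illustration; the words 編集 and チガウ and the action "command-e" are taken from the embodiment.

```python
from typing import Dict

# Application commands, recognized while waiting for input.
COMMAND_DICTIONARY: Dict[str, str] = {"編集": "command-e"}

# Control words accepted only during the grace period.
# The "cancel" label is a hypothetical action name.
CONTROL_DICTIONARY: Dict[str, str] = {"チガウ": "cancel"}


def select_dictionary(in_grace_period: bool) -> Dict[str, str]:
    """Pick the dictionary the recognizer should match against."""
    return CONTROL_DICTIONARY if in_grace_period else COMMAND_DICTIONARY
```

Restricting recognition to a small control vocabulary during the grace period would plausibly reduce the chance of misrecognizing the cancel word itself, although the text does not state this motivation.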

[0022] Although this embodiment has been described using Japanese as an example, the invention is not limited to this and can be applied to any language, such as English or German.

[0023] In this embodiment, the status presentation unit presents status using text. The invention is not limited to this, however, and status may be presented using other audiovisual information, such as pictures.

[0024]

[Effects of the Invention] As described above, according to the present invention, after the speech recognition result is presented on the status presentation means, if no voice is input within a fixed grace period measured by the time measuring means, the speech recognition result is regarded as confirmed correct by the user and the corresponding action information is output. If, on the other hand, some voice is input, the user is regarded as wishing to cancel the speech recognition result, and the result is invalidated. Providing this control makes possible smooth operation that matches human intuition.

[Brief description of drawings]

FIG. 1 is a functional block diagram according to an embodiment of the present invention.

FIG. 2 is a flowchart showing a processing procedure according to the embodiment of the present invention.

FIG. 3 is a diagram showing an example of the contents of a word dictionary, used to explain the embodiment of the present invention.

FIG. 4 is a diagram showing an example of the stages of concrete processing, used to explain the embodiment of the present invention.

Claims

[Claims]

1. A speech processing apparatus comprising: input means for inputting a voice; recognition means for recognizing the input voice; result holding means for holding the recognized result; presenting means for presenting the held speech recognition result; time measuring means for measuring the passage of a predetermined time; and control means for performing control so that, after the time measuring means has measured the passage of the predetermined time, processing corresponding to the speech recognition result held in the result holding means is executed.

2. The speech processing apparatus according to claim 1, wherein, when an invalidation instruction is input before the time measuring means has measured the passage of the predetermined time, the content held in the result holding means is invalidated.

3. A speech processing method comprising: inputting a voice; recognizing the input voice; holding the recognized result; presenting the held information and starting measurement of the passage of a predetermined time; and, after the passage of the predetermined time has been measured, performing control so that processing corresponding to the held information is executed.

4. The speech processing method according to claim 3, wherein, when an invalidation instruction is input before the passage of the predetermined time has been measured, the held information is invalidated.