JP2000076241A

JP2000076241A - Voice recognizer and voice input method

Info

Publication number: JP2000076241A
Application number: JP10250013A
Authority: JP
Inventors: Takashi Amari; 隆甘利
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1998-09-03
Filing date: 1998-09-03
Publication date: 2000-03-14

Abstract

PROBLEM TO BE SOLVED: To distinguish editing commands from words that are input by voice as text. SOLUTION: When the voice segment detection device 3 of this speech input device detects a break in the speech, the main recognition engine 4 recognizes the speech using the detected break as a trigger, and the document creation device 6 stores the recognized word in a text area. If a new recognition trigger is detected while the main recognition engine 4 is still recognizing, the speech following that trigger is recognized by the sub-recognition engine 5. Because the main recognition engine 4 captures only part of this same utterance, it produces an incomplete word, which is deleted from the text area. The word recognized by the sub-recognition engine 5 is interpreted as an editing command, and the document editing device 7 edits the text according to that command.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD: The present invention relates to a speech-input word processor.

DESCRIPTION OF THE RELATED ART: Conventionally, there are speech-input word processors that create documents by speech input. Such a system converts speech input from a microphone or the like into a character string using a speech recognition device, and creates text by inserting or appending that string into the editing memory of a character processing device. In actual document creation, however, editing operations such as deleting previously input characters are needed in addition to simple insertion.

[0002]

PROBLEMS TO BE SOLVED BY THE INVENTION: In a system where editing commands such as deletion are also given by voice, for example by pronouncing "sakujo" (delete), a user who deliberately wants to enter the word "sakujo" into the document cannot do so, because the spoken word is always treated as an editing command. Thus, in systems that accept editing commands by voice, it has been difficult to enter a word that denotes an editing command as an ordinary word of the document.

[0003] In a system where editing commands are entered from a device other than speech recognition, such as a keyboard, the user must switch between speech input and keyboard input, which makes the system inconvenient to use.

[0004] SUMMARY OF THE INVENTION: The present invention has been made in view of the above prior art, and its object is to provide a speech recognition apparatus and a speech input method that distinguish words spoken for text input from spoken editing commands, and that are easy to operate.

[0005] Another object of the present invention is to provide a speech recognition apparatus and a speech input method that users can operate naturally, by using as editing commands words that a speaker interjects into the text being dictated and that relate to editing that text.

[0006]

MEANS FOR SOLVING THE PROBLEMS: To achieve the above objects, the present invention has the following arrangement. That is, a speech recognition apparatus comprising:
first recognition means for recognizing words from input speech;
second recognition means for recognizing instructions from input speech; and
editing means for accumulating the words recognized by the first recognition means to create a sentence, and for editing the sentence according to an instruction recognized by the second recognition means.

[0007] Alternatively, a speech input method comprising: a sentence creation step of adding a word recognized from speech by first recognition means to a sentence; and a sentence editing step of editing the sentence based on an instruction recognized from speech by second recognition means.

[0008] Alternatively, a computer-readable storage medium storing a program for causing a computer to execute: a sentence creation step of adding a word recognized from speech by first recognition means to a sentence; and a sentence editing step of editing the sentence based on an instruction recognized from speech by second recognition means.

[0009]

EMBODIMENTS OF THE INVENTION: FIG. 1 is a block diagram of a speech-input word processor according to a first embodiment of the present invention. The audio signal input from microphone 1 undergoes noise removal and volume-level stabilization in microphone amplifier 2 so that it is easier to process for speech recognition, and is then sent to the main recognition engine 4 and the sub-recognition engine 5 described below. Microphone amplifier 2 also converts the presence or absence of speech input into digital information and sends it to the voice segment detection device 3.

[0010] The voice segment detection device 3 detects the breaks between input words from the speech presence/absence information received from microphone amplifier 2, and controls the recognition operation of the main recognition engine 4. It also controls the recognition operation of the sub-recognition engine 5, using the speech presence/absence information together with information on whether the main recognition engine 4 is currently performing recognition. This control procedure is described later.

[0011] The main recognition engine 4 includes a speech recognition chip that A/D-converts the input analog speech and processes the resulting digital data. Because the speech recognition chip cannot accept the next utterance while it is recognizing, the engine also outputs a signal to the voice segment detection device 3 indicating that recognition is in progress. The recognition dictionary of the main recognition engine 4 consists of words used in ordinary text.

[0012] The sub-recognition engine 5 has the same structure as the main recognition engine 4 but contains a different dictionary. The dictionary of the sub-recognition engine 5 consists mainly of words that people mutter to themselves, such as "are" (huh?), "ja nai" (that's not it), "etto" (um), and "chigau" (no, that's wrong): words that are not part of the text or commands the user intends to dictate, yet slip out involuntarily. FIG. 1 shows no output indicating that the sub-recognition engine 5 is recognizing only because nothing requires such an output; the sub-recognition engine may have a busy signal line like that of the main recognition engine 4 without any problem.

[0013] The document creation device 6 inserts the character string of each word output from the main recognition engine 4 into the document creation memory of the word processor.

[0014] The document editing device 7 performs the operation corresponding to the meaning of the word output from the sub-recognition engine 5, for example deleting a word from the text being created by the document creation device 6. The storage device 8 stores the text created by the document creation device 6 and the document editing device 7.

[0015] The display 9 displays the text being created and other information.

[0016] The created text can also be printed out, or transferred over a communication line, via an external I/F (not shown).

[0017] In this embodiment, the main recognition engine 4 and the sub-recognition engine 5 each contain their own A/D converter. Alternatively, a single independent A/D converter may be placed after microphone amplifier 2, and its digital output passed to the recognition engines.

[0018] FIG. 2 is a flowchart showing the operation algorithm of the voice segment detection device 3.

[0019] In step S11, the speech presence/absence information input from microphone amplifier 2 is inverted to create the trigger for the main recognition engine 4 used in the subsequent steps. In step S12, this main-recognition trigger is acquired, and in step S13 it is checked whether the signal edge is falling. If it is not falling, the process returns to step S11 and continues.

[0020] If the main-recognition trigger has a falling edge, the process proceeds to step S14 to acquire the current operating state of the main recognition engine 4. In step S15, it is checked whether the main recognition engine 4 is recognizing; if so, the process branches to step S16, and if not, to step S17.

[0021] In step S16, the sub-recognition engine 5 is placed, in place of the busy main recognition engine 4, in a state of waiting for speech input, that is, a mode in which it acquires the speech input from microphone amplifier 2. In step S17, by contrast, the sub-recognition engine 5 is stopped so that it does not process the speech input from microphone amplifier 2.

[0022] After step S16 or step S17 completes, the whole process ends in step S18.

[0023] With the above algorithm, the voice segment detection device 3 causes speech that is input while the main recognition engine 4 is busy recognizing to be recognized by the sub-recognition engine 5.
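The control flow of FIG. 2 fits in a few lines of Python. This is a minimal, hypothetical sketch rather than the patent's implementation: it assumes the presence signal from microphone amplifier 2 and the busy signal from main recognition engine 4 are available as sampled boolean lists, and the function name `route_onsets` is invented for illustration.

```python
def route_onsets(presence, main_busy):
    """Decide which engine should capture each new utterance (FIG. 2 sketch).

    presence[i]  -- True while speech is present (output of microphone amplifier 2)
    main_busy[i] -- True while main recognition engine 4 is still recognizing
    Returns a list of (sample_index, engine) pairs, one per speech onset.
    """
    routes = []
    prev_trigger = 1  # main trigger = inverted presence; 1 means silence
    for i, (voice, busy) in enumerate(zip(presence, main_busy)):
        trigger = 0 if voice else 1             # S11: invert presence info
        if prev_trigger == 1 and trigger == 0:  # S12-S13: falling edge?
            # S14-S15: check the main engine's state at this onset.
            # S16: busy -> arm the sub-recognition engine for this utterance.
            # S17: idle -> keep the sub-engine stopped; the main engine takes it.
            routes.append((i, "sub" if busy else "main"))
        prev_trigger = trigger
    return routes


presence = [False, True, True, False, True, True, False]
main_busy = [False, False, False, True, True, True, False]
print(route_onsets(presence, main_busy))  # [(1, 'main'), (4, 'sub')]
```

The second utterance starts while the main engine is still busy with the first, so it is routed to the sub-recognition engine, which is the situation FIG. 3 illustrates with "chigau".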

[0024] In this embodiment, the trigger used in step S12 is created by inverting the speech presence/absence signal in step S11. In a configuration where the recognition engine is triggered directly by a rising edge from the outset, step S11 is unnecessary.

[0025] The voice segment detection device 3 can be implemented in software, or in hardware combining flip-flop circuits and the like. When it is implemented in software, a CPU (not shown) executes a program of the procedure shown in FIG. 2.

[0026] FIG. 3 is a timing chart showing an example of how the apparatus of FIG. 1 actually processes speech input. Specifically, it assumes that the user, wanting to input the sentence "Watashi no namae wa Tanaka desu" ("My name is Tanaka") by speaking into microphone 1, mistakenly says "Tamura" and immediately afterwards hastily says "chigau" ("no, that's wrong").

[0027] Graph G01 (sentence) in the figure shows this speech. The key point is that "chigau" is uttered immediately after "tamura". In a conventional speech recognition system, the recognition engine would still be recognizing "tamura" at that moment and therefore could not capture "chigau".

[0028] Graph G02 (speech level) is a conceptual illustration of this speech, input from microphone 1, as output by microphone amplifier 2.

[0029] Graph G03 (speech presence/absence) shows the digital information obtained by converting the above analog signal into speech presence/absence information. In this figure, it is at a high level while speech is being input and at a low level otherwise.

[0030] Graph G04 (main recognition trigger) is the inverse of graph G03. On a falling edge of this graph, the main recognition engine 4 enters its speech input state; on a rising edge, it ends input and starts recognition. Timings "A" and "C" in the figure are examples in which a rising edge starts recognition by the main recognition engine 4.
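The relationship between graphs G02, G03, and G04 can be illustrated with a short Python sketch. The patent does not say how microphone amplifier 2 decides that speech is present; the fixed amplitude threshold below is an assumption, and the function names are hypothetical.

```python
def presence_from_levels(levels, threshold):
    """G02 -> G03: mark a sample as speech when its magnitude reaches `threshold`.

    The fixed threshold is an assumption of this sketch; the patent does not
    specify how microphone amplifier 2 detects speech.
    """
    return [abs(x) >= threshold for x in levels]


def trigger_edges(presence):
    """G03 -> G04: invert the presence signal and locate its edges.

    Returns (trigger, falling, rising):
      falling -- indices where speech starts; the engine enters its input state
      rising  -- indices where speech ends; input closes and recognition starts
                 (timings "A" and "C" in FIG. 3)
    """
    trigger = [0 if p else 1 for p in presence]
    falling, rising = [], []
    prev = 1  # assume the signal starts in silence
    for i, t in enumerate(trigger):
        if prev == 1 and t == 0:
            falling.append(i)
        elif prev == 0 and t == 1:
            rising.append(i)
        prev = t
    return trigger, falling, rising


levels = [0.0, 0.5, 0.6, 0.05, 0.0, 0.7, 0.0]
trigger, falling, rising = trigger_edges(presence_from_levels(levels, 0.1))
print(trigger)  # [1, 0, 0, 1, 1, 0, 1]
print(falling)  # [1, 5]
print(rising)   # [3, 6]
```

A real detector would normally add hysteresis or a minimum silence duration so that short pauses inside a word do not split it; that refinement is omitted here.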

[0031] Graph G05 (main recognition state) shows the state of the main recognition engine 4 whose recognition was started by the trigger in graph G04: high while recognition is in progress and low once it has finished. "B" and "D" mark the end of the recognition processes started at "A" and "C", respectively. Recognition started at "A" ends at "B" (falling edge); in this embodiment, the word "watashi" ("I") is recognized and output correctly.

[0032] The word "chigau", by contrast, should be captured from timing "E", but because the main recognition engine is busy, capture starts only at timing "F", when the previous recognition has finished. Consequently, for the recognition started at "C", the captured speech begins at "F" and the beginning of "chigau" is missing: for example, only "gau", without the first sound "chi", reaches the main recognition engine 4. The recognition result output at timing "D" is therefore wrong.

[0033] The sub-recognition trigger in graph G06 is generated by the voice segment detection device 3 as a falling edge when, at the falling edge "E" of the main recognition trigger in graph G04, the main recognition state in graph G05 is high. This trigger puts the sub-recognition engine 5 into the speech-input waiting state. The utterance "chigau" is therefore captured by the sub-recognition engine 5 from its beginning, and recognition is started at timing "C'".

[0034] The sub-recognition engine 5, having started recognition, correctly recognizes the word "chigau" at timing "H" in graph G07 (sub-recognition state).

[0035] As described above, the system of this embodiment can correctly capture words muttered to oneself even when they follow the preceding utterance without leaving enough time for recognition.

[0036] FIG. 4 is a flowchart showing the algorithm for processing the words recognized by the main recognition engine 4 and the sub-recognition engine 5 as document-processing commands. This procedure is executed by the document creation device 6 and the document editing device 7.

[0037] In step S20, the recognition result (a word character string) is acquired from the main recognition engine 4, and in step S21 the string is appended to text area S22. This acquire-and-append processing is the same as the processing by which an ordinary word processor stores strings typed on a keyboard into memory, so its details are omitted.

[0038] Next, in step S23, it is checked whether the sub-recognition engine 5 has finished recognizing. If not, the process returns to step S21.

[0039] If the sub-recognition engine 5 has finished recognizing, the same word that it recognized has just been incompletely recognized by the main recognition engine 4; in the above example, the main recognition engine has recognized "gau". Therefore, after the recognition result of the sub-recognition engine 5 is acquired in step S24, the word most recently stored in the text area is deleted in step S25. This deletes "gau" in the above example.

[0040] Next, in step S26, the word output from the sub-recognition engine 5 is looked up in the dictionary in which the commands are registered. If it is a word with a negative meaning, the process proceeds to step S27, where the word stored immediately before the word deleted in step S25 is deleted from the text area. This deletes "tamura" in the above example.

[0041] If the word is not a negative word, the process proceeds to step S28 and ends.
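Steps S20 to S28 of FIG. 4 amount to a small state machine over a list of words. The Python sketch below is illustrative only: it assumes the text area is a plain list of romanized words, replaces the polling loop of step S23 with one callback per engine result, and uses a hypothetical command dictionary, since the patent does not list the dictionary's full contents.

```python
# Hypothetical command dictionary for step S26; the patent does not list it in full.
COMMANDS = {"chigau": "negate", "etto": "no_operation"}


def on_main_result(text, word):
    """S20-S21: append a word from main recognition engine 4 to the text area."""
    text.append(word)


def on_sub_result(text, word, commands=COMMANDS):
    """S24-S28: handle a word from sub-recognition engine 5."""
    # S25: the main engine caught only the tail of this utterance ("gau"),
    # so the most recently stored word is deleted.
    if text:
        text.pop()
    # S26-S27: a negative word also deletes the word stored before that one.
    if commands.get(word) == "negate" and text:
        text.pop()
    # S28: any other word ends processing here.


text = []
for w in ["watashi", "no", "namae", "wa", "tamura", "gau"]:
    on_main_result(text, w)
on_sub_result(text, "chigau")
print(text)  # ['watashi', 'no', 'namae', 'wa']
```

Note that step S25 runs for every sub-engine result, including non-command words such as "etto", because the main engine will always have captured a truncated copy of that utterance.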

[0042] This completes the algorithm by which the word processor handles the monologue words recognized by the sub-recognition engine 5.

[0043] As described above, the word recognized by the sub-recognition engine is analyzed, and if it is registered as a command in the dictionary, the processing corresponding to that command is performed.

[0044] In this example, only negative words are processed as commands. Functionality and usability can be improved further by placing words that mean cursor movement, such as "ue" (up), "shita" (down), "migi" (right), and "hidari" (left), in the conversion table and adding processing that interprets them. In this case too, the command words are input while the main recognition engine is still recognizing, and are recognized by the sub-recognition engine. Since the cursor movement of the document editing device 7 can reuse the cursor movement processing of an ordinary word processor as is, its details are omitted in this embodiment.

[0045] FIG. 5 shows part of the conversion table used in step S26 of FIG. 4. In FIG. 5, the right column shows the recognized word and the left column its meaning, that is, the command. "No operation" marks words that carry no meaning and therefore require no processing.
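Adding the cursor words of [0044] turns step S26 into a table-driven dispatch. In the Python sketch below, the table entries, the command names, and the `Editor` class (text held as lines of words, with the cursor as an insertion point between words) are all hypothetical; FIG. 5's actual contents are not reproduced in the text. The sketch also assumes the truncated main-engine result has already been removed as in step S25, and it treats unknown words as "no operation".

```python
from dataclasses import dataclass, field

# Hypothetical conversion table in the spirit of FIG. 5: recognized word -> command.
CONVERSION_TABLE = {
    "chigau": "delete_previous",  # "no, that's wrong"
    "ue": "cursor_up",
    "shita": "cursor_down",
    "migi": "cursor_right",
    "hidari": "cursor_left",
    "etto": "no_operation",       # "um": filler word, nothing to do
}


@dataclass
class Editor:
    # Text held as lines of words; (row, col) is the insertion point,
    # where col counts the words to the left of the cursor on that line.
    lines: list = field(default_factory=lambda: [[]])
    row: int = 0
    col: int = 0

    def insert(self, word):
        """Insert a word dictated as text at the cursor."""
        self.lines[self.row].insert(self.col, word)
        self.col += 1

    def execute(self, word):
        """Look the word up in the table and perform its command."""
        command = CONVERSION_TABLE.get(word, "no_operation")
        line = self.lines[self.row]
        if command == "delete_previous" and self.col > 0:
            del line[self.col - 1]
            self.col -= 1
        elif command == "cursor_left":
            self.col = max(0, self.col - 1)
        elif command == "cursor_right":
            self.col = min(len(line), self.col + 1)
        elif command in ("cursor_up", "cursor_down"):
            step = -1 if command == "cursor_up" else 1
            self.row = min(max(0, self.row + step), len(self.lines) - 1)
            # Keep the column within the new line's length.
            self.col = min(self.col, len(self.lines[self.row]))
        return command


e = Editor(lines=[["a", "b", "c"], ["d"]], row=0, col=3)
e.execute("hidari")  # cursor now sits between "b" and "c"
e.execute("chigau")  # deletes "b"
print(e.lines[0])    # ['a', 'c']
```

Keeping the word-to-command mapping in a table, as FIG. 5 does, means new command words can be added without touching the dispatch logic.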

[0046] As described above, having the apparatus distinguish words for text input from words for editing makes operation simple. Moreover, because the editing commands are the monologue words that users utter unconsciously, no burden is placed on the user.

[0047] [Second Embodiment] Many functions of the speech recognition system described in the first embodiment can be implemented by a computer system. To do so, the main and sub-recognition engines are not distinguished; instead, speech input within a predetermined time after a break in the speech input as text is detected is recognized as a command. This predetermined time is programmable. In this way, the trigger for command recognition can be defined as speech uttered within a predetermined time from the start of recognition of a word input as text. Setting this predetermined time to a value corresponding to the time the main recognition engine of the first embodiment needs for recognition yields a speech recognition system equivalent to that of the first embodiment.

[0048] FIG. 6 is a block diagram of a computer system for implementing speech recognition and text input/editing in software.

[0049] In FIG. 6, the audio signal input from microphone 1 is amplified by microphone amplifier 2, digitized by A/D converter 601, and stored in memory 602.

[0050] In addition to the audio data, memory 602 stores the program executed by CPU 604. This program causes CPU 604 to execute the procedures of FIGS. 4 and 7. In this embodiment, however, not only the editing of the recognized text but also speech recognition itself is performed by the computer system executing a program, so recognition by the main and sub-recognition engines is implemented by a single program module.

[0051] FIG. 7 is a flowchart for implementing on a computer the procedure of FIG. 2 that the voice segment detection device 3 executes in the first embodiment. First, in step S701, a recognition trigger is obtained from the audio data stored in memory 602; the same method as in the first embodiment may be used. In step S702, speech recognition is performed from the recognition trigger to the next break in the speech. In step S703, it is determined whether the time interval from the previous recognition trigger is greater than a predetermined threshold Tth. If it is, step S704 marks the recognition result of step S702, for example with a flag, as a result of the main recognition engine; if not, step S705 marks it as a result of the sub-recognition engine.

[0052] Following the procedure of FIG. 4, recognition results marked as coming from the main recognition engine are appended to the text area, and those marked as coming from the sub-recognition engine are treated as editing commands. If the command is a delete command, the immediately preceding word is deleted from the text, as in step S27 of FIG. 4. FIG. 4 deletes the incomplete recognition result in step S25 because it performs recognition in real time; in this embodiment, the syllables are recognized sequentially, so step S25 is unnecessary.
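The second embodiment replaces the two engines with a timing rule. The Python sketch below assumes each recognized segment arrives as a (trigger time in seconds, word) pair and treats the very first segment as text, since the patent does not say how the first trigger is handled; `classify_segments`, `build_text`, and the threshold value used in the example are illustrative only.

```python
def classify_segments(segments, t_th):
    """Label each recognized segment as text ("main") or command ("sub").

    segments -- (trigger_time_s, word) pairs in time order (hypothetical format)
    t_th     -- threshold Tth in seconds; programmable per [0047]
    """
    labelled = []
    prev_time = None
    for t, word in segments:
        # S703: a trigger arriving more than Tth after the previous one is text.
        # The very first segment has no predecessor and is treated as text here.
        is_text = prev_time is None or (t - prev_time) > t_th
        labelled.append((word, "main" if is_text else "sub"))  # S704 / S705
        prev_time = t
    return labelled


def build_text(labelled, commands):
    """Apply FIG. 4 without step S25: nothing is truncated, so nothing to undo."""
    text = []
    for word, source in labelled:
        if source == "main":
            text.append(word)
        elif commands.get(word) == "delete_previous" and text:
            text.pop()  # as in step S27
    return text


segments = [(0.0, "watashi"), (1.0, "no"), (2.0, "namae"), (3.0, "wa"),
            (4.0, "tamura"), (4.3, "chigau"), (6.0, "tanaka"), (7.0, "desu")]
labelled = classify_segments(segments, t_th=0.5)
print(build_text(labelled, {"chigau": "delete_previous"}))
# ['watashi', 'no', 'namae', 'wa', 'tanaka', 'desu']
```

Only "chigau", triggered 0.3 s after "tamura", falls inside the threshold; choosing Tth close to the main engine's recognition time reproduces the first embodiment's behaviour, as [0047] notes.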

[0053] As described above, the present invention can also be implemented with the configuration of FIG. 6.

[0054]

[Other Embodiments] The present invention may be applied to a system composed of multiple devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).

[0055] The object of the present invention can also be achieved by supplying a system or apparatus with a storage medium on which the program code of the procedures of FIGS. 4 and 7, which implement the functions of the above embodiments, is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the storage medium.

[0056] In this case, the program code read from the storage medium itself implements the functions of the above embodiments, and the storage medium storing the program code constitutes the present invention.

[0057] As the storage medium for supplying the program code, for example, a floppy disk, hard disk, optical disc, magneto-optical disc, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, or ROM can be used.

[0058] The invention also covers the case where the functions of the above embodiments are implemented not only by the computer executing the read program code, but also by an OS (operating system) or the like running on the computer performing part or all of the actual processing based on the instructions of the program code.

[0059] It further covers the case where, after the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like on that board or unit performs part or all of the actual processing based on the instructions of the program code, and that processing implements the functions of the above embodiments.

[0060]

EFFECTS OF THE INVENTION: As described above, the present invention provides a speech recognition apparatus and a speech input method that distinguish words spoken for text input from spoken editing commands, and that are easy to operate.

[0061] Furthermore, by using as editing commands words that a speaker interjects into the text and that relate to editing it, the invention provides a speech recognition apparatus and a speech input method that users can operate naturally.

[0062]

[Brief description of the drawings]

FIG. 1 is a block diagram of the speech recognition system.

FIG. 2 is a flowchart showing the algorithm of the voice segment detection device 3.

FIG. 3 is a timing chart showing how speech is processed.

FIG. 4 is a flowchart showing the algorithm for processing the results of the sub-recognition engine 5.

FIG. 5 is a diagram showing an example of a conversion table describing the meanings of words.

FIG. 6 is a block diagram of a computer system that implements the speech recognition system.

FIG. 7 is a flowchart of the procedure for recognizing sentences and commands in the system of FIG. 6.

[Explanation of symbols]

1 Microphone
2 Microphone amplifier
3 Voice segment detection device
4 Main recognition engine
5 Sub-recognition engine
6 Document creation device
7 Document editing device
8 Storage device
9 Display

Claims

[Claims]

1. A speech recognition apparatus comprising: first recognition means for recognizing words from input speech; second recognition means for recognizing instructions from input speech; and editing means for accumulating the words recognized by the first recognition means to create a sentence, and for editing the sentence in accordance with an instruction recognized by the second recognition means.

2. The speech recognition apparatus according to claim 1, wherein, when speech is uttered within a predetermined time after the first recognition means starts recognition, the second recognition means recognizes an instruction from that speech.

3. The speech recognition apparatus according to claim 2, wherein the predetermined time is the period from when the first recognition means starts recognition until it finishes.

4. The speech recognition apparatus according to claim 1, wherein the instruction recognized by the second recognition means includes an instruction to delete the word recognized by the first recognition means immediately before the instruction is recognized.

5. A speech input method comprising: a sentence creation step of adding a word recognized from speech by first recognition means to a sentence; and a sentence editing step of editing the sentence based on an instruction recognized from speech by second recognition means.

6. The speech input method according to claim 5, wherein, when speech is uttered within a predetermined time from the start of recognition by the first recognition means, the second recognition means recognizes an instruction from that speech, and the sentence is edited in the sentence editing step based on that instruction.

7. The speech input method according to claim 6, wherein the predetermined time is the period from when the first recognition means starts recognition until it finishes.

8. The speech input method according to claim 5, wherein the instruction recognized by the second recognition means includes a word deletion instruction, and in the sentence editing step, the word recognized by the first recognition means immediately before the deletion instruction is recognized is deleted.

9. A computer-readable storage medium storing a program for causing a computer to execute: a sentence creation step of adding a word recognized from speech by first recognition means to a sentence; and a sentence editing step of editing the sentence based on an instruction recognized from speech by second recognition means.