JPH0228827A

JPH0228827A - Sound recognizing device

Info

Publication number: JPH0228827A
Application number: JP63180800A
Authority: JP
Inventors: Hiroyuki Iwahashi; 岩橋　弘幸; Akira Tsuruta; 彰鶴田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1988-07-19
Filing date: 1988-07-19
Publication date: 1990-01-30

Abstract

PURPOSE: To improve input efficiency by displaying all syllable candidates at the cursor position. CONSTITUTION: A speech recognition device 1 is equipped with a microphone 2, a feature extracting part 4, a CPU part 5, a standard pattern storing part 6, a matching part 7 and an input pattern storing part 8, and is connected to a display part 9 and a keyboard 10. For the speech signal collected by the microphone 2, the feature extracting part 4 computes feature parameters used, among other things, for extracting syllable sections. The CPU part 5 determines syllable sections on the basis of these parameters, prepares a speech pattern and stores it in the input pattern storing part 8. Distances between this speech pattern and the plural speech patterns in the standard pattern storing part 6 are computed, and syllable candidates are prepared and stored in a memory 5a. The syllable candidates are displayed on the display part 9. The user can thus grasp exactly how his or her own speech has been recognized.

Description

Detailed Description of the Invention

Field of the Invention

The present invention relates to a speech recognition device suitable for use in so-called Japanese word processors, personal computers and the like.

Description of the Prior Art

In word processors, personal computers and similar equipment, speech input differs from keyboard input in that the uttered speech is not always input accurately. The candidates in the recognition result for the input speech are therefore corrected by language processing functions or the like before being used as speech input. However, the syllable actually spoken may be only a lower-ranked candidate in the recognition result, or it may not be among the candidates at all. In either case the syllable must be corrected individually, for example by pointing at it with the cursor and using a keyboard or the like.

FIG. 6 shows a typical prior-art display of syllable candidates. Suppose the user utters "kokumin wo" (こくみんを).

1. As shown in FIG. 6(1), the first-ranked recognition candidate, "bofunin wo" (ぼふにんを), is displayed.
2. This result is incorrect, so the user moves the cursor, indicated by reference mark A, with the "←" key or the like, as shown in FIG. 6(2).
3. The user then operates, for example, the "↓" key until the correct syllable candidate is displayed, as shown in FIG. 6(3). This completes the input of the first syllable.
4. The user moves cursor A with the "→" key or the like to the next erroneous syllable from the second syllable onward, and repeats the same correction operation.
5. As shown in FIG. 6(4), processing such as kana-kanji conversion is performed once all input syllables have been corrected, that is, once the input syllables have been confirmed.

Problems to be Solved by the Invention

In the prior art described above, the correct syllable may not be the top-ranked candidate. The user then moves the cursor to that syllable with the "→" key or the like, as described above, and displays the syllable candidates one at a time, from the highest rank downward, with the "↓" key or the like. The user therefore cannot review the recognition results for the uttered speech, from the top candidate to the lowest, all at once. If the syllable the user uttered is not among the candidates at all, the user ends up performing useless key operations, and input efficiency drops.
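The wasted key operations can be made concrete with a small keystroke count. The model below is illustrative and is not taken from the patent. It assumes that:

* the prior-art device reveals one further candidate per "↓" press;
* the correct syllable sits at a 1-based `rank` among `n_candidates`, or is absent (`None`);
* the cursor is already on the syllable.

The function name `presses_to_resolve` is hypothetical.

```python
from typing import Optional


def presses_to_resolve(rank: Optional[int], n_candidates: int,
                       show_all: bool) -> int:
    """Arrow-key presses spent on one syllable before it is resolved.

    Hypothetical model: "resolved" means either the correct candidate is
    highlighted, or the user knows it is absent and must type it manually.
    rank is the 1-based position of the correct syllable, or None if absent.
    """
    if rank is not None:
        # In both display styles the user steps down rank - 1 times.
        return rank - 1
    if show_all:
        # Every candidate is already on screen, so absence costs no presses.
        return 0
    # Sequential display: absence is known only after seeing every candidate.
    return n_candidates - 1


# Five candidates, correct syllable missing from the list:
print(presses_to_resolve(None, 5, show_all=False))  # 4
print(presses_to_resolve(None, 5, show_all=True))   # 0
```

Under this model, showing all candidates at once saves presses only when the spoken syllable is absent. That is the case the passage above identifies as wasteful.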

An object of the present invention is to provide a speech recognition device with the following display method. As the cursor moves, all syllable candidates for the cursor position are displayed at once. This eliminates the useless key operations described above and informs the user of the internal state of the recognition result.

Means for Solving the Problems

The present invention is a speech recognition device that recognizes input speech syllable by syllable, using distance calculation against pre-registered standard patterns. It comprises:

* syllable candidate creation means for creating one or more syllable candidates for each syllable; and
* display means for visually displaying syllables in response to the output of the syllable candidate creation means.

A cursor is displayed on the display means, and all syllable candidates for the syllable to which the cursor has been moved are displayed at once.

Operation

According to the present invention, when speech is input to the speech recognition device, the syllable candidate creation means creates one or more syllable candidates for each input syllable based on the recognition result. The output of the syllable candidate creation means is supplied to the display means, which visually displays one of the created candidates, for example the first. A cursor is also displayed on the display means, and for the syllable to which the cursor is moved, all syllable candidates are displayed.

The user can therefore grasp accurately how his or her own speech has been recognized. When the uttered syllable is not among the candidates, no unnecessary key operations are needed, so the correction can be entered quickly and input efficiency improves.

Embodiment

FIG. 1 is a block diagram showing the configuration of a speech recognition device 1 according to an embodiment of the present invention. The speech signal picked up by microphone 2 passes through amplification section 3, which passes only the speech band and amplifies it, to feature extraction section 4. From the input speech waveform, feature extraction section 4 calculates feature parameters, described below, that are used to extract syllable sections and to calculate pattern-matching distances.
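The patent does not say which feature parameters section 4 computes or how syllable sections are located. The following is a minimal sketch under two assumptions: short-time log energy is the only feature, and a syllable section is a run of frames above a fixed threshold. The source does not confirm either. The function names, frame size and threshold are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def frame_log_energy(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Short-time log energy (dB) of each frame; an assumed feature parameter."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # floor avoids log10(0)
    return energies


def syllable_sections(energies: List[float], threshold_db: float,
                      min_frames: int = 2) -> List[Tuple[int, int]]:
    """Runs of frames at or above threshold_db, as (start, end_exclusive)."""
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, e in enumerate(energies):
        if e >= threshold_db and start is None:
            start = i                      # a syllable section begins
        elif e < threshold_db and start is not None:
            if i - start >= min_frames:    # ignore very short bursts
                sections.append((start, i))
            start = None
    if start is not None and len(energies) - start >= min_frames:
        sections.append((start, len(energies)))  # section runs to the end
    return sections


# Synthetic signal: silence, a burst, silence, a burst, silence.
signal = [0.0] * 40 + [0.5] * 40 + [0.0] * 40 + [0.5] * 40 + [0.0] * 40
print(syllable_sections(frame_log_energy(signal, 10, 10), threshold_db=-30.0))
# [(4, 8), (12, 16)]
```

A real device would likely use richer spectral features for the matching stage. Energy alone is used here only to keep the segmentation idea visible.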

Based on these feature parameters, CPU section 5 determines the syllable sections, creates a speech pattern for each syllable section, and stores it in input pattern storage section 8. Matching section 7 calculates the distance between the input speech pattern and each of the plural speech patterns stored in standard pattern storage section 6.
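The source specifies only a "distance calculation" between the input pattern and the standard patterns. Dynamic time warping (DTW) is one common way to compare speech patterns of different lengths. It is shown here purely as an assumed stand-in and is not necessarily the method of matching section 7. The example uses a Euclidean local distance between frame feature vectors. Representing a pattern as a list of equal-length float vectors is also an assumption.

```python
import math
from typing import List, Sequence

Pattern = List[Sequence[float]]  # one feature vector per frame (assumed layout)


def local_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(inp: Pattern, ref: Pattern) -> float:
    """Cumulative DTW distance using the three basic steps
    (advance input only, advance reference only, advance both)."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_distance(inp[i - 1], ref[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# A time-stretched copy of the reference still matches with zero distance.
print(dtw_distance([[0.0], [1.0], [1.0], [2.0]], [[0.0], [1.0], [2.0]]))  # 0.0
```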

The matching results calculated in this way are sent back to CPU section 5. CPU section 5 creates syllable candidates from them and stores the candidates in its internal memory 5a. These syllable candidates are also displayed on display section 9.
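Creating syllable candidates from the matching results can be read as ranking the standard patterns by distance. The following is a minimal sketch. It assumes that:

* each standard pattern is labeled with its kana syllable;
* a smaller distance means a better match;
* a fixed number of candidates is kept per syllable (the default of 5 is hypothetical).

The dictionary that stands in for memory 5a is likewise illustrative.

```python
from typing import Dict, List


def make_candidates(distances: Dict[str, float], n_best: int = 5) -> List[str]:
    """Syllable labels ordered from smallest (best) to largest distance."""
    # Ties are broken by label so the ordering is deterministic.
    ranked = sorted(distances.items(), key=lambda item: (item[1], item[0]))
    return [label for label, _ in ranked[:n_best]]


def store_phrase(per_syllable: List[Dict[str, float]],
                 n_best: int = 5) -> Dict[int, List[str]]:
    """Stand-in for memory 5a: syllable position -> ranked candidate list."""
    return {pos: make_candidates(d, n_best) for pos, d in enumerate(per_syllable)}


# Hypothetical distances for the first syllable of "kokumin wo":
print(make_candidates({"こ": 2.0, "ぼ": 1.5, "ご": 3.0, "ほ": 2.4}, n_best=3))
# ['ぼ', 'こ', 'ほ']
```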

Display section 9 and keyboard 10, which is used to instruct cursor movement, are external devices. They belong to the equipment to which speech recognition device 1 is connected, such as a word processor. CPU section 5 controls cursor movement in response to input from keyboard 10 and the like.

In speech recognition device 1 configured as described above, suppose the user utters "kokumin wo", as shown in FIG. 2(1). Matching section 7 compares the input speech pattern stored in input pattern storage section 8 with the standard patterns stored in standard pattern storage section 6. Based on the comparison result, the recognition results giving the candidates for each syllable, as shown in FIG. 2(2), are stored in memory 5a in CPU section 5.

When the phrase "kokumin wo" has been input, the first candidate for each syllable is displayed on the screen of display section 9, as shown in FIG. 3. A cursor indicating the next input position, indicated by reference numeral 11, is displayed at the position following the last syllable.
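The FIG. 3 display can be expressed as a simple rendering rule: show the first candidate of every syllable, and draw the cursor after the last syllable when it marks the next input position. The sketch below is a hypothetical text-mode illustration with the following assumptions:

* the `_` cursor glyph and the bracket marking a syllable under the cursor are invented;
* the candidate lists are invented, except that the first-ranked reading "ぼふにんを" is borrowed from the FIG. 6 example.

```python
from typing import List


def render_line(candidates: List[List[str]], cursor: int, glyph: str = "_") -> str:
    """First candidate of every syllable, with the cursor drawn as text.

    cursor == len(candidates) means the next-input position after the phrase
    (cursor 11 in FIG. 3); otherwise the syllable under the cursor is bracketed.
    """
    parts = []
    for pos, cands in enumerate(candidates):
        parts.append(f"[{cands[0]}]" if pos == cursor else cands[0])
    if cursor == len(candidates):
        parts.append(glyph)
    return "".join(parts)


# Hypothetical candidate lists for "kokumin wo".
phrase = [["ぼ", "こ"], ["ふ", "く"], ["に", "み"], ["ん"], ["を"]]
print(render_line(phrase, cursor=len(phrase)))  # ぼふにんを_
```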

As is clear from FIG. 3, the recognition results for the first to third syllables are wrong. The user must therefore move the cursor and correct these syllables to the intended characters. First, the user moves the cursor to the first syllable with the "←" key on keyboard 10. The screen of display section 9 then shows the cursor, as indicated by reference numeral 12 in FIG. 4(1), together with all syllable candidates for the first syllable, where the cursor is located.
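The FIG. 4(1) display adds every candidate of the cursor syllable to the top line. The following is a minimal text-mode sketch that stacks those candidates beneath the cursor position. Kana are full-width, so it pads with the full-width ideographic space (U+3000) to keep the column aligned. The layout, function name and candidate data are all hypothetical.

```python
from typing import List

FULL_WIDTH_SPACE = "\u3000"  # kana are full-width, so pad with full-width spaces


def render_with_candidates(candidates: List[List[str]], cursor: int) -> List[str]:
    """Top line of first candidates, plus every candidate of the cursor
    syllable stacked beneath it (hypothetical text layout)."""
    lines = ["".join(c[0] for c in candidates)]
    if 0 <= cursor < len(candidates):
        pad = FULL_WIDTH_SPACE * cursor
        lines.extend(pad + cand for cand in candidates[cursor])
    return lines


phrase = [["ぼ", "こ", "ほ"], ["ふ", "く"], ["に", "み"], ["ん"], ["を"]]
for line in render_with_candidates(phrase, cursor=0):
    print(line)
# ぼふにんを
# ぼ
# こ
# ほ
```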

In this state, the user selects the correct character by operating the "↑" or "↓" key on keyboard 10 or the like. The selected syllable candidate may be swapped into the position of the first candidate. Alternatively, it may be distinguished from the other syllable candidates by, for example, blinking or reverse video, in which the background and character colors are exchanged.
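The two presentation options above can be modeled with a small selector. The class below is a hypothetical sketch, and its names and clamping behavior are assumptions:

* "↑"/"↓" move a highlight within one syllable's candidate list, clamped at both ends;
* `highlighted` corresponds to the blinking or reverse-video option;
* `commit_swap` corresponds to swapping the chosen candidate into the first-candidate position.

```python
from typing import List


class CandidateSelector:
    """Hypothetical model of "↑"/"↓" selection within one syllable's candidates."""

    def __init__(self, candidates: List[str]):
        self.candidates = list(candidates)
        self.index = 0  # currently highlighted candidate

    def key(self, name: str) -> None:
        """Move the highlight; other keys are ignored."""
        if name == "↓":
            self.index = min(self.index + 1, len(self.candidates) - 1)
        elif name == "↑":
            self.index = max(self.index - 1, 0)

    @property
    def highlighted(self) -> str:
        """Option 2: the candidate shown blinking or in reverse video."""
        return self.candidates[self.index]

    def commit_swap(self) -> List[str]:
        """Option 1: swap the chosen candidate into the first-candidate slot."""
        c = self.candidates
        c[0], c[self.index] = c[self.index], c[0]
        self.index = 0
        return c


s = CandidateSelector(["ぼ", "ほ", "こ"])
for k in ["↓", "↓", "↓"]:  # the third press is clamped at the last candidate
    s.key(k)
print(s.highlighted)    # こ
print(s.commit_swap())  # ['こ', 'ほ', 'ぼ']
```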

When correction of the first syllable is complete, the user operates the "→" key on keyboard 10. This moves the cursor to the second syllable, as indicated by reference numeral 13 in FIG. 4(2). The second syllable is corrected in the same way as the first. Operating the "→" key on keyboard 10 then moves the cursor to the position indicated by reference numeral 14 in FIG. 4(3). The cursor is thereafter moved sequentially, as indicated by reference numeral 15 in FIG. 4(4), and the input characters are corrected syllable by syllable.

When all input characters have been corrected, the user moves the cursor to the next character input position, as indicated by reference numeral 16 in FIG. 4(5). At that point processing such as kana-kanji conversion is performed, and speech input is resumed.

FIG. 5 is a flowchart explaining the operation. The user speaks in step n1. In step n2, the speech is compared with the standard patterns as described above, and based on the comparison result the recognition results giving the candidates for each syllable are stored in memory 5a in CPU section 5. In step n3, the recognition results from step n2 are displayed on display section 9.

In step n4, it is determined whether the "←" or "→" key on keyboard 10 has been operated, that is, whether a syllable candidate is to be corrected. If so, the cursor is moved according to the key operation in step n5. In step n6, all syllable candidates at the cursor position are displayed. In step n7, the syllable candidate is corrected by selecting one of these candidates, by manual key entry or the like, and the process returns to step n3.

If no syllable candidate is corrected in step n4, that is, if the first candidate for every syllable matches the input speech, processing such as kana-kanji conversion is performed in step n8, and the process returns to step n1.
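The correction loop of FIG. 5 (steps n3 to n8) can be sketched as an event-driven function over the candidate lists from step n2. The event encoding is entirely hypothetical:

* `("←",)` and `("→",)` are cursor keys.
* `("select", k)` picks the displayed candidate of 0-based rank k.
* `("type", kana)` is manual key entry.
* `("convert",)` stands for the "no correction" branch at step n4.

Kana-kanji conversion itself is out of scope, so the function just returns the confirmed kana that would be passed to it. Swapping the selection into the first slot follows the first option described for FIG. 4.

```python
from typing import List, Sequence


def process_phrase(memory: List[List[str]], events: Sequence[tuple]) -> str:
    """Run the hypothetical n3-n8 loop and return the confirmed kana string."""
    cands = [list(c) for c in memory]   # working copy of memory 5a (step n2)
    cursor = len(cands)                 # n3: cursor after the last syllable
    for ev in events:
        kind = ev[0]
        if kind in ("←", "→"):          # n4 yes -> n5: move the cursor
            step = -1 if kind == "←" else 1
            cursor = max(0, min(len(cands), cursor + step))
            # n6: all candidates at the cursor would be displayed here
        elif kind == "select" and cursor < len(cands):
            k = ev[1]                   # n7: choose a displayed candidate
            cands[cursor][0], cands[cursor][k] = cands[cursor][k], cands[cursor][0]
        elif kind == "type" and cursor < len(cands):
            cands[cursor].insert(0, ev[1])  # n7: manual key entry
        elif kind == "convert":         # n4 no -> n8: hand off for conversion
            break
    return "".join(c[0] for c in cands)


memory = [["ぼ", "こ"], ["ふ", "く"], ["に", "み"], ["ん"], ["を"]]
events = [("←",)] * 5 + [("select", 1), ("→",), ("select", 1), ("→",),
                         ("select", 1), ("→",), ("→",), ("→",), ("convert",)]
print(process_phrase(memory, events))  # こくみんを
```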

Thus, in speech recognition device 1, whenever the cursor is located within a phrase, all syllable candidates at the cursor position are displayed.

Effects of the Invention

As described above, according to the present invention, all syllable candidates at the cursor position are displayed, so the user can grasp how his or her own speech has been recognized. Furthermore, when the uttered syllable is not among the candidates, the correction can be entered quickly without unnecessary key operations, which improves input efficiency.

Brief Description of the Drawings

FIG. 1 is a block diagram showing the configuration of a speech recognition device 1 according to an embodiment of the present invention. FIG. 2 shows an example of speech input from microphone 2 and the recognition result for that speech in memory 5a in CPU section 5. FIG. 3 shows an example of the display on display section 9 when speech input of one phrase is complete. FIG. 4 shows display examples when the candidates for each syllable are selected by moving the cursor. FIG. 5 is a flowchart explaining the operation. FIG. 6 shows an example of the display of syllable candidates in the prior art.

Reference signs: 1 speech recognition device; 2 microphone; 3 amplification section; 4 feature extraction section; 5 CPU section; 6 standard pattern storage section; 7 matching section; 8 input pattern storage section; 9 display section; 10 keyboard; 11 to 16 cursor.

Agent: Patent attorney 画数 圭一部

Claims

A speech recognition device that recognizes input speech syllable by syllable using distance calculation against pre-registered standard patterns, comprising: syllable candidate creation means for creating one or more syllable candidates for each syllable; and display means for visually displaying syllables in response to the output of the syllable candidate creation means; wherein a cursor is displayed on the display means, and all the syllable candidates for the syllable to which the cursor has been moved are displayed at once.