JPH0229231B2

Info

Publication number: JPH0229231B2
Application number: JP59042661A
Authority: JP
Inventors: Mitsuhiro Toya; Shin Kamya
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1984-03-05
Filing date: 1984-03-05
Publication date: 1990-06-28
Also published as: JPS60185999A

Description

[Detailed description of the invention]

<Technical Field of the Invention>
The present invention relates to an improvement of a Japanese speech input device that recognizes input speech syllable by syllable. More specifically, the device is configured to erase feature standard patterns that were involved in misrecognition, based on the recognition results of speech uttered during input.

<Technical Background of the Invention and Its Problems>
Conventional speech recognition devices separate a registration mode, in which standard patterns of speech features are registered, from a recognition mode (input mode), in which input speech is recognized. Syllable feature patterns could be replaced by re-registration, but a feature standard pattern involved in misrecognition could not be erased while in the recognition mode. This matters little when words are the unit of recognition. When syllables are the unit, however, the differences between the feature patterns of different syllables are small, so if an unsuitable pattern has been registered as a feature standard pattern, for example because of where the syllable was cut out, recognition performance degrades.

<Object of the Invention>
The present invention has been made in view of the above. Its object is to improve recognition performance, in a Japanese speech input device that holds a plurality of feature standard patterns for each syllable, by erasing unsuitable feature standard patterns that were involved in misrecognition. To achieve this object, the Japanese speech input device of the present invention is provided with erasing means that erases feature standard patterns involved in misrecognition, based on the recognition results of speech uttered during input.
Moreover, according to the present invention, the input speech is stored, so that when erasure of a feature standard pattern involved in misrecognition is instructed, the corresponding part of the input speech can be played back. The person making the input can thus confirm whether the syllable was cut out accurately, which prevents a feature standard pattern from being erased by mistake.

<Embodiments of the Invention>
The present invention is described below with reference to the drawings, taking as an example a Japanese speech input device that recognizes continuously uttered speech in units of syllables, lets the recognition results be corrected with an input device such as a keyboard, and then transfers them to an external device in units such as words.

FIG. 1 is a block diagram showing the configuration of a device according to one embodiment of the present invention.

In FIG. 1, uttered speech is input through a microphone 1 or the like to an analog input section 2, amplified by an amplifier 3 in the analog input section 2, and converted to a digital signal by an analog/digital conversion section 4. The digital signal is input to a speech analysis section 5 and a syllable segmentation section 6.
The speech analysis section 5 divides the input speech into frames of about 16 ms, performs spectrum analysis, and at intervals of about 8 ms transfers to the syllable segmentation section 6 a feature pattern together with the information needed for syllable segmentation (power, number of zero crossings, and so on).

The syllable segmentation section 6 uses the various kinds of information from the speech analysis section 5 to cut out syllables from the input speech. It stores the feature pattern of the cut-out portion and the speech waveform of that interval in a waveform/feature pattern temporary memory 7. It then notifies the CPU 8 that a syllable has been cut out and, at the same time, passes on the address within the waveform/feature pattern temporary memory 7.

The waveform/feature pattern temporary memory 7 is configured to hold a plurality of syllables. The processing of the syllable segmentation section 6 is started and stopped by commands from the CPU 8.

Reference numeral 9 denotes a monosyllable recognition section. On command from the CPU 8, it performs distance calculations and the like between the feature pattern memory 10a and the standard pattern memory 10b in a pattern memory 10, and returns the result to the CPU 8. The CPU 8 stores the result in a recognition result storage memory 11 and displays it on a display device 12. The recognition result storage memory 11 is configured to hold recognition results for a plurality of syllables.

Reference numeral 13 denotes a speech output control section. On command from the CPU 8, it sends the information stored in any part of the waveform/feature pattern temporary memory 7 to an analog output section 14. The analog output section 14 reproduces the digital speech signal as a speech waveform by means of a digital/analog conversion section 14a and an amplifier 14b, and outputs it.

The pattern memory 10 is divided into two parts. The feature pattern memory 10a can store just one feature pattern, corresponding to the input syllable. The other part, 10b, is the memory for feature standard patterns and stores the feature patterns of a plurality of syllables.

As described later, the standard pattern memory 10b consists of an area that stores each syllable name as a code, a flag area that stores whether a pattern is registered, and a feature standard pattern area that stores the feature standard pattern data.

Reference numeral 15 denotes an input section composed of a keyboard or the like. As shown in FIG. 2, for example, it has kana keys 15a, a registration mode key 15b, a recognition mode key 15c, a syllable erase key 15d, and so on.

Reference numeral 16 denotes an I/F section that controls the sending and receiving of data when recognition results are transferred to an external device.

Next, the operation of the device configured as above is described for the registration mode and the recognition mode.

Description of the Registration Mode

FIG. 3 shows the processing flow of the CPU 8 in the registration mode.

In FIG. 3, when the device is set to the registration mode by operating the registration mode key 15b, the pattern memory 10 is first initialized in step n1 and all standard patterns are erased. Table 1 shows the configuration of the standard pattern memory 10b.
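
As an illustration of the three areas of the standard pattern memory 10b described above (syllable-name code, registration flag, pattern data), the following Python sketch models the memory as a list of entries. The class and function names, the number of slots per syllable, and the list-of-floats feature representation are all hypothetical; initialization is modeled here simply as clearing every registration flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandardPattern:
    syllable_code: str        # syllable name stored as a code, e.g. "a"
    index: int                # which of that syllable's patterns this is (1, 2, ...)
    registered: bool = False  # registration flag: True stands for "1", False for "0"
    features: List[float] = field(default_factory=list)  # pattern data area


def build_memory(syllables, slots_per_syllable=3):
    """Create one empty entry per (syllable, slot). The slot count is an assumption."""
    return [
        StandardPattern(code, i)
        for code in syllables
        for i in range(1, slots_per_syllable + 1)
    ]


def initialize(memory):
    """Model of initialization: mark every entry as unregistered."""
    for entry in memory:
        entry.registered = False


def register(memory, syllable_code, index, features):
    """Store pattern data in the matching entry and set its registration flag."""
    for entry in memory:
        if entry.syllable_code == syllable_code and entry.index == index:
            entry.features = list(features)
            entry.registered = True
            return entry
    raise KeyError(f"no slot for {syllable_code}{index}")
```

Keeping the flag separate from the pattern data means an entry can be invalidated without touching the stored features, which is the property the sketch is meant to show.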

[Table 1]

The initialization in step n1 is performed by writing "0" into the registration flag area of the standard pattern memory 10b.

Next, in step n2, the monosyllable to be uttered is shown on the display device 12 as follows:

"a₁"

Here the subscript "1" indicates that this is the first of the patterns for "a".

The operator looks at the display on the display device 12 and utters the indicated monosyllable.

In response to this speech input, the flow moves to step n3, where the syllable segmentation section 6 is instructed to start cutting out speech. The syllable segmentation section 6 cuts out the monosyllable and stores the waveform of that portion, together with the feature pattern obtained by the speech analysis section 5, in the waveform/feature pattern temporary memory 7.

In step n4, it is checked whether the syllable segmentation section 6 has cut out a monosyllable; once it has, the flow moves to step n5.

In step n5, the syllable segmentation section 6 is commanded to stop cutting out, and registration processing continues.

In step n6, the speech portion corresponding to the syllable just cut out is read from the waveform/feature pattern temporary memory 7 and played back from the analog output section 14 through the speech output control section 13.

In step n7, the operator judges from the played-back speech whether the syllable was cut out correctly, and the instruction then entered on the keyboard 15 determines whether to cut out again or to carry out the registration.
If, in step n7, the operator listens to the playback and judges that the syllable was cut out correctly, the operator presses the execute key 15i and the flow moves to step n8. If the operator wants the syllable cut out again, the operator presses the release key 15h and the flow returns to step n3.

In step n8, the feature standard pattern is stored at the location in the standard pattern memory 10b that corresponds to the syllable shown on the display device 12, and the corresponding registration flag is set to "1".

In step n9, it is judged whether all standard patterns have been registered. If not, the flow returns to step n2, the next monosyllable (for example "a₂") is displayed, and the same processing is carried out.

When registration is finished in this way, several feature standard patterns for every monosyllable have been registered in the standard pattern memory 10b. Next, the operation in the recognition mode is described.

Description of the Recognition Mode

FIG. 4 shows the processing flow of the CPU 8 in the recognition mode.

First, the device is set to the recognition mode by operating the recognition mode key 15c. When the operator utters speech to be recognized, in step n11 the syllable segmentation section 6 is commanded to start cutting out syllables. The syllable segmentation section 6 initializes the waveform/feature pattern temporary memory 7, stores the feature pattern and waveform of each cut-out syllable starting from the first address, and gives the CPU 8 the start and end addresses of each syllable's waveform and feature pattern.

In step n12, it is checked whether a syllable has been cut out; when one has, the flow moves to step n13. In step n13, the feature pattern in the waveform/feature pattern temporary memory 7 is transferred to the feature pattern memory 10a area of the pattern memory 10 and recognition is performed. That is, a recognition command is given to the monosyllable recognition section 9, which matches the contents of the feature pattern memory 10a against the contents of the standard pattern memory 10b. The result is stored in the recognition result storage memory 11 and shown on the display device 12 (step n14).

For example, if "kaimono" is uttered and the first-ranked recognition result is "mogimono", the display device 12 shows

mogimono_

and the recognition result storage memory 11 holds a plurality of recognition result candidates for each syllable, for example as shown in Table 2.
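
The matching step can be pictured with the sketch below, which ranks the registered standard patterns by their distance to the input feature pattern. The text says only that distance calculations and the like are performed, so the Euclidean distance, the tuple layout, the feature values, and all names here are assumptions chosen for illustration.

```python
import math


def rank_candidates(input_features, standard_patterns):
    """Return labels of registered patterns, nearest first.

    standard_patterns: list of (label, registered, features) tuples.
    Euclidean distance is an assumed stand-in for the device's distance measure.
    """
    scored = []
    for label, registered, features in standard_patterns:
        if not registered:
            continue  # entries whose registration flag is "0" take no part in matching
        scored.append((math.dist(input_features, features), label))
    scored.sort()
    return [label for _, label in scored]


# Made-up two-dimensional features; a poorly cut "mo1" sits closest to the input.
patterns = [
    ("ka1", True, [1.0, 0.0]),
    ("mo1", True, [0.9, 0.2]),
    ("ki1", True, [0.0, 1.0]),
    ("ka2", False, [1.0, 0.1]),
]
print(rank_candidates([0.9, 0.1], patterns))  # ['mo1', 'ka1', 'ki1']
```

With these made-up values an unsuitable "mo1" pattern outranks "ka1", which is the kind of situation the erasure function is meant to address.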

[Table 2]

In Table 2, the underlined entries are the highest-ranked recognition result for each syllable name. They are shown one after another on the display device 12 when the "next syllable candidate" key 15e is operated, and they correspond to the feature standard patterns that can be erased according to the present invention.

When input of a word such as "kaimono" above is complete, the operator presses the "end" key 15g of the input section 15. As a result, the syllable segmentation section 6 is commanded to stop cutting out (steps n15, n16).
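
The selection of the underlined entries described above can be sketched as follows: given a ranked candidate list for one syllable position, keep only the first (highest-ranked) candidate of each syllable name. The (name, index) tuple representation and the function name are hypothetical, and the sample list is illustrative.

```python
def top_per_syllable_name(ranked):
    """Keep the highest-ranked candidate for each syllable name, preserving rank order.

    ranked: list of (syllable_name, pattern_index) tuples, best candidate first.
    """
    seen = set()
    result = []
    for name, index in ranked:
        if name not in seen:
            seen.add(name)
            result.append((name, index))
    return result


candidates = [("gi", 3), ("ni", 1), ("gi", 2), ("i", 2)]
print(top_per_syllable_name(candidates))  # [('gi', 3), ('ni', 1), ('i', 2)]
```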
If the whole character string is correct, pressing the "transfer" key 15j outputs the kana characters to the external device through the I/F section 16 (steps n18, n19).

If the displayed recognition result shows that most of the characters are wrong, or if the operator misspoke, pressing the "cancel" key 15f returns the device to its initial state through the judgment in step n17.

If only some of the recognition results are wrong, the operator corrects them with keys, as shown in step n20.

There are two ways to correct with keys. In either case, the cursor is first moved to the character to be corrected with the cursor movement keys 15k ("→") and 15l ("←"). For example, to correct the second character "gi", operating the cursor movement key "←" 15l changes the display to:

mogimono (cursor at "gi")

One method is to type the character at the cursor position with the kana keys 15a of the keyboard 15:

<Key input> "i": moimono

The other method is to press the "next syllable candidate" key 15e of the keyboard 15:

<Key input> "next syllable candidate": monimono
<Key input> "next syllable candidate": moimono

In this case, candidates with the same syllable name are displayed only once. That is, the recognition candidates for the second syllable are "gi₃, ni₁, gi₂, i₂, ...", but "gi₂" is not displayed.

The recognition results are corrected by the above procedure. Erasure of a feature standard pattern according to the present invention is carried out by the following procedure.

In the "kaimono" example, the appearance of "mo" as the recognition result for the input speech "ka" is considered abnormal in view of the similarity between syllables. The cursor on the display device 12 is therefore moved to the first "mo" with the cursor movement keys 15k and 15l, so that the display becomes (step n20):

moimono (cursor at the first "mo")

To erase the feature standard pattern of this "mo", the operator presses the "syllable erase" key 15d (step n21). To erase a feature standard pattern other than "mo", the operator uses the "next syllable candidate" key 15e to select and display the syllable candidates in turn, and presses the "syllable erase" key 15d when the desired one appears, thereby designating the syllable to be erased.

When the "syllable erase" key 15d is pressed with the above display, the judgment in step n21 leads to step n22. The start and end addresses of the waveform of the syllable to be played back are given to the speech output control section 13, the waveform of that portion is read from the waveform/feature pattern temporary memory 7 and supplied through the speech output control section 13 to the analog output section 14, and the speech corresponding to "ka" is played back from the analog output section 14.

By listening to this playback, the operator judges whether the cut-out position is correct and presses either the "execute" key 15i or the "release" key 15h. Pressing the "execute" key 15i advances the flow from step n23 to step n24; pressing the "release" key 15h leads to step n17.

On moving from step n23 to step n24, the CPU 8 erases the syllable feature standard pattern corresponding to the designated recognition result (here, "mo₁") from the standard pattern memory 10b of the pattern memory 10 and sets the registration flag of that syllable name entry to "0". In this example the feature standard pattern "mo₁" is erased, which is done by writing "0" into the registration flag for the syllable name "mo₁".

If the registration flags for that syllable name have all become "0" (judgment in step n25), the following message is displayed to prompt re-registration (step n26):

All patterns for "mo" have been erased.

On seeing this message, the operator sets the device to the registration mode and registers standard patterns for "mo".

In the above embodiment, part of the contents of the standard pattern memory 10b is erased solely on the judgment of the person making the input. The present invention is not limited to this; for example, the quality of each standard pattern may be judged so that feature standard patterns contributing to recognition cannot be erased.

One way to judge the quality of a standard pattern is the method previously proposed by the present applicant in Japanese Patent Application No. 57-217296, "Speech Recognition Device": a counter is provided for each feature standard pattern, the counter value is increased or decreased according to the recognition results for input speech, and the feature standard patterns contributing to recognition are identified from the counter values.

As described above, according to the embodiment, in a speech input device that holds a plurality of feature standard patterns for each syllable, recognition performance can be improved by erasing unsuitable feature standard patterns that were involved in misrecognition. When a speech input device is in use, an unsuitable pattern may have been registered as a feature standard pattern through an operating mistake during registration or the like, so that a syllable far removed from the input speech appears as a recognition result. For example, "nya" may always appear as the first-ranked result whenever "i" is uttered, forcing a correction every time. According to the embodiment, once the feature standard pattern "nya" involved in the misrecognition is erased, "nya" no longer appears in first place and recognition performance improves.

Furthermore, suppose that "ka" is misrecognized as "mo" when "kaimono" is uttered. Because the input speech is stored, when erasure of the feature standard pattern for "mo" is instructed, the part of the input speech corresponding to "ka" is played back. The person making the input can confirm whether it was cut out correctly as a syllable and can judge whether the pattern designated for erasure was the recognition result for a normal input, which avoids mistakenly erasing a feature standard pattern that contributes to recognition.

<Effects of the Invention>
As described above, the present invention provides a Japanese speech input device that recognizes input speech in units of syllables by matching it against feature standard patterns of a plurality of kinds of syllables registered in advance. The device comprises temporary storage means for temporarily storing speech data uttered at the time of input, display means for displaying the recognition result of the speech, and erasing means for erasing a feature standard pattern involved in misrecognition on the basis of the recognition result displayed on the display means. The erasing means includes selection means for selecting the character to be erased from the recognition results displayed on the display means; an erase key; means for reading the speech data corresponding to the selected character from the temporary storage means and outputting it as speech in response to operation of the selection means and the erase key; an execute key for carrying out, after confirmation by the speech output, erasure of the feature standard pattern of the selected character; and a key for calling off that erasure. It is therefore possible to judge whether the feature standard pattern designated for erasure is the recognition result for the intended input speech, and to avoid the problem of mistakenly erasing a feature standard pattern that contributes to recognition, so that recognition performance can be improved by erasing unsuitable feature standard patterns.
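
The erasure procedure of steps n21 to n26 described above can be summarized in the following sketch. It assumes a dictionary keyed by (syllable name, pattern index) that holds the registration flags, a callback standing in for the operator's execute/release decision after playback, and an optional set of protected patterns standing in for the counter-based variant. All of these names and data shapes are hypothetical, and checking protection before playback is an assumption of this sketch.

```python
def erase_pattern(flags, target, confirm, protected=frozenset()):
    """Try to erase one feature standard pattern and report what happened.

    flags:     dict mapping (syllable_name, pattern_index) -> bool (registration flag)
    target:    the (syllable_name, pattern_index) the operator selected
    confirm:   callable returning True for "execute", False for "release"
    protected: patterns judged to contribute to recognition (assumed variant)
    """
    if target in protected:
        return "protected"
    if not confirm(target):          # operator listened to the playback and released
        return "released"
    flags[target] = False            # erasure is modeled as writing "0" to the flag
    name = target[0]
    still_registered = any(reg for (n, _), reg in flags.items() if n == name)
    if not still_registered:
        return f'All patterns for "{name}" have been erased.'
    return "erased"


flags = {("mo", 1): True, ("mo", 2): True, ("ka", 1): True}
print(erase_pattern(flags, ("mo", 1), lambda t: True))  # erased
```

The return value doubles as the message to display, so the caller can prompt for re-registration when the last pattern of a syllable has been removed.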

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of a device according to one embodiment of the present invention; FIG. 2 is a plan view showing an example of an input section such as a keyboard; FIG. 3 is a processing flow diagram for explaining operation in the registration mode; and FIG. 4 is a processing flow diagram for explaining operation in the recognition mode.

5: speech analysis section; 7: waveform/feature pattern temporary memory; 8: CPU; 9: monosyllable recognition section; 10: pattern memory; 10a: feature pattern memory; 10b: standard pattern memory; 11: recognition result storage memory; 15: keyboard; 15b: registration mode key; 15c: recognition mode key; 15d: syllable erase key.

Claims

[Scope of Claims]

1. A Japanese speech input device that recognizes input speech in units of syllables by matching it against feature standard patterns of a plurality of kinds of syllables registered in advance, comprising: temporary storage means for temporarily storing speech data uttered at the time of input; display means for displaying the recognition result of the speech; and erasing means for erasing a feature standard pattern involved in misrecognition on the basis of the recognition result displayed on the display means, wherein the erasing means includes: selection means for selecting, from the recognition results displayed on the display means, a character to be erased; an erase key; means for reading the speech data corresponding to the character selected by the selection means from the temporary storage means and outputting it as speech, in response to operation of the selection means and the erase key; an execute key for carrying out, after confirmation by the speech output, erasure of the feature standard pattern of the character selected by the selection means; and a key for calling off that erasure.