JPH04254896A

JPH04254896A - Speech recognition correction device

Info

Publication number: JPH04254896A
Application number: JP3016519A
Authority: JP
Inventors: Kikumi Kaburagi; 鏑木　喜久美
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1991-02-07
Filing date: 1991-02-07
Publication date: 1992-09-10

Abstract

PURPOSE: To obtain a speech recognition correction device that makes full use of the advantages of speech input, namely its simplicity and speed, without sacrificing them. CONSTITUTION: The speech recognition correction device recognizes a first input speech and stores both the acoustic analysis result and the speech recognition result. When the speech recognition result needs to be changed, the operator inputs the part to be changed as a second speech. A spotting unit 14 performs matching using the acoustic analysis results and the speech recognition results of the first and second inputs, spots the part to be changed, and that part is then changed.

Description

[Detailed description of the invention]

[0001]

[Industrial field of application] The present invention relates to a speech recognition device.

[0002]

[Prior art] A conventional speech recognition correction device is described with reference to FIG. 9.

[0003] In a conventional speech recognition correction device, an acoustic analysis unit 1 analyzes the input speech and outputs its features. Based on the output of the acoustic analysis unit 1, a speech recognition unit 2 recognizes the input speech. The result recognized by the speech recognition unit 2 is output so that the operator can check it. The features extracted by analyzing the input speech are stored in a storage unit 3. If the speech recognition result does not need to be changed, the operator proceeds to the next speech input. If the result is to be changed, the operator operates a cursor pointing unit 4, moves the cursor to the beginning and to the end of the part to be changed, and marks both positions to designate that part. In other words, in the conventional device the operator uses the cursor pointing unit 4 to indicate the part that must be changed because of misrecognition or the like. To change the designated part, the operator then uses a changing unit 5, either retyping it with a key input device or, when several candidate recognition results are displayed, selecting one of the candidates.

[0004]

[Problem to be solved by the invention] Speech input allows data to be entered without key operations. The operator does not need to know the key layout or how to operate a key input device, so anyone can use it easily. Unlike key input, however, speech input tends to have a somewhat lower probability that the input data is recognized correctly, even when the operator speaks it correctly. A count of the phonemes in the words that make up sentences in the Journal of the Acoustical Society of Japan showed that the average number of phonemes per word was about 9. Even with a speech recognition device whose phoneme recognition rate is 95% or higher, a recognition error of one character within a word is therefore to be expected. Even with a phoneme recognition rate of 98% or higher, a one-character error within a word would not be particularly rare. A similar count for sentences in the same journal showed that the average number of phonemes per sentence was about 47. With a phoneme recognition rate of 95% or higher, recognition errors in two to three phonemes per sentence are thus unavoidable, and even at 98% or higher a one-phoneme error per sentence is quite likely. These figures show that the need to change the recognition result of input speech is not at all rare, so the speech recognition correction device plays a very important role in a speech recognition device. The correction device is used not only to correct recognition errors but also when the operator wishes to change part of the input data.
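The error counts quoted above follow from simple arithmetic. The sketch below is a hedged illustration, not part of the patent: it assumes that phoneme errors are independent, which the text does not state, and the function names are hypothetical.

```python
def expected_errors(n_phonemes: int, accuracy: float) -> float:
    """Expected number of misrecognized phonemes among n_phonemes."""
    return n_phonemes * (1.0 - accuracy)


def prob_at_least_one_error(n_phonemes: int, accuracy: float) -> float:
    """Probability of one or more errors, assuming independent phoneme errors."""
    return 1.0 - accuracy ** n_phonemes


if __name__ == "__main__":
    # 9 phonemes per word and 47 per sentence are the averages given in the text.
    for label, n in (("word", 9), ("sentence", 47)):
        for accuracy in (0.95, 0.98):
            print(
                label,
                accuracy,
                round(expected_errors(n, accuracy), 2),
                round(prob_at_least_one_error(n, accuracy), 2),
            )
```

For a 47-phoneme sentence the expected error count is 47 × 0.05 = 2.35 at 95% and 47 × 0.02 = 0.94 at 98%, which matches the "two to three phonemes" and "one phoneme" figures in the text.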

[0005] When a speech recognition result has to be changed, the first requirement is to designate the part to be changed correctly. In a speech recognition correction device based on the conventional technology described above, however, the operator must personally operate the cursor pointing unit 4 of FIG. 9 to designate the section in which the recognition error or change occurs. Having to perform such cursor operations frequently with a key input device is a heavy burden on the operator. Work that makes the line of sight move so often tires the operator's optic nerves, lowers work efficiency, and may even cause a marked loss of eyesight. Moreover, although the conventional technology adopts speech input, which is an excellent input method, it designates the part to be changed by moving the cursor manually. It therefore cannot fully deliver the advantages that characterize a speech recognition device: that anyone can operate it easily, and that data can be entered faster than with key input or other methods. In other words, even if data can be entered quickly, it is very difficult for just anyone to make changes easily and quickly with a conventional speech recognition correction device. Changing a result takes a great deal of time, and work efficiency is extremely poor. These are the problems of the conventional technology.

[0006]

[Means for solving the problems] The speech recognition correction device of the present invention comprises: an acoustic analysis unit that outputs features of an input first speech; a speech recognition unit that converts the output of the acoustic analysis unit into a code string; an acoustic analysis storage unit that stores the output of the acoustic analysis unit; a speech recognition storage unit that stores the output of the speech recognition unit; a spotting unit that compares an input second speech with the data in the acoustic analysis storage unit and the speech recognition storage unit and detects, in the data of the first speech, the portion corresponding to the second speech; and a changing unit that changes the part of the code string corresponding to the detected portion.

[0007]

[Examples] The present invention is described in detail below on the basis of examples.

(Example 1) FIG. 1 is a block diagram showing the principle of the speech recognition correction device of the present invention, and FIG. 2 is a block diagram of one embodiment, a speech recognition correction device for speech uttered with a pause after each word. The processing path drawn with broken lines in FIG. 2 is that of the second speech. The first speech, uttered word by word, is sampled as an 8 kHz, 12-bit digital signal by the speech input unit 21 of FIG. 2, which consists of a microphone, a high-frequency emphasis filter, and an A/D converter and is a component of the acoustic analysis unit 11 of FIG. 1. The feature extraction circuit 22 of FIG. 2, also a component of the acoustic analysis unit 11, takes each 16 ms interval of the digitized speech signal as one frame, applies a frequency transform frame by frame, and extracts feature parameters in the frequency domain, so that the uttered word is represented as a feature parameter sequence. The feature parameter sequence of the first speech extracted by the acoustic analysis unit 11 is stored in the feature parameter sequence storage circuit 27 of FIG. 2, which constitutes the acoustic analysis storage unit 16 of FIG. 1. The feature parameter sequence of the first speech is also matched against the phoneme dictionary 24 of FIG. 2 in the DP matching circuit 23 of FIG. 2, which constitutes the speech recognition unit 12 of FIG. 1. The phoneme lattice determined by the speech recognition unit 12 is stored in the phoneme lattice storage circuit 32 of FIG. 2 and is displayed on the display unit 26 under the control of the display control circuit 25. The phoneme lattice storage circuit 32 constitutes the speech recognition storage unit 13 of FIG. 1.

If the speech recognition result shown on the display unit 26 contains an error, or if the operator wishes to change part of it, the operator signals the need for a change, for example by touching a correction key. On the signal from the correction key control circuit 28 of FIG. 2, the device prepares for the second speech input. The phoneme input as the second speech is converted into a feature parameter sequence by way of the speech input unit 21 and the feature extraction circuit 22, in the same way as the first speech. First, this feature parameter sequence of the second speech is DP-matched in the DP matching circuit 29 of FIG. 2 against the feature parameter sequence of the first speech stored in the feature parameter sequence storage circuit 27. Because the operator inputs as the second speech the very phoneme that was misrecognized or needs to be changed, the part to be changed can be spotted reliably. The part obtained by this process is called spotting 1. To make the spotting more reliable, the second speech is also DP-matched against the phoneme dictionary 24 in the DP matching circuit 23 and recognized, just like the first speech. The recognition result of the second speech is then DP-matched in the DP matching circuit 33 of FIG. 2 against the recognition result of the first speech stored in the phoneme lattice storage circuit 32. This process spots the part to be changed using the speech recognition results, and the part it yields is called spotting 2.

Next, the spotting result comparison circuit 34 of FIG. 2 compares spotting 1, obtained from the feature parameter sequences of the first and second speech, with spotting 2, obtained from the speech recognition results of the first and second speech. If spotting 1 and spotting 2 are identical, the part to be changed is regarded as reliably spotted and processing continues. If they differ, the result whose DP value in its DP matching was larger is taken as the part to be changed. If the wrong part has been spotted, the operator will notice this from the changed result shown on the display unit 26. In that case the operator immediately operates the correction key controlled by the correction key control circuit 28, and the change operation is repeated with the other spotting result, the one that the spotting result comparison circuit 34 had rejected because of its smaller DP value, as the part to be changed. The DP matching circuit 29, the DP matching circuit 33, and the spotting result comparison circuit 34 of FIG. 2 constitute the spotting unit 14 of FIG. 1.

For the part spotted in this way, the phoneme lattice replacement circuit 30 of FIG. 2 promotes the second-candidate phoneme stored in the phoneme lattice storage circuit 32 to first candidate, and the result is shown on the display unit 26 under the control of the display control circuit 25. If this changes the part to the desired phoneme, the operator operates the confirmation key controlled by the confirmation key control circuit 31 of FIG. 2 to confirm the speech recognition result. If the second-candidate phoneme is not the desired one, the confirmation key control circuit 31 recognizes that the confirmation key has not been operated, and the phoneme lattice replacement circuit 30 promotes the candidate with the next lower rank to first candidate and displays it on the display unit 26. When the desired phoneme appears, the operator operates the confirmation key to confirm the result. If the desired phoneme is not among the second, third, and lower candidates, the operator inputs the speech again and it is recognized along the same path as the original input. The phoneme lattice replacement circuit 30 and the confirmation key control circuit 31 of FIG. 2 constitute the changing unit 15 of FIG. 1.
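Spotting 1 matches the short second utterance against the stored feature sequence of the first utterance. The patent does not give the DP recurrence or the frame distance, so the following is only a minimal sketch of one common formulation (subsequence DP matching with free start and end points and a squared Euclidean frame distance). All names are hypothetical. Note that this sketch returns a distance, where smaller is better, whereas the text selects the larger "DP value", which suggests a similarity score.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]

# 8 kHz sampling and 16 ms frames (both from the text) give 128 samples per frame.
SAMPLES_PER_FRAME = 8000 * 16 // 1000


def frame_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature frames (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def spot_by_dp(first: List[Frame], second: List[Frame]) -> Tuple[int, int, float]:
    """Locate the span of `first` that best matches `second`.

    Returns (start, end, cost), where first[start:end] is the matched span
    and cost is the accumulated frame distance (smaller is better).
    """
    if not first or not second:
        raise ValueError("both sequences must contain at least one frame")
    n, m = len(second), len(first)
    cost = [[0.0] * m for _ in range(n)]
    origin = [[0] * m for _ in range(n)]
    # The match may begin at any frame of the first utterance.
    for j in range(m):
        cost[0][j] = frame_distance(second[0], first[j])
        origin[0][j] = j
    for i in range(1, n):
        for j in range(m):
            # Allowed predecessors: vertical, diagonal, horizontal.
            candidates = [(cost[i - 1][j], origin[i - 1][j])]
            if j > 0:
                candidates.append((cost[i - 1][j - 1], origin[i - 1][j - 1]))
                candidates.append((cost[i][j - 1], origin[i][j - 1]))
            best_cost, best_origin = min(candidates)
            cost[i][j] = frame_distance(second[i], first[j]) + best_cost
            origin[i][j] = best_origin
    # The match may also end at any frame of the first utterance.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return origin[n - 1][end], end + 1, cost[n - 1][end]
```

With one-dimensional toy frames, `spot_by_dp([[0.0], [1.0], [5.0], [7.0], [0.0], [1.0]], [[5.0], [7.0]])` locates frames 2 and 3 of the first sequence. Real feature frames would be the frequency-domain parameters produced by the feature extraction circuit 22.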

[0008] The present invention is further described on the basis of this example with reference to FIG. 3 and FIG. 4.

[0009] FIG. 3 shows, for the embodiment that handles speech uttered word by word, the phoneme lattice structure stored in the speech recognition storage unit 13 of FIG. 1 together with the speech power, which is part of the feature extraction result stored in the acoustic analysis storage unit 16 of FIG. 1.

[0010] Suppose that, in the speech recognition correction device for speech uttered word by word, the operator inputs the word "denshikeisanki" (電子計算機, "electronic computer") as the first speech. The phoneme lattice then stored in the phoneme lattice storage circuit 32 of FIG. 2 is as shown in FIG. 3. For "de", the first candidate is "de" and the second is "te". For "n", the first candidate is "n", the second is "mu", and the third is "u". For "shi", the first candidate is "chi" and the second is "shi". For "ke", the first candidate is "ke" and there is no second or third candidate. For "i", the first candidate is "i", the second is "hi", and the third is "shi". For "sa", the first candidate is "sa", the second is "ha", and the third is "a". For the second "n", the first candidate is "n" and the second is "mu". For "ki", the first candidate is "ki" and there is no second or third candidate. FIG. 3 also shows, alongside the phoneme lattice, the speech power, one of the speech features analyzed by the acoustic analysis unit 11 of FIG. 1 when "denshikeisanki" was input. Joining the first candidates of the individual phoneme recognition results gives the recognition result "denchikeisanki". Since the input speech was "denshikeisanki", the third character "shi" has been misrecognized as "chi". The operator checks the recognition result shown on the display unit 26 of FIG. 2 and uses the correction key to signal that it needs to be changed. The correction key control circuit 28 of FIG. 2 recognizes that the recognition result contains an error and immediately requests input of the phoneme to be changed as the second speech.
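The lattice described for FIG. 3 can be written as a list of ranked candidate lists. This is an illustrative sketch only: the romanized strings and the names `LATTICE` and `first_candidates` are assumptions made here, not the storage format of the phoneme lattice storage circuit 32.

```python
from typing import List

# One inner list per position, candidates ordered best first,
# as described in the text for the input "denshikeisanki".
LATTICE: List[List[str]] = [
    ["de", "te"],
    ["n", "mu", "u"],
    ["chi", "shi"],  # the correct "shi" is only the second candidate
    ["ke"],
    ["i", "hi", "shi"],
    ["sa", "ha", "a"],
    ["n", "mu"],
    ["ki"],
]


def first_candidates(lattice: List[List[str]]) -> str:
    """Join the top-ranked candidate of every position into the displayed result."""
    return "".join(slot[0] for slot in lattice)


if __name__ == "__main__":
    print(first_candidates(LATTICE))  # prints "denchikeisanki"
```

Joining the first candidates reproduces the misrecognized result from the text, with "chi" in the third position.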

[0011] FIG. 4 illustrates the processing of the spotting unit 14 of FIG. 1 in the embodiment that recognizes speech uttered word by word. It shows the speech power, one of the feature parameters, and the phoneme lattice for both the first speech and the second speech.

[0012] The operator inputs the misrecognized phoneme itself, here "chi", as the second speech, so that the part to be changed can be spotted without designating it with a cursor or the like. The input "chi" is converted into a feature parameter sequence by way of the speech input unit 21 and the feature extraction circuit 22 of FIG. 2. In the DP matching circuit 29 of FIG. 2, this sequence is DP-matched against the feature parameter sequence of the first speech stored in the feature parameter sequence storage circuit 27. The matching shows that the second speech corresponds to the "chi" portion of the first speech, and that portion is spotted as a candidate for the part to be changed. This result is spotting 1. As this shows, spotting a phoneme "chi" that has already been through feature extraction and recognition once is not especially difficult. The second speech is also DP-matched against the phoneme dictionary 24 in the DP matching circuit 23 of FIG. 2 and recognized, here as "chi". In the DP matching circuit 33 of FIG. 2, the recognition result "chi" is DP-matched against the contents of the phoneme lattice storage circuit 32, and the "chi" of "denchikeisanki" is spotted. This result is spotting 2. The spotting result comparison circuit 34 of FIG. 2 compares spotting 1 and spotting 2 as candidates for the part to be changed. Here both are "chi", so they agree and the "chi" of "denchikeisanki" is spotted as the part to be changed. In this way the misrecognized "chi" is detected as the phoneme that needs to be changed. In response, the phoneme lattice replacement circuit 30 of FIG. 2 operates on the phoneme lattice storage circuit 32 and promotes "shi", the second candidate for the spotted "chi", to first candidate. The result is shown on the display unit 26 under the control of the display control circuit 25. In this case the correct phoneme "shi" is the second candidate, so the operator confirms it with the confirmation key and the recognition result is corrected. This completes the correction and yields "denshikeisanki" (電子計算機, "electronic computer").
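The symbol-level spotting and the candidate promotion in this worked example can be sketched as below. Two simplifications are assumed and are not the patent's method: exact string comparison stands in for the DP matching of circuit 33, and promotion is modelled as rotating the candidate list. The function names are hypothetical.

```python
from typing import List, Optional


def spot_symbol(lattice: List[List[str]], recognized: str) -> Optional[int]:
    """Return the first position whose displayed candidate equals `recognized`.

    Exact comparison is a stand-in for DP matching on the recognition results.
    """
    for index, slot in enumerate(lattice):
        if slot[0] == recognized:
            return index
    return None


def promote_next(lattice: List[List[str]], index: int) -> None:
    """Make the next-ranked candidate at `index` the displayed one (in place)."""
    slot = lattice[index]
    if len(slot) > 1:
        slot.append(slot.pop(0))  # the rejected candidate moves to the back


if __name__ == "__main__":
    lattice = [
        ["de", "te"], ["n", "mu", "u"], ["chi", "shi"], ["ke"],
        ["i", "hi", "shi"], ["sa", "ha", "a"], ["n", "mu"], ["ki"],
    ]
    position = spot_symbol(lattice, "chi")  # second speech recognized as "chi"
    if position is not None:
        promote_next(lattice, position)
    print("".join(slot[0] for slot in lattice))  # prints "denshikeisanki"
```

Calling `promote_next` again on the same position corresponds to the operator withholding the confirmation key so that the next lower-ranked candidate is displayed.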

[0013] The processing of Example 1, one embodiment of the present invention, is described in detail with reference to FIG. 8, FIG. 2, FIG. 3, and FIG. 4. FIG. 8 is a flowchart showing an example of the processing of the speech recognition correction device that recognizes speech uttered word by word.
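The comparison step of this flow, as described for the spotting result comparison circuit 34, accepts the result when spotting 1 and spotting 2 agree and otherwise takes the one with the larger DP value, keeping the other as a fallback for a second press of the correction key. A hedged sketch follows. It assumes that both results have already been mapped to the same position units and that a larger DP value means a better match. The `Spotting` type and `choose_spotting` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Spotting:
    span: Tuple[int, int]  # (start, end) position in the first utterance
    dp_value: float        # assumed: larger means a better match


def choose_spotting(
    spotting1: Spotting, spotting2: Spotting
) -> Tuple[Spotting, Optional[Spotting]]:
    """Return (chosen, fallback) following the rule described in the text."""
    if spotting1.span == spotting2.span:
        return spotting1, None  # both methods agree, so no fallback is needed
    if spotting1.dp_value >= spotting2.dp_value:
        return spotting1, spotting2
    return spotting2, spotting1


if __name__ == "__main__":
    chosen, fallback = choose_spotting(
        Spotting(span=(2, 3), dp_value=0.7),
        Spotting(span=(4, 5), dp_value=0.9),
    )
    print(chosen.span, fallback.span if fallback else None)
```

The text does not say what happens when the two DP values are equal. Preferring spotting 1 in that case is an arbitrary choice made for this sketch.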

[0014] First, the operator inputs the first speech. The speech input unit 21 of FIG. 2 handles the input of the speech data. The input speech is immediately analyzed by the feature extraction circuit 22 of FIG. 2 and its features are extracted. The extracted features are stored in the feature parameter sequence storage circuit 27 of FIG. 2. The feature parameter sequence of the first speech is also DP-matched against the phoneme dictionary 24 in the DP matching circuit 23 of FIG. 2 and recognized as a code string. The recognition result is stored in the phoneme lattice storage circuit 32 of FIG. 2 and, so that the operator can check it, shown on the display unit 26. The display control circuit 25 and the display unit 26 of FIG. 2 handle this display. An example of a displayed recognition result is shown in FIG. 3.

If the recognition result shown on the display unit 26 contains an error or needs to be changed, the operator signals this with the correction key, which is controlled by the correction key control circuit 28 of FIG. 2. When a change is needed, the part to be changed must be detected immediately. To spot it within the recognition result, the operator speaks the part to be changed itself as the second speech. Features are extracted from it at once to give a feature parameter sequence, which is DP-matched in the DP matching circuit 29 of FIG. 2 against the feature parameter sequence of the first speech stored in the feature parameter sequence storage circuit 27. This DP matching yields spotting 1 as a candidate for the part to be changed. To make the spotting reliable, the recognition result of the second speech is also DP-matched in the DP matching circuit 33 of FIG. 2 against the recognition result of the first speech stored in the phoneme lattice storage circuit 32, which yields spotting 2 as a second candidate. FIG. 4 illustrates this.

The spotting result comparison circuit 34 of FIG. 2 then compares spotting 1 and spotting 2. If they agree, the part to be changed is regarded as accurately spotted and processing moves promptly to changing it. If spotting 1, obtained from the feature parameter sequences of the first and second speech, differs from spotting 2, obtained from their recognition results, the spotting result comparison circuit 34 compares the DP values with which the two were detected, and processing continues with the one having the larger DP value as the part to be changed. If the spotting was wrong and a phoneme other than the intended one has been changed, the operator sees this from the result on the display unit and signals it with the correction key. The change process then continues with the other spotting result, which had been passed over because of its smaller DP value. The DP matching circuit 29, the DP matching circuit 33, and the spotting result comparison circuit 34 of FIG. 2 constitute the spotting unit 14 of FIG. 1.

For the spotted part, the phoneme lattice replacement circuit 30 of FIG. 2 promotes the second-candidate phoneme stored in the phoneme lattice storage circuit 32 to first candidate, and the result is shown on the display unit 26 under the control of the display control circuit 25. If this changes the part to the desired phoneme, the operator operates the confirmation key controlled by the confirmation key control circuit 31 of FIG. 2 to confirm the recognition result. If the second-candidate phoneme is not the desired one, the confirmation key control circuit 31 recognizes that the confirmation key has not been operated, and the phoneme lattice replacement circuit 30 promotes the candidate with the next lower rank to first candidate and displays it as before. When the desired phoneme appears, the operator operates the confirmation key to confirm the result. If the desired phoneme is not among the displayed second, third, and lower candidates, the operator inputs the speech again and it is recognized along the same path as the original input. This change operation involves the phoneme lattice replacement circuit 30, the confirmation key control circuit 31, the phoneme lattice storage circuit 32, the display control circuit 25, and the display unit 26 of FIG. 2.

(Example 2) FIG. 5 is a block diagram of another embodiment of the present invention, a continuous speech recognition correction device that recognizes speech uttered continuously without pauses between words. The processing path drawn with broken lines is that of the second speech.

[0015] The voice input as the first voice is sampled as an 8 kHz, 12-bit digital signal by the voice input section 41 of FIG. 5, which is a component of the acoustic analysis section 11 of FIG. 1 and consists of a microphone, a high-frequency emphasis filter, and an AD converter. In the feature extraction circuit 42 of FIG. 5, likewise a component of the acoustic analysis section 11 of FIG. 1, the digitized voice signal is frequency-transformed frame by frame, with each 16 ms interval treated as one frame. Feature parameters in the frequency domain are extracted, and the utterance is expressed as a feature parameter string. The feature parameter string extracted by the feature extraction circuit 42 is stored in the feature parameter string storage circuit 47 of FIG. 5, which constitutes the acoustic analysis storage section 16 of FIG. 1. The feature parameter string of the first voice is continuous-DP-matched against the word storage dictionary 44 of FIG. 5 in the continuous DP matching circuit 43 of FIG. 5, recognized as a code string, and stored in the word lattice storage circuit 52 of FIG. 5. The word lattice storage circuit 52, which stores the speech recognition result of the first voice as a code string, constitutes the speech recognition storage section 13 of FIG. 1. Besides being stored in the word lattice storage circuit 52, the speech recognition result of the first voice is synthesized by the speech synthesis circuit 45 of FIG. 5 and output from the speaker under the control of the voice output control circuit 46 of FIG. 5. The operator listens to the spoken recognition result, and if it contains an error or a part the operator wishes to change, signals the need to change it by, for example, touching the correction key. The broken lines in FIG. 5 indicate the processing of the second input voice.

The word input as the second voice is converted into a feature parameter string through the voice input section 41 and the feature extraction circuit 42 of FIG. 5, in the same way as the first voice. The feature parameter string of the second voice, input as the part to be changed, is continuous-DP-matched in the continuous DP matching circuit 49 of FIG. 5 against the feature parameter string of the first voice stored in the feature parameter string storage circuit 47 of FIG. 5. In other words, the misrecognized part, or the part to be changed, is input as the second voice and compared with the feature parameter string of the first voice, and the changed part of the speech recognition result is spotted by searching for the portion that matches. This result is called spotting 1, a candidate for the changed part. To make spotting more reliable, the second voice is also continuous-DP-matched against the word storage dictionary 44 in the continuous DP matching circuit 43 of FIG. 5 and recognized, just like the first voice. The speech recognition result of the second voice is then continuous-DP-matched, in the continuous DP matching circuit 53 of FIG. 5, against the speech recognition result of the first voice stored in the word lattice storage circuit 52 of FIG. 5. This process spots the changed part using the speech recognition results, and the changed part it yields is called spotting 2.

Next, the spotting result comparison circuit 54 of FIG. 5 compares spotting 1, obtained from the feature parameter strings of the first and second voices, with spotting 2, obtained from the speech recognition result of the first voice stored in the word lattice storage circuit 52 and the speech recognition result of the second voice. If spotting 1 and spotting 2 are identical, the changed part is taken to have been spotted reliably, and processing moves on. If they do not match, the one with the larger DP value at detection is chosen as the changed part. If the changed part has been spotted wrongly, the operator can tell by hearing the changed result from the speaker. In that case the operator immediately operates the correction key controlled by the correction key control circuit 48 of FIG. 5. The other spotting result, which the spotting result comparison circuit 54 did not select because its DP value was smaller, then becomes the changed part, and the change operation can proceed again. The continuous DP matching circuit 49, the continuous DP matching circuit 53, and the spotting result comparison circuit 54 of FIG. 5 constitute the spotting section 14 of FIG. 1.

For the spotted changed part, the word lattice replacement circuit 50 of FIG. 5 promotes the second-candidate word stored in the word lattice storage circuit 52 of FIG. 5 to first candidate. The result is synthesized by the speech synthesis circuit 45 of FIG. 5 and output from the speaker under the control of the voice output control circuit 46 of FIG. 5. If this changes the part to the desired word, the operator operates the confirmation key controlled by the confirmation key control circuit 51 of FIG. 5 to confirm the speech recognition result. If the second-candidate word is not the desired one, the confirmation key control circuit 51 recognizes that the confirmation key has not been operated, and the word lattice replacement circuit 50 promotes the candidate with the next lower priority to first candidate and outputs it by voice as before. When the operator hears the desired word, he or she operates the confirmation key to confirm the speech recognition result. If the desired word is not among the second, third, and lower candidates presented, the voice is input again and recognized along the same path as the initial input. The word lattice replacement circuit 50 and the confirmation key control circuit 51 of FIG. 5 constitute the changing section 15 of FIG. 1.
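The spotting step described in this paragraph, finding where a short second utterance best matches inside the stored first utterance, can be sketched with a subsequence form of DP matching. This is a minimal illustration under stated assumptions, not the circuit of the patent: each frame is reduced to a single number (for example voice power), the local distance is an absolute difference, and the function name `spot` is hypothetical. Note also that the sketch returns an accumulated distance (smaller is better), whereas the text speaks of choosing the larger DP value; a similarity score could be obtained by negating the distance.

```python
from typing import List, Sequence, Tuple


def spot(query: Sequence[float], reference: Sequence[float]) -> Tuple[int, int, float]:
    """Locate the span of `reference` that best matches `query`.

    Returns (start, end, cost) with `end` exclusive. The match may begin
    and end anywhere in `reference`; lower cost means a closer match.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # cost[i][j]: best accumulated distance for aligning query[0..i]
    # with a span of the reference that ends at frame j.
    cost: List[List[float]] = [[inf] * m for _ in range(n)]
    # start[i][j]: reference frame at which that best span begins.
    start: List[List[int]] = [[0] * m for _ in range(n)]

    # Free starting point: the first query frame may align with any frame.
    for j in range(m):
        cost[0][j] = abs(query[0] - reference[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Allowed predecessors: advance the query only, advance both,
            # or advance the reference only.
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((cost[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = abs(query[i] - reference[j]) + best_cost
            start[i][j] = best_start

    # Free end point: pick the reference frame where the full query ends best.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return start[n - 1][end], end + 1, cost[n - 1][end]


first_voice = [0.0, 0.0, 5.0, 6.0, 7.0, 0.0, 0.0]   # made-up per-frame values
second_voice = [5.0, 6.0, 7.0]
span_start, span_end, span_cost = spot(second_voice, first_voice)
```

With these made-up values the second voice is located at frames 2 to 4 of the first voice with zero accumulated distance. A real feature parameter string would hold a vector per frame, so the local distance would be a vector distance instead.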

[0016] The present invention is further explained below on the basis of Embodiment 2, with reference to FIGS. 6 and 7.

[0017] FIG. 6 relates to an embodiment of the speech recognition correction device of the present invention that recognizes speech uttered without pauses between words. It shows the word lattice structure held in the speech recognition storage section of FIG. 1 and, as part of the data held in the acoustic analysis storage section of FIG. 1, the voice power of the speech.

[0018] Suppose the operator inputs the sentence 「今日の天気は晴れです。」 (kyou no tenki wa hare desu, "Today's weather is sunny.") as the first voice. The word lattice structure then held in the word lattice storage circuit 52 of FIG. 5 is as shown in FIG. 6. For 「今日」 (kyou, "today"), the first candidate is 「今日」 and there is no second or third candidate. For 「の」 (no), the first candidate is 「の」 and the second is 「も」 (mo). For 「天気」 (tenki, "weather"), the first candidate is 「天気」 and the second is 「天使」 (tenshi, "angel"). For 「は」 (wa), the first candidate is 「は」 and the second is 「あ」 (a). For 「晴れ」 (hare, "sunny"), the first candidate is 「針」 (hari, "needle"), the second is 「晴れ」, and the third is 「橋」 (hashi, "bridge"). Similarly, for 「です」 (desu), the first candidate is 「です」 and the second is 「でぶ」 (debu). The voice power, one of the features extracted from this first input voice, is as shown in FIG. 6.

In this case 「晴れ」 has been misrecognized as 「針」. The speech recognition result is synthesized by the speech synthesis circuit 45 of FIG. 5 and output from the speaker under the control of the voice output control circuit 46 of FIG. 5. The operator hears the recognition result 「今日の天気は針です。」 ("Today's weather is needle."), realizes that it needs to be changed, and signals this with the correction key. The correction key control circuit 48 of FIG. 5 recognizes that the speech recognition result needs to be changed and immediately prepares to accept the changed part as a second voice.
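The word lattice of this example can be written down as a small data structure. The sketch below is only an illustration: the list-of-lists layout and the helper names `first_candidates` and `alternatives` are assumptions, not the storage format of the word lattice storage circuit. The candidates themselves are taken from the paragraph above.

```python
from typing import List

# One inner list per word position; candidates are in priority order,
# so index 0 is the first candidate that the device outputs.
word_lattice: List[List[str]] = [
    ["今日"],                # kyou: no second or third candidate
    ["の", "も"],            # no / mo
    ["天気", "天使"],        # tenki / tenshi
    ["は", "あ"],            # wa / a
    ["針", "晴れ", "橋"],    # hari / hare / hashi (first candidate is the misrecognition)
    ["です", "でぶ"],        # desu / debu
]


def first_candidates(lattice: List[List[str]]) -> str:
    """Join the first candidate of every position into the recognized sentence."""
    return "".join(slot[0] for slot in lattice)


def alternatives(lattice: List[List[str]], position: int) -> List[str]:
    """Return the lower-priority candidates available at one position."""
    return lattice[position][1:]


recognized = first_candidates(word_lattice)
```

Reading off the first candidates gives the misrecognized sentence that the operator hears, while `alternatives(word_lattice, 4)` shows that the intended word 「晴れ」 is still available as the second candidate.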

[0019] FIG. 7 explains, on the basis of Embodiment 2, the processing of the spotting section 14 of FIG. 1 in a speech recognition correction device that recognizes speech uttered without pauses between words. It shows the voice power, one of the feature parameters of the first voice, together with the word lattice, and the voice power, one of the feature parameters of the second voice, together with its speech recognition result.

[0020] The operator inputs the misrecognized word itself, 「針」 (hari), as the second voice in order to spot the changed part. The second voice 「針」 is converted into a feature parameter string through the voice input section 41 and the feature extraction circuit 42 of FIG. 5, and is continuous-DP-matched in the continuous DP matching circuit 49 of FIG. 5 against the feature parameter string of the first input voice stored in the feature parameter string storage circuit 47 of FIG. 5. As shown in FIG. 7, 「針」 is spotted as a candidate for the changed part. This is spotting 1. As this shows, spotting a word that has already been through feature extraction and speech recognition once is not especially difficult.

The second input voice is also continuous-DP-matched against the word storage dictionary 44 of FIG. 5 in the continuous DP matching circuit 43 of FIG. 5 and recognized. Suppose here that the second voice is recognized as 「天気」 (tenki). This recognition result is continuous-DP-matched against the contents of the word lattice storage circuit 52 of FIG. 5 in the continuous DP matching circuit 53 of FIG. 5, and as a result the 「天気」 in 「今日の天気は針です。」 is spotted. This is spotting 2.

The spotting result comparison circuit 54 of FIG. 5 compares spotting 1 and spotting 2 as candidates for the changed part. Here spotting 1 is 「針」 and spotting 2 is 「天気」, so they do not match, and the DP matching values obtained when each was spotted are compared. The DP value of spotting 1 turns out to be larger than that of spotting 2. Spotting 1 therefore becomes the changed part, and the 「針」 in 「今日の天気は針です。」 is spotted. In this way the misrecognized part 「針」 is detected as the word that needs to be changed. On receiving this result, the word lattice replacement circuit 50 of FIG. 5 operates on the word lattice storage circuit 52 of FIG. 5 and promotes 「晴れ」 (hare), the second candidate for the spotted 「針」, to first candidate. The result is synthesized by the speech synthesis circuit 45 of FIG. 5 and output from the speaker under the control of the voice output control circuit 46 of FIG. 5. Fortunately the correct word 「晴れ」 is the second candidate. The operator hears 「晴れ」 from the speaker and confirms it with the confirmation key, correcting the speech recognition result. This completes the correction and yields 「今日の天気は晴れです。」.

Suppose instead that, when the spotting result comparison circuit 54 of FIG. 5 compared the DP values of spotting 1 (「針」) and spotting 2 (「天気」), spotting 2 had the larger DP value. Spotting 2 (「天気」) would then be chosen as the changed part, and the subsequent change processing would be carried out on 「天気」. This case serves as an example of what happens when the changed part is spotted wrongly. The word lattice replacement circuit 50 of FIG. 5 promotes 「天使」 (tenshi), the second candidate for 「天気」 stored in the word lattice storage circuit 52 of FIG. 5, to first candidate and passes it to the speech synthesis circuit 45 of FIG. 5. It is synthesized there and output from the speaker under the control of the voice output control circuit 46 of FIG. 5. Hearing 「天使」, the operator knows that spotting of the changed part has failed, and signals the error with the correction key controlled by the correction key control circuit 48 of FIG. 5. The word 「針」, which was not selected as the changed part because its DP value was the smaller one in the comparison by the spotting result comparison circuit 54, is then sent to the word lattice replacement circuit 50 as the changed part. The word lattice replacement circuit 50 promotes 「晴れ」, the second candidate for 「針」 stored in the word lattice storage circuit 52, to first candidate. The speech synthesis circuit 45 synthesizes 「晴れ」, and it is output by voice under the control of the voice output control circuit 46.

The operator hears 「晴れ」, recognizes that the intended part has been changed to the intended recognition result, and confirms the correction with the confirmation key controlled by the confirmation key control circuit 51 of FIG. 5. Thus, even if the changed part is spotted wrongly at first, the desired speech recognition result 「今日の天気は晴れです。」 can still be obtained.
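The failure path of this worked example (the wrong spotting is chosen first, then the operator falls back to the other one) can be traced in a few lines. This is a sketch under stated assumptions: the DP values 0.6 and 0.8 are invented for illustration, the names `Spotting`, `choose`, and `promote` are hypothetical, ties are resolved in favor of spotting 1, and each trial is made on a copy of the lattice so that a rejected change leaves no trace. The source does not say how a rejected change is undone.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Spotting:
    position: int    # index of the spotted word in the lattice
    dp_value: float  # matching score; the source treats larger as better


def choose(s1: Spotting, s2: Spotting) -> Tuple[Spotting, Optional[Spotting]]:
    """Return (chosen, fallback); fallback is None when both spottings agree."""
    if s1.position == s2.position:
        return s1, None
    return (s1, s2) if s1.dp_value >= s2.dp_value else (s2, s1)


def promote(lattice: List[List[str]], position: int, rank: int = 1) -> str:
    """Move the candidate at `rank` to first place and return it.

    Raises IndexError when no such candidate exists, which corresponds to
    the case where the operator has to speak the input again.
    """
    candidates = lattice[position]
    candidates.insert(0, candidates.pop(rank))
    return candidates[0]


lattice = [["今日"], ["の", "も"], ["天気", "天使"],
           ["は", "あ"], ["針", "晴れ", "橋"], ["です", "でぶ"]]

spotting1 = Spotting(position=4, dp_value=0.6)  # 「針」, from feature strings
spotting2 = Spotting(position=2, dp_value=0.8)  # 「天気」, from recognition results
chosen, fallback = choose(spotting1, spotting2)

# First trial: the operator hears the result and presses the correction key.
trial = copy.deepcopy(lattice)
first_try = promote(trial, chosen.position)

# Second trial on the other spotting: the operator presses the confirmation key.
trial = copy.deepcopy(lattice)
second_try = promote(trial, fallback.position)
lattice = trial

sentence = "".join(slot[0] for slot in lattice)
```

Here `first_try` is 「天使」, `second_try` is 「晴れ」, and `sentence` ends up as the intended 「今日の天気は晴れです」.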

[0021] In Embodiments 1 and 2, the voice input section consisted of a microphone, a high-frequency emphasis filter, and an AD converter, and sampled the input as an 8 kHz, 12-bit digital signal. Any other configuration may be used as long as it can sample the input voice quickly. Likewise, the feature extraction circuit frequency-transformed the digitized voice signal frame by frame, with each 16 ms interval as one frame, extracted feature parameters in the frequency domain, and expressed the uttered word as a feature parameter string. Any other method may be used as long as it extracts the features accurately. As the means of informing the operator of the speech recognition result, Embodiment 1 displayed the result on the display section, and Embodiment 2 generated it by speech synthesis and output it as synthesized speech. Any other method may be used as long as it informs the operator of the result quickly. Finally, a confirmation key was used here to confirm the desired word or phoneme, but another method, such as confirmation by voice, may be used as long as it allows the confirmation to be made accurately.
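The sampling and framing figures given for the embodiments fix the frame size: 8 kHz sampling with 16 ms frames gives 128 samples per frame. The sketch below works that arithmetic and computes a per-frame voice power as a simple stand-in feature. It rests on assumptions: the frames are taken as non-overlapping, power is taken as mean squared amplitude, and the name `frame_power` is hypothetical. The embodiments themselves extract frequency-domain parameters, which this sketch does not attempt.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 8000   # from the embodiments
FRAME_MS = 16           # from the embodiments
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 128 samples per frame


def frame_power(samples: Sequence[float]) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame.

    A trailing partial frame is dropped.
    """
    n_frames = len(samples) // FRAME_LEN
    return [
        sum(x * x for x in samples[k * FRAME_LEN:(k + 1) * FRAME_LEN]) / FRAME_LEN
        for k in range(n_frames)
    ]


# One loud frame, one silent frame, and five leftover samples.
signal = [1.0] * FRAME_LEN + [0.0] * FRAME_LEN + [1.0] * 5
powers = frame_power(signal)
```

One second of input therefore yields 62 full frames (8000 // 128), and the toy signal above yields two power values, one per full frame.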

[0022]

[Effects of the invention] As described above, the speech recognition correction device of the present invention makes it unnecessary to move a cursor to designate the part to be changed when changing an input speech recognition result. By re-entering the changed part by voice, the operator can designate and change that part very quickly. Consequently, even where misrecognition may occur frequently, because the operating environment of the speech recognition device has deteriorated through noise or the like, or because of the physical condition of the operator, correction requires no special operation or knowledge and can be done with the same kind of operation as voice input. The burden on the operator is reduced and work efficiency is markedly improved.

[Brief explanation of the drawings]

FIG. 1 is a principle block diagram of the speech recognition correction device of the present invention.

FIG. 2 is a block diagram of an embodiment of the present invention.

FIG. 3 is a diagram showing, for an embodiment of the present invention, the voice power, one of the features of the first voice stored in the acoustic analysis storage section, and the phoneme lattice structure of the first voice stored in the speech recognition storage section.

FIG. 4 is a diagram explaining the changed-part spotting section and the changing section of an embodiment of the present invention.

FIG. 5 is a block diagram of an embodiment of the present invention.

FIG. 6 is a diagram showing, for an embodiment of the present invention, the voice power, one of the features of the first voice stored in the acoustic analysis storage section, and the word lattice structure of the first voice stored in the speech recognition storage section.

FIG. 7 is a diagram explaining the changed-part spotting section and the changing section of an embodiment of the present invention.

FIG. 8 is a diagram explaining the processing of an embodiment of the present invention.

FIG. 9 is a block diagram of a conventional speech recognition correction device.

[Explanation of symbols]

1 Acoustic analysis section
2 Speech recognition section
3 Storage section
4 Cursor instruction section
5 Changing section
11 Acoustic analysis section
12 Speech recognition section
13 Speech recognition storage section
14 Voice spotting section
15 Changing section
16 Acoustic analysis storage section
21 Voice input section
22 Feature extraction circuit
23 DP matching circuit
24 Phoneme storage dictionary
25 Display section control circuit
26 Display section
27 Feature parameter string storage circuit
28 Correction key control circuit
29 DP matching circuit
30 Phoneme lattice replacement circuit
31 Confirmation key control circuit
32 Phoneme lattice control circuit
33 DP matching circuit
34 Spotting result comparison circuit
41 Voice input section
42 Feature extraction circuit
43 Continuous DP matching circuit
44 Word storage dictionary
45 Speech synthesis circuit
46 Voice output control circuit
47 Feature parameter string storage circuit
48 Correction key control circuit
49 Continuous DP matching circuit
50 Word lattice replacement circuit
51 Confirmation key control circuit
52 Word lattice storage circuit
53 Continuous DP matching circuit
54 Spotting result comparison circuit

Claims

[Claims]

1. A speech recognition correction device comprising: an acoustic analysis section that outputs features of an input first voice; a speech recognition section that converts the output of the acoustic analysis section into a code string; an acoustic analysis storage section that stores the output of the acoustic analysis section; a speech recognition storage section that stores the output of the speech recognition section; a spotting section that compares an input second voice with the data in the acoustic analysis storage section and in the speech recognition storage section and detects, within that stored data, the portion corresponding to the second voice; and a changing section that changes the part of the code string corresponding to the detected portion.