JP3104659B2

JP3104659B2 - Speech input device and machine-readable recording medium recording program

Info

Publication number: JP3104659B2
Application number: JP09315933A
Authority: JP
Inventors: 清一三木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-10-31
Filing date: 1997-10-31
Publication date: 2000-10-30
Anticipated expiration: 2017-10-31
Also published as: JPH11133994A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech input device, and more particularly to a speech input device that reduces recognition errors by adapting to the content of the user's utterances.

[0002]

[Description of the Related Art] When a document is input by speech, a known way to improve recognition performance is to build, from a large text database, a statistical language model of the order in which words occur and to use that model during speech recognition. However, a statistical language model built in this way is general-purpose. It is therefore less effective when, for example, the user is different, or the recognition task (the content being spoken) differs from the text database used to build the model.

[0003] To solve this problem, a technique has been proposed, as in Japanese Unexamined Patent Publication No. H4-291399, in which a general-purpose statistical language model is adapted using a training language model built from a text database similar to the recognition task, and speech recognition is performed using the adapted statistical language model.

[0004] FIG. 7 is a block diagram of this type of prior art.

[0005] The standard pattern memory 115 stores a plurality of standard patterns trained in advance. The general-purpose statistical language model 116 is built from a general-purpose text database and represents the order in which words occur. The training language model 117 is built from a text database similar to the recognition task. The adaptive statistical language model 118 is obtained by adapting the general-purpose statistical language model 116 using the training language model 117. One adaptation method is deleted interpolation. Specifically, let P be the general-purpose statistical language model 116, Q the training language model 117, and R the adaptive statistical language model 118. Then R is given by R = λ × P + (1 − λ) × Q, where 0 ≤ λ ≤ 1. The adaptive statistical language model 118 is thus obtained by mixing P and Q.
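
The mixing formula R = λ × P + (1 − λ) × Q can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each model is a dictionary of unigram probabilities and that λ is already given; the function name `interpolate`, the romanized words, and all probability values are hypothetical and do not come from the cited publication. Deleted interpolation proper also estimates λ from held-out data, which this sketch does not attempt.

```python
def interpolate(p, q, lam):
    """Mix a general model P and a task model Q: R = lam*P + (1-lam)*Q."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must satisfy 0 <= lam <= 1")
    words = set(p) | set(q)
    # A word missing from one model is treated as having probability 0 there.
    return {w: lam * p.get(w, 0.0) + (1.0 - lam) * q.get(w, 0.0) for w in words}


# Hypothetical unigram probabilities, for illustration only.
P = {"onsen": 0.5, "onsei": 0.3, "honsha": 0.2}   # general-purpose model
Q = {"onsen": 0.1, "onsei": 0.7, "honsha": 0.2}   # task-specific model
R = interpolate(P, Q, lam=0.5)
```

Because P and Q each sum to 1 and the weights λ and 1 − λ sum to 1, the mixed model R is again a probability distribution.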

[0006] Speech input from the speech signal input terminal 111 is converted into a digital signal in the feature extraction unit 112, subjected to LPC cepstrum analysis, and then converted into feature parameters frame by frame.

[0007] For each of a plurality of candidates selected using the adaptive statistical language model 118, the recognition unit 113 obtains the similarity (likelihood) between the candidate's standard pattern and the parameters of the input speech. It then takes the linear sum of the likelihood obtained from the adaptive statistical language model 118 and the likelihood between the standard pattern and the input speech parameters as the total likelihood, and outputs the candidate with the highest total likelihood to the recognition result output unit 114 as the recognition result.
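
The selection rule of this prior-art recognizer can be sketched as follows. This is only an illustration under stated assumptions: the text says "linear sum" without giving coefficients, so the single `weight` parameter, the function names, and the log-likelihood-style numbers are all hypothetical.

```python
def total_likelihood(acoustic, language, weight=1.0):
    """Linear sum of the acoustic likelihood and the language-model likelihood."""
    return acoustic + weight * language


def recognize(candidates, weight=1.0):
    """candidates: list of (word, acoustic_likelihood, language_likelihood)."""
    best = max(candidates, key=lambda c: total_likelihood(c[1], c[2], weight))
    return best[0]


# Hypothetical log-likelihood-like values, for illustration only.
candidates = [
    ("onsen", -30.0, -5.0),
    ("onsei", -36.0, -1.0),
    ("honsha", -44.0, -2.0),
]
best = recognize(candidates, weight=2.0)
```

With these made-up numbers, the language model overturns the purely acoustic ranking; setting `weight=0.0` would return the acoustically closest word instead.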

[0008]

[Problems to Be Solved by the Invention] The conventional technique using the adaptive statistical language model described above achieves higher recognition performance than a technique using only a general-purpose statistical language model. However, it requires that a text database similar to the recognition task be prepared in advance and that an adaptive statistical language model be built from it.

[0009] An object of the present invention is therefore to provide a speech input device that, by making use of the user's corrections to recognition results, adapts to the recognition task and improves recognition performance without requiring text data similar to the recognition task in advance.

[0010]

[Means for Solving the Problems] To achieve the above object, the speech input device of the present invention is a speech input device that selects the most suitable one of a plurality of recognition result candidates obtained for an input speech as the recognition result for that input speech, and comprises: an adaptive score storage unit that stores adaptive scores of a plurality of words that can become recognition result candidates; means for selecting a recognition result from among the plurality of recognition result candidates for the input speech while also taking into account the adaptive score of each recognition result candidate stored in the adaptive score storage unit; and a user correction unit that corrects the recognition result selected by said means in accordance with the user's instruction and subtracts a fixed value from the adaptive score of each recognition result candidate ranked higher than the corrected recognition result. Alternatively, in place of the above user correction unit, the device comprises a user correction unit that corrects the recognition result selected by said means in accordance with the user's instruction and, for the recognition result candidates ranked higher than the corrected recognition result, subtracts a fixed value from any adaptive score that is not the initial value.

[0011] In this configuration, when a recognition result is selected from among the plurality of recognition result candidates, the adaptive score of each candidate stored in the adaptive score storage unit is also taken into account. If the recognition result is wrong, the user instructs the user correction unit to correct it. The user correction unit then corrects the recognition result in accordance with the user's instruction and either subtracts a fixed value from the adaptive score of each recognition result candidate ranked higher than the corrected recognition result, or, for the candidates ranked higher than the corrected recognition result, subtracts a fixed value from any adaptive score that is not the initial value. Consequently, when a word that was misrecognized is subsequently input by speech again, the probability of obtaining the correct recognition result increases.

[0012] To achieve the above object, the speech input device of the present invention also comprises: a parameter analysis unit that analyzes the input speech and outputs the analysis result; a standard pattern storage unit that stores standard patterns for a plurality of words; a speech recognition unit that obtains the distance between the analysis result of the parameter analysis unit and each standard pattern stored in the standard pattern storage unit and obtains a plurality of recognition result candidates based on those distances; an adaptive score storage unit that stores adaptive scores of a plurality of words that can become recognition result candidates; a candidate selection unit that selects a recognition result from among the plurality of recognition result candidates obtained by the speech recognition unit, based on the adaptive score of each candidate stored in the adaptive score storage unit and the distance to each candidate's standard pattern; and a user correction unit that corrects the recognition result selected by the candidate selection unit in accordance with the user's instruction and subtracts a fixed value from the adaptive score of each recognition result candidate ranked higher than the corrected recognition result. Alternatively, in place of the above user correction unit, the device comprises a user correction unit that corrects the recognition result selected by the candidate selection unit in accordance with the user's instruction and, for the recognition result candidates ranked higher than the corrected recognition result, subtracts a fixed value from any adaptive score that is not the initial value.

[0013] In this configuration, when the user inputs a word by speech, the parameter analysis unit analyzes the input speech. The speech recognition unit obtains a plurality of recognition result candidates based on the distance between the analysis result of the parameter analysis unit and each standard pattern stored in the standard pattern storage unit. The candidate selection unit then selects a recognition result from among those candidates, based on the adaptive score of each candidate stored in the adaptive score storage unit and the distance to each candidate's standard pattern.

[0014] If the recognition result selected by the candidate selection unit is wrong, the user instructs the user correction unit to correct it. The user correction unit then corrects the recognition result selected by the candidate selection unit in accordance with the user's instruction and either subtracts a fixed value from the adaptive score of each recognition result candidate ranked higher than the corrected recognition result, or, for the candidates ranked higher than the corrected recognition result, subtracts a fixed value from any adaptive score that is not the initial value.

[0015] To further improve recognition performance, the speech input device of the present invention also comprises: a parameter analysis unit that analyzes the input speech and outputs the analysis result; a standard pattern storage unit that stores standard patterns for a plurality of words; a speech recognition unit that obtains the distance between the analysis result of the parameter analysis unit and each standard pattern stored in the standard pattern storage unit and obtains a plurality of recognition result candidates based on those distances; an adaptive score storage unit that stores adaptive scores of a plurality of words that can become recognition result candidates; a statistical language model indicating the order in which words occur; a language processing unit that selects a recognition result from among the plurality of recognition result candidates obtained by the speech recognition unit, based on the adaptive score of each candidate stored in the adaptive score storage unit, the distance to each candidate's standard pattern, and the content of the statistical language model; and a user correction unit that corrects the recognition result selected by the language processing unit in accordance with the user's instruction and subtracts a fixed value from the adaptive score of each recognition result candidate ranked higher than the corrected recognition result. Alternatively, in place of the above user correction unit, the device comprises a user correction unit that corrects the recognition result selected by the language processing unit in accordance with the user's instruction and, for the recognition result candidates ranked higher than the corrected recognition result, subtracts a fixed value from any adaptive score that is not the initial value.

[0016] In this configuration, the language processing unit selects a recognition result from among the plurality of recognition result candidates obtained by the speech recognition unit, based on the adaptive score of each candidate stored in the adaptive score storage unit, the distance to each candidate's standard pattern, and the content of the statistical language model.

[0017]

[0018]

[0019]

[0020]

[0021]

[0022]

[0023]

[0024]

[Embodiments of the Invention] Embodiments of the present invention will now be described in detail with reference to the drawings.

[0025] First, the functions of the speech input device of the present invention are outlined briefly. The speech input device obtains a plurality of recognition result candidates for an input speech. When selecting the recognition result from among them, it considers not only the distance between the input speech and each candidate but also the adaptive score assigned to each candidate on the basis of the correction operations the user has performed so far.

[0026] FIG. 1 is a block diagram showing a configuration example of a speech input device according to one embodiment of the present invention. The device comprises a parameter analysis unit 1, a standard pattern storage unit 2, a speech recognition unit 3, a candidate selection unit 4, an adaptive score storage unit 5, a display unit 6, and a user correction unit 7.

[0027] The standard pattern storage unit 2 stores a plurality of standard patterns of speech analyzed in advance (expressed, for example, as parameter vector sequences). The adaptive score storage unit 5 stores adaptive scores of a plurality of words that can become recognition result candidates.

[0028] The parameter analysis unit 1 analyzes the input speech and outputs the analysis result (for example, a parameter vector sequence). Such a parameter analysis unit 1 can be built using, for example, a filter bank, a Fourier transform, or a linear-prediction-coefficient analyzer.

[0029] The speech recognition unit 3 obtains the distance between the analysis result of the parameter analysis unit 1 and each standard pattern stored in the standard pattern storage unit 2, obtains a plurality of recognition result candidates based on those distances, and outputs each recognition result candidate together with its distance. A likelihood may be used in place of the distance.
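
The role of the speech recognition unit 3 can be sketched in Python as follows. The description does not specify how the distance is computed, so this sketch makes a strong simplifying assumption: the analysis result and every standard pattern are parameter vector sequences of equal length, and the distance is the sum of frame-wise Euclidean distances. A practical recognizer would need some form of time alignment. The names `sequence_distance` and `n_best`, the romanized words, and the vectors are hypothetical.

```python
import math


def sequence_distance(a, b):
    """Sum of frame-wise Euclidean distances (assumes equal-length sequences)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(math.dist(x, y) for x, y in zip(a, b))


def n_best(analysis, patterns, n=3):
    """Return up to n (word, distance) pairs, smallest distance first."""
    scored = [(word, sequence_distance(analysis, pat)) for word, pat in patterns.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:n]


# Hypothetical two-frame, two-dimensional standard patterns.
patterns = {
    "onsen": [(0.0, 0.0), (1.0, 0.0)],
    "onsei": [(0.0, 1.0), (1.0, 1.0)],
    "honsha": [(3.0, 4.0), (3.0, 4.0)],
}
analysis = [(0.0, 0.0), (1.0, 0.5)]
candidates = n_best(analysis, patterns)
```

Each returned pair carries the word together with its distance, which is the form of output that the candidate selection step consumes.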

[0030] The candidate selection unit 4 selects a recognition result from among the recognition result candidates output by the speech recognition unit 3, based on the adaptive score of each candidate stored in the adaptive score storage unit 5 and the distance to each candidate's standard pattern.

[0031] The display unit 6 displays the recognition result selected by the candidate selection unit 4 and the recognition result candidates.

[0032] When the recognition result displayed on the display unit 6 is wrong and the user instructs a correction, the user correction unit 7 displays the other recognition result candidates on the display unit 6, adopts as the corrected recognition result the candidate that the user selects from those displayed, and updates the contents of the adaptive score storage unit 5 so that the corrected recognition result is favored, that is, more likely to be selected from then on.

[0033] FIG. 2 is a flowchart of a processing example of the speech input device shown in FIG. 1. The operation is described below with reference to the drawings.

[0034] When the user utters the word to be input (step S1), the parameter analysis unit 1 analyzes the input speech and outputs the analysis result. The speech recognition unit 3 compares the analysis result with the standard pattern of each word stored in the standard pattern storage unit 2, obtains the distance between the input speech and each word's standard pattern, and outputs a plurality of recognition result candidates together with their distances (step S2).

[0035] When the speech recognition unit 3 outputs the recognition result candidates and their distances, the candidate selection unit 4 obtains the adaptive score of each candidate from the adaptive score storage unit 5 (step S3). For each candidate, the candidate selection unit 4 then computes the linear sum of the distance with its sign inverted and the adaptive score, sorts the candidates in descending order of that sum, and displays the first candidate (the one with the largest linear sum) on the display unit 6 as the recognition result (step S4).

[0036] FIG. 3 shows a concrete example of the candidate sorting performed in step S4. In this example, the user utters the word 「音声」 (onsei, "speech"). The speech recognition unit 3 outputs the recognition result candidates 「温泉」 (onsen, "hot spring"), 「音声」, and 「本社」 (honsha, "head office") with distances 30, 36, and 44, respectively, and the adaptive score storage unit 5 holds adaptive scores of 0, 10, and 16 for 「温泉」, 「音声」, and 「本社」, respectively.

[0037] For the candidates 「温泉」, 「音声」, and 「本社」, the candidate selection unit 4 computes the linear sum of the sign-inverted distances −30, −36, and −44 and the adaptive scores 0, 10, and 16, and sorts the candidates in descending order of that sum. In this example the linear sums for 「温泉」, 「音声」, and 「本社」 are −30, −26, and −28, respectively, so the candidates are sorted into the order 「音声」, 「本社」, 「温泉」.
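
The sorting of step S4 can be reproduced with a short Python sketch using the distances and adaptive scores of the FIG. 3 example. The function name `rank_candidates` and the `weight` parameter are assumptions; the description only says "linear sum", and the worked numbers correspond to a weight of 1.

```python
def rank_candidates(distances, adaptive_scores, weight=1, initial=0):
    """Sort candidates by (-distance + weight * adaptive score), largest first."""
    totals = {
        word: -dist + weight * adaptive_scores.get(word, initial)
        for word, dist in distances.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Values from the FIG. 3 example.
distances = {"温泉": 30, "音声": 36, "本社": 44}
adaptive_scores = {"温泉": 0, "音声": 10, "本社": 16}

ranking = rank_candidates(distances, adaptive_scores)
recognition_result = ranking[0][0]
```

Here `ranking` holds the sums −26, −28, and −30 for 「音声」, 「本社」, and 「温泉」 in that order, so 「音声」 becomes the recognition result even though 「温泉」 has the smallest distance.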

[0038] If the recognition result displayed on the display unit 6 is correct (No in step S5), the user utters the next word to be input (step S1). If the recognition result is wrong, the user inputs a correction instruction to the user correction unit 7 from a keyboard, a mouse, or the like (not shown).

[0039] When a correction instruction is input to the user correction unit 7 (Yes in step S5), the processing of step S6 described below is performed.

[0040] First, the user correction unit 7 requests recognition result candidates from the candidate selection unit 4. In response, the candidate selection unit 4 passes to the user correction unit 7 the recognition result candidates received from the speech recognition unit 3, excluding the candidate that was adopted as the current recognition result.

[0041] The user correction unit 7 then displays all of the recognition result candidates received from the candidate selection unit 4 on the display unit 6. When the candidates are displayed, the user selects the correct one with a mouse or the like.

[0042] When the user selects the correct recognition result candidate, the user correction unit 7 performs correction processing that adopts the selected candidate as the recognition result. The user correction unit 7 also updates the contents of the adaptive score storage unit 5, for example by method (A) below, so that the selected candidate (the correction result) is more likely to be selected by the candidate selection unit 4 from then on.

[0043] (A)
- If the adaptive score of the correction result is the initial value, set it to a fixed value A.
- If the adaptive score of the correction result is not the initial value, add a fixed value B to it.
- If any recognition result candidate ranked higher than the correction result has an adaptive score other than the initial value, subtract a fixed value C from that score.
(Here, A, B, and C preferably satisfy A > B and A > C.)
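
Method (A) can be sketched in Python as below. The defaults B = 4 and C = 5 and the sample scores are those of the FIG. 4 example in this description; A = 10 and the initial value 0 as a default are assumptions chosen only so that A > B and A > C hold. The function name `update_method_a` and the dictionary representation of the adaptive score storage unit are likewise hypothetical.

```python
INITIAL = 0  # assumed initial adaptive score


def update_method_a(scores, ranked, corrected, a=10, b=4, c=5, initial=INITIAL):
    """Return updated adaptive scores after a user correction, per method (A).

    scores:    word -> adaptive score
    ranked:    candidate words, best first, as presented by the selector
    corrected: the word the user picked as the correct result
    """
    new = dict(scores)
    # Penalize higher-ranked candidates whose score is not the initial value.
    for word in ranked[:ranked.index(corrected)]:
        if new.get(word, initial) != initial:
            new[word] = new[word] - c
    # Reward the correction result: set to A if untouched, otherwise add B.
    if new.get(corrected, initial) == initial:
        new[corrected] = a
    else:
        new[corrected] = new[corrected] + b
    return new


scores = {"温泉": 0, "音声": 10, "本社": 16}
ranked = ["本社", "温泉", "音声"]          # 「本社」 was wrongly chosen first
updated = update_method_a(scores, ranked, corrected="音声")
```

With these inputs the score of 「音声」 rises from 10 to 14, the score of 「本社」 falls from 16 to 11, and 「温泉」 keeps its initial value because it has never been adjusted.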

[0044] FIG. 4 shows a concrete example of the adaptive score update. In this example, the user utters the word 「音声」 (onsei, "speech") and 「本社」 (honsha, "head office") is obtained as the recognition result, with 「温泉」 (onsen, "hot spring") ranked second and 「音声」 ranked third among the recognition result candidates; the user then selects 「音声」 from the displayed candidates. The initial value of the adaptive score in this example is 0.

[0045] Because the correction result 「音声」 already has an adaptive score other than the initial value (10 in the figure), a fixed value (4 in the figure) is added to it to give a new adaptive score (14 in the figure). Among the candidates ranked higher than the correction result 「音声」, only 「本社」 has an adaptive score other than the initial value, so a fixed value (5 in the figure) is subtracted from the adaptive score of 「本社」 to give a new adaptive score (11 in the figure). Updating the adaptive scores in this way makes the candidate 「音声」 more likely to be selected by the candidate selection unit 4 the next time 「音声」 is input. In the example of FIG. 4 the initial value of the adaptive score is 0, but the initial values may instead be set using text data similar to the recognition task, for example by splitting the text data into words and using each word's frequency of occurrence.
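
The optional frequency-based initialization mentioned at the end of this paragraph might look like the following sketch. It assumes the text has already been split into words separated by whitespace; Japanese text would first need word segmentation, which is outside this sketch. The function name `initial_scores_from_text`, the `scale` parameter, and the sample text are hypothetical.

```python
from collections import Counter


def initial_scores_from_text(text, scale=1):
    """Use each word's occurrence count (times scale) as its initial adaptive score."""
    counts = Counter(text.split())
    return {word: scale * n for word, n in counts.items()}


# Hypothetical pre-segmented task text, romanized for illustration.
task_text = "onsei nyuryoku onsei ninshiki"
initial_scores = initial_scores_from_text(task_text)
```

Words absent from the text receive no entry here; a caller would fall back to a default initial value such as 0 for them.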

[0046] When the processing of step S6 is complete, the user inputs the next word by speech (step S1). Thereafter, the same processing as described above is performed each time a word is input by speech.

[0047] Besides method (A) above, any of methods (B) to (I) below may be used to update the adaptive score storage unit 5.

[0048] (B) Always set the adaptive score of the correction result to a fixed value A.

[0049] (C) Always add a fixed value B to the adaptive score of the correction result.

[0050] (D)
- If the adaptive score of the correction result is the initial value, set the adaptive score of that recognition result candidate to a fixed value A.
- If the adaptive score of the correction result is not the initial value, add a fixed value B to the adaptive score of that recognition result candidate.

[0051] (E) In method (B), (C), or (D) above, additionally subtract a fixed value C from the adaptive score of each recognition result candidate ranked higher than the correction result.

[0052] (F) In method (B) or (C) above, additionally, for the recognition result candidates ranked higher than the correction result, subtract a fixed value C from any adaptive score that is not the initial value.
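
Methods (B) to (F) differ only in how the correction result is rewarded and in whether higher-ranked candidates are penalized, so they can be sketched as one function with two switches. The switch names (`"set"`, `"add"`, `"set_or_add"`, `"all"`, `"non_initial"`), the function name, and the default values of A, B, and C are assumptions made for this illustration.

```python
def update_scores(scores, ranked, corrected, reward, penalty=None,
                  a=10, b=4, c=5, initial=0):
    """Update adaptive scores after a correction.

    reward:  "set" (method B), "add" (method C), "set_or_add" (method D)
    penalty: None, "all" (method E), or "non_initial" (method F)
    """
    new = dict(scores)
    current = new.get(corrected, initial)
    if reward == "set":
        new[corrected] = a
    elif reward == "add":
        new[corrected] = current + b
    elif reward == "set_or_add":
        new[corrected] = a if current == initial else current + b
    else:
        raise ValueError("unknown reward rule: " + repr(reward))
    # Candidates ranked above the correction result may be penalized.
    for word in ranked[:ranked.index(corrected)]:
        score = new.get(word, initial)
        if penalty == "all" or (penalty == "non_initial" and score != initial):
            new[word] = score - c
    return new


scores = {"温泉": 0, "音声": 10, "本社": 16}
ranked = ["本社", "温泉", "音声"]
method_b = update_scores(scores, ranked, "音声", reward="set", a=20)
method_c_e = update_scores(scores, ranked, "音声", reward="add", penalty="all")
method_c_f = update_scores(scores, ranked, "音声", reward="add", penalty="non_initial")
```

The visible difference between (E) and (F) in this example is 「温泉」: under (E) its untouched score of 0 drops to −5, while under (F) it stays at the initial value.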

[0053] (G) In method (A), (B), (C), (D), (E), or (F) above, additionally, each time the user inputs a word by speech, subtract a fixed value D from every adaptive score stored in the adaptive score storage unit 5 whose value is positive. To implement this, the candidate selection unit 4 may, for example, notify the user correction unit 7 each time it displays a recognition result on the display unit 6, and the user correction unit 7, on receiving the notification, subtracts the fixed value D from each positive adaptive score stored in the adaptive score storage unit 5.

[0054] (H) In method (A), (B), (C), (D), (E), or (F) above, additionally perform a decay process each time the user correction unit 7 corrects a recognition result: subtract a fixed value D from every adaptive score stored in the adaptive score storage unit 5.

[0055] (I) In method (A), (B), (C), (D), (E), or (F) above, additionally perform a decay process each time the user correction unit 7 corrects a recognition result: subtract a fixed value D from every adaptive score stored in the adaptive score storage unit 5 whose value is positive.
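
The decay steps of methods (G), (H), and (I) can be sketched as two small functions. Methods (G) and (I) apply the same arithmetic and differ only in when it is triggered (on every spoken word versus on every correction). The function names and the value D = 1 are assumptions, and since the description does not say whether a decayed score should be clamped at zero, this sketch does not clamp.

```python
def decay_positive(scores, d=1):
    """Methods (G) and (I): subtract D from every positive adaptive score."""
    return {word: (s - d if s > 0 else s) for word, s in scores.items()}


def decay_all(scores, d=1):
    """Method (H): subtract D from every adaptive score."""
    return {word: s - d for word, s in scores.items()}


scores = {"温泉": 0, "音声": 14, "本社": 11}
after_positive_decay = decay_positive(scores)   # (G) per utterance, (I) per correction
after_full_decay = decay_all(scores)            # (H) per correction
```

Either form gradually weakens old adaptations, so that words the user has stopped correcting lose their advantage over time.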

【００５６】本実施例の音声入力装置は、例えば、音声
入力文書作成装置に使用すると非常に有効である。一般
に、ユーザは、音声入力文書作成装置を用いて様々な分
野の文書を作成し、また、ひとまとまりの文書を入力す
る場合には、文書中で同じ単語を繰り返し使用すること
が多い。本実施例では、認識結果に誤りがあり、ユーザ
によって認識結果が修正された場合、以後候補選択部４
に於いて修正結果が選択されやすくなるように、適応ス
コア記憶部５の内容を更新するので、特開平４−２９１
３９９号公報に記載されている従来技術のように、作成
する文書の分野（認識タスク）タスクに類似したテキス
トデータを用意することなく、様々な分野の文書を認識
誤りをあまり起こさずにスムーズに入力することが可能
になる。The voice input device of this embodiment is particularly effective when used in, for example, a voice-input document creation device. Users generally create documents in a variety of fields with such a device, and within a single document the same words tend to be used repeatedly. In this embodiment, when a recognition result is wrong and the user corrects it, the contents of the adaptive score storage unit 5 are updated so that the candidate selection unit 4 is more likely to select the corrected result thereafter. Consequently, unlike the prior art described in JP-A-4-291399, documents in various fields can be input smoothly with few recognition errors, without preparing text data similar to the field (recognition task) of the document to be created.

【００５７】また、本実施例の音声入力装置を、例え
ば、音声によるコマンド入力で、発声に対し複数候補を
提示し、その中からユーザに入力すべきコマンドを選択
させるようなシステムに適用しても非常に有効である。
本実施例の音声入力装置を使用すれば、以前に誤認識さ
れたコマンドを再度音声入力した場合には、そのコマン
ドが上位に提示されるようになり、ユーザインタフェー
スを向上させることができる。但し、このようにする場
合には、候補選択部４の構成を多少変更し、最上位のコ
マンドだけでなく、残りの候補も線形和が大きい順に表
示部６に表示するようにすることが必要である。The voice input device of this embodiment is also very effective when applied to, for example, a voice command input system that presents a plurality of candidates for an utterance and has the user select the command to be input from among them. With the voice input device of this embodiment, when a previously misrecognized command is spoken again, that command is presented at a higher rank, which improves the user interface. In this case, however, the configuration of the candidate selection unit 4 must be modified slightly so that not only the top-ranked command but also the remaining candidates are displayed on the display unit 6 in descending order of the linear sum.
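
A minimal sketch of such an ordered candidate list is given below. The exact form of the linear sum is not restated in this passage, so the sketch assumes it is the negated distance plus a weighted adaptive score. The function name `rank_candidates`, the `weight` parameter and the command names are hypothetical.

```python
def rank_candidates(candidates, adaptive_scores, weight=1.0):
    """Return candidate words in descending order of an assumed linear sum.

    candidates: list of (word, distance) pairs from the recognizer, where a
    smaller distance means a closer match to the standard pattern.
    adaptive_scores: dict mapping word -> adaptive score (missing -> 0.0).
    """
    def linear_sum(item):
        word, distance = item
        # Assumed form: closer match and higher adaptive score both help.
        return -distance + weight * adaptive_scores.get(word, 0.0)

    ordered = sorted(candidates, key=linear_sum, reverse=True)
    return [word for word, _ in ordered]


# "paste" was corrected earlier, so it carries a positive adaptive score.
ranking = rank_candidates(
    [("copy", 1.0), ("paste", 1.2), ("cut", 2.0)],
    {"paste": 0.5},
)
```

Here "paste" overtakes "copy" despite its larger distance, which is the behavior the paragraph describes for a previously misrecognized command.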

【００５８】また、適応スコア記憶部５の更新方法とし
て、上記した方法（Ｇ），（Ｈ），（Ｉ）を使用する
と、認識タスクが異なるものになった直後に於いて、認
識誤りが多発しないようにすることができる。例えば、
最初に入力する認識タスクが「音声入力装置に関する発
明」で、その次に入力する認識タスクが「温泉に関する
話題」である場合に於いて、もし、「音声入力装置に関
する発明」を入力中に「音声」が「温泉」に誤認識さ
れ、「音声」の適応スコアが高められたとする。このよ
うな場合、「音声」の適応スコアを高めたままにしてお
くと、次に「温泉に関する話題」を入力する場合に、
「温泉」が「音声」に誤認識される危険性が高くなる。
上記した方法（Ｇ），（Ｈ），（Ｉ）では、、ユーザが
単語を入力する毎に、或いはユーザが認識結果を修正す
る毎に、適応スコア記憶部５に記憶されている適応スコ
アの値を減じるようにしているので、「温泉に関する話
題」の入力時には、「音声」の適応スコアの値は、０に
なっている可能性が高い。従って、上記した方法
（Ｇ），（Ｈ），（Ｉ）によれば、認識タスクが異なる
ものになった直後に於いて、認識誤りが多発しないよう
にすることができる。Using methods (G), (H) and (I) above to update the adaptive score storage unit 5 prevents recognition errors from occurring frequently immediately after the recognition task changes. For example, suppose that the first recognition task is "an invention relating to a voice input device" and the next is "a topic relating to hot springs", and that while "an invention relating to a voice input device" was being input, "voice" was misrecognized as "hot spring" and the adaptive score of "voice" was raised as a result. If the adaptive score of "voice" were left raised, there would be a higher risk of "hot spring" being misrecognized as "voice" when "a topic relating to hot springs" is input next. In methods (G), (H) and (I), the adaptive scores stored in the adaptive score storage unit 5 are reduced each time the user inputs a word or each time the user corrects a recognition result, so the adaptive score of "voice" is likely to have returned to 0 by the time "a topic relating to hot springs" is input. Methods (G), (H) and (I) therefore prevent recognition errors from occurring frequently immediately after the recognition task changes.
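
As a rough worked illustration of this decay under method (G), suppose (hypothetically) that the correction left the adaptive score of "voice" at some value s_0 > 0 and that no later correction changes it. The symbols s_0, s_k and k are introduced here only for illustration. Each subsequent input word subtracts D while the score is still positive, so after k words:

```latex
s_k = s_0 - \min\!\left(k,\ \left\lceil \tfrac{s_0}{D} \right\rceil\right) D ,
\qquad
-D < s_k \le 0 \quad \text{for all } k \ge \left\lceil \tfrac{s_0}{D} \right\rceil .
```

For example, with s_0 = 3 and D = 1 the score reaches exactly 0 after three further input words. The boost has then worn off before many words of the next task have been spoken.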

【００５９】尚、上述した実施例に於いては、説明しな
かったが、最新の発声以前の発声に対する認識結果と認
識結果候補とを保存しておき、保存されている内容に基
づいて過去の認識結果を修正可能にすることもできる。Although not described in the embodiment above, the recognition results and recognition result candidates for utterances preceding the latest one may also be stored, so that past recognition results can be corrected on the basis of the stored contents.

【００６０】図５は本発明の別の実施例のブロック図で
ある。本実施例の音声入力装置は、図１に示した音声入
力装置が備えている構成に加え、音声認識部３で選ばれ
た認識結果の候補を、発声された順番に一定量記憶する
認識結果保持部８と、認識結果保持部８から与えられた
認識結果候補からラティスを構成するラティス構成部９
を備えている。ラティスは、発声毎の認識結果候補から
構成される。FIG. 5 is a block diagram of another embodiment of the present invention. In addition to the configuration of the voice input device shown in FIG. 1, the voice input device of this embodiment includes a recognition result holding unit 8, which stores a fixed amount of the recognition result candidates selected by the speech recognition unit 3 in the order in which they were uttered, and a lattice construction unit 9, which constructs a lattice from the recognition result candidates supplied by the recognition result holding unit 8. The lattice is composed of the recognition result candidates for each utterance.
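
One way to picture the recognition result holding unit 8 and the lattice construction unit 9 is the sketch below. It assumes that "a fixed amount" means a fixed number of most recent utterances and that the lattice is simply the list of per-utterance candidate lists in spoken order. The class and method names are hypothetical.

```python
from collections import deque


class RecognitionResultHolder:
    """Keeps the candidate lists of the most recent utterances, oldest first."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._utterances = deque(maxlen=capacity)

    def add(self, candidates):
        """Store the recognition result candidates of one utterance."""
        self._utterances.append(list(candidates))

    def build_lattice(self):
        """Return one column of candidates per stored utterance."""
        return [list(column) for column in self._utterances]


holder = RecognitionResultHolder(capacity=2)
holder.add(["voice", "hot_spring"])
holder.add(["input", "trip"])
holder.add(["device", "advice"])
lattice = holder.build_lattice()
```

With a capacity of 2, the first utterance is dropped once the third arrives, so the lattice covers only the two most recent utterances.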

【００６１】更に、本実施例の音声入力装置は、図１に
示した候補選択部４の代わりに言語処理部１０を備えて
いる。言語処理部１０は、ラティス構成部９から与えら
れるラティスに対し、音声認識部３から与えられる入力
音声の各認識結果候補の標準パターンとの間の距離と、
適応スコア記憶部５に格納されている認識結果候補の適
応スコアの値と、言語処理部１０が持つ統計的言語モデ
ル（単語の生成順序を表す）の内容に基づいて、ラティ
ス中の各発声に対する認識結果候補からそれぞれ、最適
な認識結果を得る機能を備えている。尚、図５に於い
て、他の図１と同一符号は同一部分を表している。Further, the voice input device of this embodiment includes a language processing unit 10 in place of the candidate selection unit 4 shown in FIG. 1. For the lattice supplied by the lattice construction unit 9, the language processing unit 10 obtains the optimum recognition result from among the recognition result candidates for each utterance in the lattice. It does so based on the distance between the input speech and the standard pattern of each recognition result candidate, supplied by the speech recognition unit 3, on the adaptive score of each recognition result candidate stored in the adaptive score storage unit 5, and on the contents of the statistical language model (representing the order in which words occur) held by the language processing unit 10. In FIG. 5, the same reference numerals as in FIG. 1 denote the same parts.

【００６２】複数の認識結果候補から認識結果を選択す
る際の、適応スコアの具体的な使用例として、離散単語
発声に対する認識結果候補を時系列に並べたラティスに
対し、統計的言語モデルとして単語バイグラムモデルを
使用した場合を説明する。As a specific example of how the adaptive score is used when selecting a recognition result from a plurality of recognition result candidates, the following describes the case where a word bigram model is used as the statistical language model for a lattice in which the recognition result candidates for discrete word utterances are arranged in time series.

【００６３】言語処理部１０では、数１に示すＳ（ｗ＿
１，…，ｗ＿ｎ）を最大にする単語系列ｗ＿１，…，ｗ
＿ｎを求め、ラティスに対する認識結果とする。The language processing unit 10 finds the word sequence w_1, …, w_n that maximizes S(w_1, …, w_n) given by Equation 1, and takes it as the recognition result for the lattice.

【００６４】[0064]

【数１】 (Equation 1)

【００６５】但し、ｎはラティスの幅（処理の対象とな
る発声数）、ｗ＿ｉは第ｉ番目の発声に対する認識結果
候補の１つ、Ｐ（ｗ＿ｉ）はｗ＿ｉの出現確率、Ｐ（ｗ
＿ｉ／ｗ＿（ｉ−１））はｗ＿（ｉ−１）が出現した場
合のｗ＿ｉの条件付き出現確率である。また、ｕ＿ｓｃ
ｏｒｅ（ｗ＿ｉ）はｗ＿ｉに対する適応スコア、ａ＿ｓ
ｃｏｒｅ（ｗ＿ｉ）はｗ＿ｉに対する音響スコアで音声
認識部３により得られる、入力音声と標準パターンとの
距離の符号を逆転したものである。ｃは予め定めた定数
である。Here, n is the width of the lattice (the number of utterances to be processed), w_i is one of the recognition result candidates for the i-th utterance, P(w_i) is the occurrence probability of w_i, and P(w_i/w_(i-1)) is the conditional probability that w_i occurs given that w_(i-1) has occurred. Further, u_score(w_i) is the adaptive score for w_i, and a_score(w_i) is the acoustic score for w_i, namely the distance between the input speech and the standard pattern obtained by the speech recognition unit 3 with its sign reversed. c is a predetermined constant.
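
The search for the maximizing word sequence can be sketched with dynamic programming over the lattice. Equation 1 itself is not reproduced in this text, so the objective below is only an assumed form built from the terms defined above: log P(w_1) plus the sum of log P(w_i/w_(i-1)), plus c times the sum of u_score(w_i) + a_score(w_i). The function name, the probability floor for unseen words, and the toy numbers are likewise hypothetical.

```python
import math


def best_sequence(lattice, unigram, bigram, u_score, c=1.0, floor=1e-6):
    """Viterbi-style search for the word sequence with the highest score.

    lattice: list of columns, one per utterance; each column is a list of
             (word, a_score) pairs, a_score being the negated distance.
    unigram: dict word -> P(word); bigram: dict (prev, word) -> P(word/prev).
    u_score: dict word -> adaptive score (missing words count as 0.0).
    """
    def local(word, a_score):
        # Word-level term of the ASSUMED objective.
        return c * (u_score.get(word, 0.0) + a_score)

    # best[word] = (score of the best path ending in word, that path)
    best = {w: (math.log(unigram.get(w, floor)) + local(w, a), [w])
            for w, a in lattice[0]}
    for column in lattice[1:]:
        new_best = {}
        for w, a in column:
            score, path = max(
                ((s + math.log(bigram.get((prev, w), floor)), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            new_best[w] = (score + local(w, a), path + [w])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


lattice = [[("voice", -1.0), ("hot_spring", -0.9)],
           [("input", -1.0), ("trip", -1.2)]]
unigram = {"voice": 0.5, "hot_spring": 0.5}
bigram = {("voice", "input"): 0.9, ("hot_spring", "input"): 0.1,
          ("voice", "trip"): 0.1, ("hot_spring", "trip"): 0.9}
plain = best_sequence(lattice, unigram, bigram, u_score={})
adapted = best_sequence(lattice, unigram, bigram, u_score={"hot_spring": 0.5})
```

In this toy lattice the bigram model alone prefers "voice input", while a positive adaptive score on "hot_spring" tips the result to "hot_spring trip". That is the kind of influence the adaptive score is meant to have on the selection.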

【００６６】適応スコアは、これまでのユーザの発声内
容と音声認識部３の認識誤りの傾向を反映したものとな
っており、予め用意した単語バイグラムだけを用いた場
合に比較して性能を向上させる手段を与えることができ
る。また、図１で示される音声入力装置と比べ、単語バ
イグラムを用いることで、より高い性能を与えることが
できる。The adaptive score reflects both the content of the user's utterances so far and the error tendencies of the speech recognition unit 3, and thus offers a means of improving performance over using only the word bigrams prepared in advance. In addition, using word bigrams gives higher performance than the voice input device shown in FIG. 1.

【００６７】図６は音声入力装置のハードウェア構成を
示すブロック図であり、コンピュータ６１と、記憶媒体
６２と、記憶装置６３と、表示装置６４とから構成され
ている。記憶媒体６２は、ディスク，半導体メモリ，そ
の他の記録媒体であり、コンピュータ６１を音声入力装
置として動作させるためのプログラムが記録されてい
る。FIG. 6 is a block diagram showing the hardware configuration of the voice input device, which comprises a computer 61, a recording medium 62, a storage device 63 and a display device 64. The recording medium 62 is a disk, a semiconductor memory or another recording medium, on which is recorded a program for causing the computer 61 to operate as the voice input device.

【００６８】図１に示した音声入力装置を実現する場合
は、記録媒体６２に記録されたプログラムがコンピュー
タ６１によって読み取られ、コンピュータ６１の動作を
制御することで、コンピュータ６１上に図１に示したパ
ラメータ分析部１，音声認識部３，候補選択部４，ユー
ザ修正部７を実現する。尚、標準パターン記憶部２，適
応スコア記憶部５は、記憶装置６３上に実現され、表示
部６は表示装置６４によって実現される。To realize the voice input device shown in FIG. 1, the program recorded on the recording medium 62 is read by the computer 61 and controls the operation of the computer 61, thereby realizing on the computer 61 the parameter analysis unit 1, the speech recognition unit 3, the candidate selection unit 4 and the user correction unit 7 shown in FIG. 1. The standard pattern storage unit 2 and the adaptive score storage unit 5 are realized on the storage device 63, and the display unit 6 is realized by the display device 64.

【００６９】また、図５に示した音声入力装置を実現す
る場合は、記録媒体６２に記録されたプログラムがコン
ピュータ６１によって読み取られ、コンピュータ６１の
動作を制御することで、コンピュータ６１上に図５に示
したパラメータ分析部１，音声認識部３，ユーザ修正部
７，ラティス構成部９，言語処理部１０を実現する。
尚、標準パターン記憶部２，適応スコア記憶部５は、記
憶装置６３上に実現され、表示部６は表示装置６４によ
って実現される。To realize the voice input device shown in FIG. 5, the program recorded on the recording medium 62 is read by the computer 61 and controls the operation of the computer 61, thereby realizing on the computer 61 the parameter analysis unit 1, the speech recognition unit 3, the user correction unit 7, the lattice construction unit 9 and the language processing unit 10 shown in FIG. 5. The standard pattern storage unit 2 and the adaptive score storage unit 5 are realized on the storage device 63, and the display unit 6 is realized by the display device 64.

【００７０】[0070]

【発明の効果】以上説明したように、本発明は、ユーザ
が認識結果を修正した場合、認識結果候補の内の、修正
後の認識結果よりも上位に位置する認識結果候補の適応
スコアの値から一定値を減じたり、或いは修正後の認識
結果よりも上位に位置する認識結果候補に対し、初期値
以外の適応スコアがあれば一定の値を減じるので、下記
の第１〜第３の効果を得ることができる。As described above, according to the present invention, when the user corrects a recognition result, a constant value is subtracted from the adaptive score of each recognition result candidate ranked above the corrected recognition result, or a constant value is subtracted from such a candidate's adaptive score if that score differs from the initial value. The following first to third effects are thereby obtained.

【００７１】第１の効果は、事前に発声内容に類似した
テキストデータを用意することなく、発声内容に適応さ
せて認識性能を向上させることができるという点であ
る。The first effect is that the recognition performance can be improved by adapting to the utterance content without preparing text data similar to the utterance content in advance.

【００７２】第２の効果は、ユーザに対する固有の認識
誤り傾向を反映し、認識誤りを減少させることができる
という点である。The second effect is that recognition errors can be reduced by reflecting the recognition error tendencies specific to the user.

【００７３】第３の効果は、ユーザが使用していくのに
従って、発声内容及びユーザの誤り傾向により適応さ
せ、認識性能を向上させることができるという点であ
る。即ち、使用すればするほどより高い認識性能を得る
ことができるようになる。The third effect is that, as the user continues to use the device, it adapts more closely to the utterance content and to the user's error tendencies, improving recognition performance. In other words, the more the device is used, the higher the recognition performance becomes.

【００７４】また、本発明は、ユーザが単語を音声入力
する毎、或いは認識結果を修正する毎に、適応スコア記
憶部に格納されている各認識結果候補の適応スコア値を
減少させるようにしているので、認識タスクが異なるも
のになった直後に於いて、認識誤りが多発しないように
することができる。Further, the present invention reduces the adaptive score of each recognition result candidate stored in the adaptive score storage unit each time the user inputs a word by voice or each time a recognition result is corrected, so that recognition errors do not occur frequently immediately after the recognition task changes.

[Brief description of the drawings]

【図１】本発明の一実施例のブロック図である。FIG. 1 is a block diagram of one embodiment of the present invention.

【図２】図１に示した音声入力装置の処理例を示すフロ
ーチャートである。FIG. 2 is a flowchart showing an example of the processing of the voice input device shown in FIG. 1.

【図３】認識結果候補の並べ替え処理を説明するための
図である。FIG. 3 is a diagram for explaining a process of rearranging recognition result candidates.

【図４】適応スコア記憶部５に対する更新処理を説明す
るための図である。FIG. 4 is a diagram for explaining the update process for the adaptive score storage unit 5.

【図５】本発明の別の実施例のブロック図である。FIG. 5 is a block diagram of another embodiment of the present invention.

【図６】音声入力装置のハードウェア構成の一例を示し
た図である。FIG. 6 is a diagram illustrating an example of a hardware configuration of a voice input device.

【図７】従来技術のブロック図である。FIG. 7 is a block diagram of the related art.

[Explanation of symbols]

１…パラメータ分析部２…標準パターン記憶部３…音声認識部４…候補選択部５…適応スコア記憶部６…表示部７…ユーザ修正部８…認識結果保持部９…ラティス構成部１０…言語処理部１１１…音声信号入力端子１１２…特徴抽出部１１３…認識部１１４…認識結果出力部１１５…標準パターンメモリ１１６…汎用的な統計的言語モデル１１７…学習用言語モデル１１８…適応型統計的言語モデル
1 … parameter analysis unit
2 … standard pattern storage unit
3 … speech recognition unit
4 … candidate selection unit
5 … adaptive score storage unit
6 … display unit
7 … user correction unit
8 … recognition result holding unit
9 … lattice construction unit
10 … language processing unit
111 … audio signal input terminal
112 … feature extraction unit
113 … recognition unit
114 … recognition result output unit
115 … standard pattern memory
116 … general-purpose statistical language model
117 … learning language model
118 … adaptive statistical language model

フロントページの続き (56)参考文献特開昭59−3491（ＪＰ，Ａ) 特開平４−291399（ＪＰ，Ａ) 特開昭63−201698（ＪＰ，Ａ) 特開昭58−93099（ＪＰ，Ａ) 特開昭59−17595（ＪＰ，Ａ) 特開昭61−292200（ＪＰ，Ａ) 特開昭61−69099（ＪＰ，Ａ) 特許2502880（ＪＰ，Ｂ２) 特公平２−36960（ＪＰ，Ｂ２) 特公平１−43960（ＪＰ，Ｂ２) 特公平７−104679（ＪＰ，Ｂ２) ＩＥＥＥＴｒａｎｓａｃｔｉｏｎｓｏｎＰａｔｔｅｒｎＡｎａｌｙｓｉｓａｎｄＭａｃｈｉｎｅＩｎｔｅｌｉｇｅｎｃｅ，Ｖｏｌ．ＰＡＭＩ− 12，Ｎｏ．６，Ｊｕｎｅ 1990，”ＡＣａｃｈｅ−ＢａｓｅｄＮａｔｕｒａｌＬａｎｇｕａｇｅＭｏｄｅｌｆｏｒＳｐｅｅｃｈＲｅｃｏｇｎｉｔｉｏｎ”，ｐ．570−583 シャープ技報，通巻第31号，斗谷充宏外「日本語音声入力装置 10−8335」, ｐ．97−103，昭和60年３月20日発行 (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 15/22 G10L 15/18 ＪＩＣＳＴファイル（ＪＯＩＳ)
Continuation of front page
(56) References
JP-A-59-3491 (JP, A)
JP-A-4-291399 (JP, A)
JP-A-63-201698 (JP, A)
JP-A-58-93099 (JP, A)
JP-A-59-17595 (JP, A)
JP-A-61-292200 (JP, A)
JP-A-61-69099 (JP, A)
Japanese Patent No. 2502880 (JP, B2)
JP-B-2-36960 (JP, B2)
JP-B-1-43960 (JP, B2)
JP-B-7-104679 (JP, B2)
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-12, No. 6, June 1990, "A Cache-Based Natural Language Model for Speech Recognition", pp. 570-583
Sharp Technical Report, No. 31, 斗谷充宏 et al., "Japanese Speech Input Device 10-8335", pp. 97-103, published March 20, 1985
(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/22; G10L 15/18; JICST file (JOIS)

Claims

(57) [Claims]

1. A voice input device that selects an optimum one from among a plurality of recognition result candidates obtained for input speech and takes it as the recognition result for the input speech, comprising: an adaptive score storage unit in which adaptive scores are stored; means for selecting a recognition result from among the plurality of recognition result candidates obtained for the input speech while also taking into account the adaptive score of each recognition result candidate stored in the adaptive score storage unit; and a user correction unit that corrects the recognition result selected by said means in accordance with an instruction from the user and subtracts a constant value from the adaptive score of each recognition result candidate that, among the recognition result candidates, is ranked above the corrected recognition result.

2. The voice input device according to claim 1, comprising, in place of said user correction unit, a user correction unit that corrects the recognition result selected by said means in accordance with a user instruction and, for each recognition result candidate ranked above the corrected recognition result, subtracts a constant value if the candidate has an adaptive score other than the initial value.

3. A voice input device comprising: a parameter analysis unit that analyzes input speech and outputs an analysis result; a standard pattern storage unit in which standard patterns for a plurality of words are stored; a speech recognition unit that obtains the distance between the analysis result of the parameter analysis unit and each standard pattern stored in the standard pattern storage unit and obtains a plurality of recognition result candidates based on the obtained distances; an adaptive score storage unit in which adaptive scores of a plurality of words that can be recognition result candidates are stored; a candidate selection unit that selects a recognition result from among the plurality of recognition result candidates obtained by the speech recognition unit, based on the adaptive score of each recognition result candidate stored in the adaptive score storage unit and the distance between each recognition result candidate and its standard pattern; and a user correction unit that corrects the recognition result selected by the candidate selection unit in accordance with a user instruction and subtracts a constant value from the adaptive score of each recognition result candidate that, among the recognition result candidates, is ranked above the corrected recognition result.

4. The voice input device according to claim 3, comprising, in place of said user correction unit, a user correction unit that corrects the recognition result selected by the candidate selection unit in accordance with a user instruction and, for each recognition result candidate ranked above the corrected recognition result, subtracts a constant value if the candidate has an adaptive score other than the initial value.

5. A voice input device comprising: a parameter analysis unit that analyzes input speech and outputs an analysis result; a standard pattern storage unit in which standard patterns for a plurality of words are stored; a speech recognition unit that obtains the distance between the analysis result of the parameter analysis unit and each standard pattern stored in the standard pattern storage unit and obtains a plurality of recognition result candidates based on the obtained distances; an adaptive score storage unit in which adaptive scores of a plurality of words that can be recognition result candidates are stored; a statistical language model indicating the order of occurrence of words; a language processing unit that selects a recognition result from among the plurality of recognition result candidates obtained by the speech recognition unit, based on the adaptive score of each recognition result candidate stored in the adaptive score storage unit, the distance between each recognition result candidate and its standard pattern, and the contents of the statistical language model; and a user correction unit that corrects the recognition result selected by the language processing unit in accordance with a user instruction and subtracts a constant value from the adaptive score of each recognition result candidate that, among the recognition result candidates, is ranked above the corrected recognition result.

6. The voice input device according to claim 5, comprising, in place of said user correction unit, a user correction unit that corrects the recognition result selected by the language processing unit in accordance with a user instruction and, for each recognition result candidate ranked above the corrected recognition result, subtracts a constant value if the candidate has an adaptive score other than the initial value.

7. A machine-readable recording medium on which is recorded a program for causing a computer, which includes an adaptive score storage unit storing adaptive scores of a plurality of words that can be recognition result candidates, to function as a voice input device that selects an optimum one from among a plurality of recognition result candidates obtained for input speech and takes it as the recognition result for the input speech, the program causing the computer to function as: means for selecting a recognition result from among the plurality of recognition result candidates obtained for the input speech while also taking into account the adaptive score of each recognition result candidate stored in the adaptive score storage unit; and a user correction unit that corrects the recognition result selected by said means in accordance with a user instruction and subtracts a constant value from the adaptive score of each recognition result candidate that, among the recognition result candidates, is ranked above the corrected recognition result.

8. The machine-readable recording medium recording a program according to claim 7, wherein the recorded program causes the computer to function, in place of said user correction unit, as a user correction unit that corrects the recognition result selected by said means in accordance with a user instruction and, for each recognition result candidate ranked above the corrected recognition result, subtracts a constant value if the candidate has an adaptive score other than the initial value.

9. A machine-readable recording medium on which is recorded a program for causing a computer, which includes a standard pattern storage unit storing standard patterns for a plurality of words and an adaptive score storage unit storing adaptive scores of a plurality of words that can be recognition result candidates, to function as: a parameter analysis unit that analyzes input speech and outputs an analysis result; a speech recognition unit that obtains the distance between the analysis result of the parameter analysis unit and each standard pattern stored in the standard pattern storage unit and obtains a plurality of recognition result candidates based on the obtained distances; a candidate selection unit that selects a recognition result from among the plurality of recognition result candidates obtained by the speech recognition unit, based on the adaptive score of each recognition result candidate stored in the adaptive score storage unit and the distance between each recognition result candidate and its standard pattern; and a user correction unit that corrects the recognition result selected by the candidate selection unit in accordance with a user instruction and subtracts a constant value from the adaptive score of each recognition result candidate that, among the recognition result candidates, is ranked above the corrected recognition result.

10. The machine-readable recording medium recording a program according to claim 9, wherein the recorded program causes the computer to function, in place of said user correction unit, as a user correction unit that corrects the recognition result selected by the candidate selection unit in accordance with a user instruction and, for each recognition result candidate ranked above the corrected recognition result, subtracts a constant value if the candidate has an adaptive score other than the initial value.

11. A machine-readable recording medium on which is recorded a program for causing a computer, which includes a standard pattern storage unit storing standard patterns for a plurality of words, an adaptive score storage unit storing adaptive scores of a plurality of words that can be recognition result candidates, and a statistical language model indicating the order of occurrence of words, to function as: a parameter analysis unit that analyzes input speech and outputs an analysis result; a speech recognition unit that obtains the distance between the analysis result of the parameter analysis unit and each standard pattern stored in the standard pattern storage unit and obtains a plurality of recognition result candidates based on the obtained distances; a language processing unit that selects a recognition result from among the plurality of recognition result candidates obtained by the speech recognition unit, based on the adaptive score of each recognition result candidate stored in the adaptive score storage unit, the distance between each recognition result candidate and its standard pattern, and the contents of the statistical language model; and a user correction unit that corrects the recognition result selected by the language processing unit in accordance with a user instruction and subtracts a constant value from the adaptive score of each recognition result candidate that, among the recognition result candidates, is ranked above the corrected recognition result.

12. The machine-readable recording medium recording a program according to claim 11, wherein the recorded program causes the computer to function, in place of said user correction unit, as a user correction unit that corrects the recognition result selected by the language processing unit in accordance with a user instruction and, for each recognition result candidate ranked above the corrected recognition result, subtracts a constant value if the candidate has an adaptive score other than the initial value.