JP4939560B2

JP4939560B2 - Speech recognition apparatus, method and program

Info

Publication number: JP4939560B2
Application number: JP2009055519A
Authority: JP
Inventors: 厚徳小川; 篤中村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-03-09
Filing date: 2009-03-09
Publication date: 2012-05-30
Anticipated expiration: 2029-03-09
Also published as: JP2010210816A

Description

The present invention relates to a speech recognition apparatus that estimates a reliability indicating how far the speech recognition result for an input speech signal can be trusted, and to a corresponding method and program.

A speech recognition apparatus that estimates the reliability of a speech recognition result (whether the result is correct or incorrect, and how certain that judgment is) is disclosed in Patent Document 1. FIG. 15 shows the functional configuration of that speech recognition apparatus 150, and its operation is briefly described here. The speech recognition apparatus 150 includes a storage unit 4, an utterance division unit 5, a speech recognition unit 6, an acoustic model storage unit 10, a dictionary/language model storage unit 12, an information conversion unit 20, a reliability assignment unit 22, an identification model storage unit 29, and an output unit 26.

The storage unit 4 stores the speech signal input to the input terminal 2 as a discretized digital speech signal. The utterance division unit 5 segments out, as one utterance, each stretch of the digital speech signal bounded by silent intervals that last at least a predetermined duration. The speech recognition unit 6 consists of an acoustic analysis unit 8 and a recognition search unit 7. The acoustic analysis unit 8 converts the digital speech signal into a time series of feature vectors. The recognition search unit 7 uses the acoustic model stored in the acoustic model storage unit 10 and the language model stored in the dictionary/language model storage unit 12 to match the word strings registered in the dictionary/language model storage unit 12 against the time series of feature vectors, and outputs the word string with the highest matching likelihood as the recognition result.

Cepstrum analysis is commonly used as the speech analysis method in the acoustic analysis unit 8. The features include MFCC (Mel Frequency Cepstral Coefficients), ΔMFCC, ΔΔMFCC, log power, and Δ log power, which together form a feature vector of roughly 10 to 100 dimensions. The analysis uses a frame width of about 30 ms and a frame shift of about 10 ms.
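The framing step described above can be sketched as follows. This is a minimal illustration only: the function name `frame_signal` and the 16 kHz sampling rate are assumptions, not taken from the patent, and the cepstral analysis that would follow on each frame is omitted.

```python
def frame_signal(samples, sample_rate, width_ms=30, shift_ms=10):
    """Split a sample sequence into overlapping analysis frames."""
    width = int(sample_rate * width_ms / 1000)   # samples per frame
    shift = int(sample_rate * shift_ms / 1000)   # samples between frame starts
    frames = []
    start = 0
    while start + width <= len(samples):
        frames.append(samples[start:start + width])
        start += shift
    return frames


# One second of dummy audio at 16 kHz (the sampling rate is an assumption).
signal = [0.0] * 16000
frames = frame_signal(signal, 16000)
print(len(frames), len(frames[0]))
```

Each frame would then be converted to one feature vector, so the 10 ms shift determines the rate of the feature vector time series.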

The acoustic model models speech features such as the MFCC described above in suitable categories such as phonemes. Using this acoustic model, the acoustic closeness between the features of each frame of the input speech and the model of each category is computed as an acoustic likelihood. The mainstream modeling approach today is based on the HMM (Hidden Markov Model), which rests on probability and statistical theory. Language models fall broadly into three forms: word lists, fixed grammars, and N-gram models. A speech recognition apparatus that recognizes isolated word utterances uses a word list enumerating the words to be recognized (the word list is equivalent to the dictionary stored in the dictionary/language model storage unit 12). An apparatus that recognizes fixed-form sentence utterances uses a fixed grammar, which concatenates words registered in the dictionary/language model storage unit 12 to describe the utterances (sentences) the apparatus accepts. An apparatus that recognizes free continuous speech uses an N-gram model, which holds the N-gram probabilities of the words registered in the dictionary/language model storage unit 12; with it, how readily sequences of up to N words connect is computed as a language likelihood. Speech recognition apparatuses using such acoustic and language models are described in detail in, for example, Non-Patent Documents 1 and 2.
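As a rough illustration of the language likelihood mentioned above, the following sketch scores a word sequence with a bigram (N = 2) model. The probability table, the `<s>` start symbol, and the floor value for unseen pairs are all hypothetical; a practical N-gram model would use proper smoothing.

```python
import math


def bigram_log_likelihood(words, bigram_prob, floor=1e-6):
    """Sum of log P(word | previous word) over the sequence."""
    total = 0.0
    # Pair each word with its predecessor; "<s>" marks the sentence start.
    for prev, cur in zip(["<s>"] + words, words):
        total += math.log(bigram_prob.get((prev, cur), floor))
    return total


# Made-up probabilities for illustration only.
bigram_prob = {("<s>", "tokyo"): 0.5, ("tokyo", "station"): 0.2}
score = bigram_log_likelihood(["tokyo", "station"], bigram_prob)
print(score)
```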

For each word in the word string, the information conversion unit 20 produces an utterance feature vector such as the one shown in FIG. 16. In this example, the part-of-speech information of each word is classified into 37 types, and the mean, variance, maximum, and minimum are computed for each of the acoustic likelihood score, the language likelihood score, and the phoneme duration that accompany the part-of-speech information.
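A feature vector of this kind might be assembled as in the sketch below. The one-hot encoding of the part of speech, the use of population variance, and the idea that the statistics are taken over per-phoneme values within the word are assumptions made for illustration; FIG. 16 is not reproduced here and may differ.

```python
from statistics import mean, pvariance

NUM_POS = 37  # number of part-of-speech classes in the example


def summarize(values):
    """Mean, variance, maximum, and minimum of one score sequence."""
    return [mean(values), pvariance(values), max(values), min(values)]


def utterance_feature_vector(pos_index, acoustic, language, durations):
    """Concatenate a one-hot part-of-speech code with score statistics."""
    one_hot = [1.0 if i == pos_index else 0.0 for i in range(NUM_POS)]
    return one_hot + summarize(acoustic) + summarize(language) + summarize(durations)


# Hypothetical per-phoneme scores and durations (seconds) for one word.
vec = utterance_feature_vector(
    pos_index=3,
    acoustic=[-5.0, -7.0],
    language=[-2.0, -2.0],
    durations=[0.08, 0.12, 0.10],
)
print(len(vec))
```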

The reliability assignment unit 22 evaluates the utterance feature vector and assigns a reliability. It does so by comparing the utterance feature vector output by the information conversion unit 20 with values, stored in the identification model storage unit 29, that associate previously learned utterance feature vectors with speech recognition rates. For example, by preparing utterance feature vectors corresponding to speech recognition rates at 10% intervals, a reliability can be assigned in 10% steps, ranging from a recognition result that is 100% reliable to one that cannot be trusted at all. The output unit 26 outputs, for each utterance, the word sequence, the utterance feature vector of each word, and the reliability. Attempts of this kind to attach a reliability to a speech recognition result are also described in, for example, Non-Patent Document 3.

Patent Document 1: JP 2007-240589 A

Non-Patent Document 1: Kiyohiro Shikano, Katsunobu Ito, Tatsuya Kawahara, Kazuya Takeda, Mikio Yamamoto, IT Text Speech Recognition System, Ohmsha, pp. 1-51, 2001.
Non-Patent Document 2: Akio Ando, Real-time Speech Recognition, IEICE, pp. 1-58, pp. 125-170, 2003.
Non-Patent Document 3: H. Jiang, “Confidence measures for speech recognition: A survey,” Speech Communication, vol. 45, pp. 455-470, 2005.

With a conventional speech recognition apparatus that outputs a recognition result together with a reliability, the reliability makes it possible to operate the apparatus on the basis of an estimate of whether the recognition result is correct or incorrect. Some problems, however, cannot be solved by that alone. Suppose, as shown in FIG. 17, that a female speaker speaks to a speech recognition apparatus whose recognition target is set to male speakers. In that case the speech often cannot be recognized. The reliability could then be used to prompt the female speaker to speak again, but she is not told why she was asked to repeat the utterance. As a result, the user cannot use the speech recognition apparatus satisfactorily.

The present invention has been made in view of this point. Its object is to provide a speech recognition apparatus, a method, and a program that, when a speech recognition error occurs, present the cause of the error to the user and thereby prompt the user to use the speech recognition apparatus appropriately.

The speech recognition apparatus of the present invention includes a speech recognition unit and a correctness/error-cause estimation unit. The speech recognition unit outputs a word string obtained by recognizing the input speech, together with an utterance feature vector for each word, in which the features of each word in the word string are expressed by a plurality of parameters. The correctness/error-cause estimation unit takes the utterance feature vector of each word as input and estimates, for that word, whether it is correct or incorrect, the cause of any error, and the certainty of that estimate. It does so using a conditional probability based on a discriminative model that represents the relationship between utterance feature vectors and the correctness and error causes of recognized words.

When a recognition result is estimated to be a misrecognition, the speech recognition apparatus of the present invention estimates the cause of the error. Presenting the estimated cause prompts the user to use the apparatus appropriately. For example, even when the type of input speech differs from the configured type, as in the case described above, the estimated error cause makes it possible to display a message such as the one shown in FIG. 18: "Sorry. The apparatus is currently set to recognize male voices. Please press the button for recognizing female voices." The user can therefore use the speech recognition apparatus appropriately.

FIG. 1 is a diagram showing an example of the functional configuration of the speech recognition apparatus 100 of the present invention.
FIG. 2 is a diagram showing the operation flow of the speech recognition apparatus 100.
FIG. 3 is a diagram showing an example of the functional configuration of the correctness/error-cause estimation unit 40.
FIG. 4 is a diagram showing the operation flow of the correctness/error-cause estimation unit 40.
FIG. 5 is a diagram showing an example of the values the correctness/error-cause label vector y^→ can take.
FIG. 6 is a diagram showing an example of the functional configuration of the correctness/error-cause estimation unit 40'.
FIG. 7 is a diagram showing an example of the functional configuration of the correctness/error-cause estimation unit 70.
FIG. 8 is a diagram showing the operation flow of the correctness/error-cause estimation unit 70.
FIG. 9 is a diagram showing an example of the values the error-cause label vector z^→ can take.
FIG. 10 is a diagram showing an example of the functional configuration of the correctness/error-cause estimation unit 70'.
FIG. 11 is a diagram showing an example of the functional configuration of the correctness/error-cause estimation unit 110.
FIG. 12 is a diagram showing the operation flow of the correctness/error-cause estimation unit 110.
FIG. 13 is a diagram showing an example of the functional configuration of the speech recognition apparatus 130 of the present invention.
FIG. 14 is a diagram showing an example of a correctness/error-cause message.
FIG. 15 is a diagram showing the functional configuration of the speech recognition apparatus 150 of Patent Document 1.
FIG. 16 is a diagram showing an example of the utterance feature vector x^→.
FIG. 17 is a diagram showing an example of a conventional speech recognition situation.
FIG. 18 is a diagram showing an example of a speech recognition situation using the speech recognition apparatus of the present invention.

Embodiments of the present invention are described below with reference to the drawings. Identical components in different drawings carry the same reference numerals, and their description is not repeated.

FIG. 1 shows an example of the functional configuration of the speech recognition apparatus 100 of the present invention, and FIG. 2 shows its operation flow. The speech recognition apparatus 100 includes a speech recognition unit 30 and a correctness/error-cause estimation unit 40. The speech recognition apparatus 100 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

The speech recognition unit 30 outputs a word string obtained by recognizing the speech input to the input terminal 2, together with an utterance feature vector x^→ for each word, in which the features of each word in the word string are expressed by a plurality of parameters (step S30). (For the arrow, the notation used in the figures, with the arrow placed above the symbol, is the correct form.) The speech recognition unit 30 comprises the components of the speech recognition apparatus 150 described in the prior art, from the storage unit 4 through the information conversion unit 20. The utterance feature vector x^→ of each word is likewise a vector such as the one shown in FIG. 16, consisting of, for example, acoustic likelihood scores and language likelihood scores. The correctness/error-cause estimation unit 40 takes the utterance feature vector x^→ of each word as input and estimates from it the estimate ŷ^→ of the word's correctness and error cause, together with its certainty (step S40). In addition to ŷ^→ and its certainty for each word in the recognized word string, the speech recognition apparatus 100 naturally also outputs the recognized word string itself, although this is not shown in the figure.

Because the speech recognition apparatus 100 estimates the cause of an error in addition to whether the recognition result is correct, the user can learn why the apparatus is not working, which makes appropriate use of the speech recognition apparatus 100 possible. FIG. 3 shows an example of the functional configuration of the correctness/error-cause estimation unit 40, the main part of the present invention, which is described in more detail below.

The following description assumes that the speech recognition apparatus is an isolated word speech recognition apparatus that recognizes Japanese place names spoken by a male voice in a quiet place (see FIG. 17).

The correctness/error-cause estimation unit 40 includes a correctness/error-cause conditional probability calculation unit 41, a model parameter recording unit 42, and a correctness/error-cause selection unit 43. The model parameter recording unit 42 records the model parameters needed to compute a conditional probability based on a discriminative model that represents the relationship between the utterance feature vector x^→ and the correctness/error-cause label vector y^→. The correctness/error-cause conditional probability calculation unit 41 takes as input the utterance feature vector x^→ of each word output by the speech recognition unit 30. For each value that the predefined correctness/error-cause label vector y^→ can take, it computes a conditional probability based on a maximum entropy model (MEM), which is one kind of discriminative model, using the feature functions f_k(x^→, y^→) and their weight parameters λ_k recorded in the model parameter recording unit 42; these are the model parameters of the maximum entropy model (step S41, FIG. 4). The maximum entropy model is one example of a discriminative model and is used in recent reliability estimation methods.

The correctness/error-cause label vector y^→ is a vector whose dimensions hold one correctness label y_0 and one or more error-cause labels y_i, i ≥ 1. The correctness label and the error-cause labels are collectively called correctness/error-cause labels y_i, i ≥ 0. Examples of the correctness/error-cause labels y_i are shown in Table 1.

The correctness label y_0 is a binary value estimated from the utterance feature vector x^→ on the basis of the maximum entropy model: y_0 = 0 means correct and y_0 = 1 means incorrect. The error-cause label y_1 indicates whether the word is in the vocabulary (y_1 = 0) or out of the vocabulary (y_1 = 1). The error-cause label y_2 indicates whether noise is absent (y_2 = 0) or present (y_2 = 1). The error-cause label y_3 indicates whether the speaker is male (y_3 = 0) or female (y_3 = 1).

Besides the four labels shown in Table 1, other possible correctness/error-cause labels include whether the input volume is appropriate, and whether the user's age group matches the one the speech recognition apparatus assumes (for an apparatus designed for adult users, children and elderly people are unanticipated users). To keep the explanation simple, the description here is limited to the four labels shown in Table 1.

In the example of Table 1 there are four correctness/error-cause labels, so the label vector y^→ can in principle take 2^4 = 16 values, each of which can be written in vector notation. Some combinations, however, cannot occur, such as y^→ = (y_0, y_1, y_2, y_3) = (0, 1, 0, 0), "out of vocabulary yet recognized correctly". With such nonexistent combinations excluded, the label vector y^→ can take the 12 values shown in FIG. 5.
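The count of 12 can be reproduced with a short enumeration. The sketch below assumes that the only excluded combinations are those that are both correct (y_0 = 0) and out of vocabulary (y_1 = 1); that rule removes exactly four of the 16 vectors and includes the example (0, 1, 0, 0), but FIG. 5 is not reproduced here, so the actual set it lists may differ.

```python
from itertools import product


def is_possible(y):
    """Assumed rule: a word cannot be both correct and out of vocabulary."""
    y0, y1, y2, y3 = y
    return not (y0 == 0 and y1 == 1)


all_states = list(product((0, 1), repeat=4))          # 2^4 label vectors
valid_states = [y for y in all_states if is_possible(y)]

print(len(all_states), len(valid_states))
```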

In correctness/error-cause estimation based on the maximum entropy model, the relationship between, for example, these 12 correctness/error-cause label vectors y^→ and the utterance feature vector x^→ is learned in advance from training data. First, K feature functions f_k(x^→, y^→), k = 1, 2, ..., K (on the order of 100 to 1,000,000) are prepared to represent the relationship between the utterance feature vector x^→ and the label vector y^→. The weight parameter λ_k of each feature function f_k(x^→, y^→) is then estimated by training, for example with a quasi-Newton method. These feature functions f_k(x^→, y^→) and weight parameters λ_k are recorded in advance in the model parameter recording unit 42.

The correctness/error-cause conditional probability calculation unit 41 takes the utterance feature vector x^→ as input, refers to the feature functions f_k(x^→, y^→) and weight parameters λ_k recorded in the model parameter recording unit 42, and computes the correctness/error-cause conditional probability P_ME(y^→ | x^→) given by equation (1).
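Equation (1) is not reproduced in this text. A maximum entropy model with feature functions f_k and weights λ_k conventionally takes the form below, which is presumably what equation (1) expresses; this is a reconstruction under that assumption, not the patent's own typesetting. Here \vec{x} and \vec{y} stand for x^→ and y^→, and the sum in the denominator runs over every value the label vector can take.

```latex
P_{ME}(\vec{y} \mid \vec{x})
  = \frac{\exp\left( \sum_{k=1}^{K} \lambda_k f_k(\vec{x}, \vec{y}) \right)}
         {\sum_{\vec{y}'} \exp\left( \sum_{k=1}^{K} \lambda_k f_k(\vec{x}, \vec{y}') \right)}
  \tag{1}
```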

The correctness/error-cause conditional probability P_ME(y^→ | x^→) is computed for each correctness/error-cause label vector y^→, of which there are 12 in this example. Each value is a probability between 0 and 1, and the conditional probabilities over all label vectors sum to 1.0; that is, Σ_{y^→} P_ME(y^→ | x^→) = 1.0. For example, a large value of P_ME(y^→ = (1, 0, 1, 0) | x^→) for the label vector y^→ = (y_0, y_1, y_2, y_3) = (1, 0, 1, 0), "incorrect because of noise", means that the word was probably misrecognized because of heavy noise.
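The computation can be sketched with a toy model, assuming the usual exponential form of a maximum entropy model. The two feature functions, their weights, the `acoustic_score` input, and the rule used to obtain 12 label vectors are all hypothetical; a real model would have far more features, with weights learned from training data.

```python
import math
from itertools import product

# Candidate label vectors (y0, y1, y2, y3). Dropping "correct but out of
# vocabulary" is an assumption that yields the 12 states mentioned in the text.
LABELS = [y for y in product((0, 1), repeat=4) if not (y[0] == 0 and y[1] == 1)]


def f_noise(x, y):
    """Hypothetical feature: fires for noisy labels when the acoustic score is low."""
    return 1.0 if y[2] == 1 and x["acoustic_score"] < -50.0 else 0.0


def f_error(x, y):
    """Hypothetical feature: fires for 'incorrect' labels when the acoustic score is low."""
    return 1.0 if y[0] == 1 and x["acoustic_score"] < -50.0 else 0.0


FEATURES = [f_noise, f_error]
WEIGHTS = [1.5, 0.8]  # made-up values of lambda_k


def conditional_probs(x):
    """Return P(y | x) for every candidate label vector y."""
    scores = [sum(w * f(x, y) for f, w in zip(FEATURES, WEIGHTS)) for y in LABELS]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)      # normalization term
    return {y: e / z for y, e in zip(LABELS, exps)}


probs = conditional_probs({"acoustic_score": -72.0})
print(sum(probs.values()))
```

Because of the normalization term, the 12 probabilities always sum to 1, which matches the property stated above.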

The correctness/error-cause conditional probability P_ME(y^→ | x^→) is thus a value indicating the certainty of the label vector y^→, that is, of the estimated correctness and error cause of the recognition result. The label vectors y^→ may also be held outside the conditional probability calculation unit 41: as shown in FIG. 3, a correctness/error-cause label vector recording unit 44 may be provided to record them, and the conditional probability calculation unit 41 may refer to it.

The correctness/error-cause selection unit 43 takes the conditional probabilities P_ME(y^→ | x^→) as input and, as given by equation (2), selects the label vector estimate ŷ^→ that maximizes P_ME(y^→ | x^→), in this example from the 12 combinations shown in FIG. 5 (step S43).
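Equation (2) is not reproduced in this text. From the description, it selects the label vector with the largest conditional probability, so it can presumably be written as follows (a reconstruction, with \vec{x} and \vec{y} standing for x^→ and y^→ and the maximization taken over the allowed label vectors):

```latex
\hat{\vec{y}} = \operatorname*{arg\,max}_{\vec{y}} \; P_{ME}(\vec{y} \mid \vec{x})
  \tag{2}
```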

Equation (2) selects, for example, ŷ^→ = (1, 0, 0, 1), "incorrect because the speaker is female", shown in FIG. 5.

The speech recognition apparatus 100 can thus estimate the correctness and error cause ŷ^→ of a recognition result together with its certainty P_ME(ŷ^→ | x^→). Although the speech recognition apparatus 100 has been described as an isolated word speech recognition apparatus, the idea of Embodiment 1 also applies to fixed-sentence speech recognition and continuous speech recognition. The same holds for the modifications and embodiments that follow.

[Modification 1]
In Embodiment 1, a single maximum entropy model ME is used to obtain the conditional probability P_ME(y^→ | x^→) for each value the label vector y^→ can take. Alternatively, the correctness/error-cause labels y_i, i = 0, 1, 2, 3 may be assumed to be independent. In that case a dedicated maximum entropy model ME_i, i = 0, 1, 2, 3 is prepared for each label y_i, the conditional probability P_MEi(y_i = j | x^→), j = 0, 1 is obtained for each value the label can take (y_i = 0 or 1), and the label vector estimate ŷ^→ is derived from those probabilities. FIG. 6 shows an example of the functional configuration of the correctness/error-cause estimation unit 40' for this method.

The correctness/error-cause estimation unit 40' includes a correctness/error-cause conditional probability calculation unit 41', a model parameter recording unit 42', and a correctness/error-cause selection unit 43'. Using the dedicated maximum entropy model ME_i for each label y_i, the conditional probability calculation unit 41' computes the conditional probability P_MEi(y_i = j | x^→), j = 0, 1 for each value (0 or 1) that the label y_i can take, according to equation (3).
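Equation (3) is not reproduced in this text. Assuming each per-label model ME_i has the usual maximum entropy form, with its own feature functions f_k^i and weights λ_k^i, it can presumably be written as follows (a reconstruction, with \vec{x} standing for x^→):

```latex
P_{ME_i}(y_i = j \mid \vec{x})
  = \frac{\exp\left( \sum_{k=1}^{K_i} \lambda_k^{i} f_k^{i}(\vec{x}, y_i = j) \right)}
         {\sum_{j' \in \{0, 1\}} \exp\left( \sum_{k=1}^{K_i} \lambda_k^{i} f_k^{i}(\vec{x}, y_i = j') \right)},
  \qquad j = 0, 1
  \tag{3}
```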

The model parameter recording unit 42' records the feature functions f_k^i(x^→, y_i) of equation (3), which represent the relationship between the utterance feature vector x^→ and the label y_i in the maximum entropy model ME_i. K_i feature functions (on the order of 100 to 1,000,000) are defined: f_k^i(x^→, y_i), k = 1, 2, ..., K_i. λ_k^i is the weight parameter of the feature function f_k^i(x^→, y_i) and is estimated by training, for example with a quasi-Newton method, using training data that differs for each maximum entropy model ME_i. Instead of defining different feature functions f_k^i(x^→, y_i), k = 1, 2, ..., K_i for each model ME_i, common feature functions may be used; for example, the common feature functions f_k(x^→, y_i), k = 1, 2, ..., K may be used for all maximum entropy models ME_i, i = 0, 1, 2, 3.

For each of the labels y_i, i = 0, 1, 2, 3 shown in Table 1, the correctness/error-cause estimation unit 40' uses the dedicated maximum entropy model ME_i to obtain, for each value the label can take (y_i = 0 or 1), the conditional probabilities P_ME0(y_0 = j | x^→), P_ME1(y_1 = j | x^→), P_ME2(y_2 = j | x^→), and P_ME3(y_3 = j | x^→), j = 0, 1.

The correctness/error-cause selection unit 43' takes these conditional probabilities (two for each of the four labels) as input and obtains the label vector estimate ŷ^→ by computing equation (4).

In equation (4), each ŷ_i in the middle term of the chain of equalities is either 0 or 1, as the right-hand term shows: it is the value of j that gives the larger of the conditional probabilities P_MEi(y_i = j | x^→), j = 0, 1. T denotes the transpose. The correctness/error-cause selection unit 43' obtains the conditional probability P(ŷ^→ | x^→) of the estimate ŷ^→ (its certainty) as the product of the per-label conditional probabilities, as given by equation (5).
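Equations (4) and (5) are not reproduced in this text, but the description above translates directly into the following sketch. It takes P_MEi(y_i = 1 | x^→) for each label as given (the numbers are made up), picks the more probable value per label, and multiplies the chosen probabilities. Resolving an exact tie in favor of 0 is an arbitrary assumption.

```python
def estimate_independent(p_one):
    """p_one[i] is P_MEi(y_i = 1 | x); returns (label estimate, certainty)."""
    y_hat = []
    certainty = 1.0
    for p1 in p_one:
        p0 = 1.0 - p1
        if p1 > p0:
            y_hat.append(1)      # value with the larger probability
            certainty *= p1
        else:
            y_hat.append(0)
            certainty *= p0
    return tuple(y_hat), certainty


# Hypothetical outputs of the four per-label models for one word.
y_hat, certainty = estimate_independent([0.9, 0.2, 0.3, 0.8])
print(y_hat, certainty)
```

With these inputs the estimate is (1, 0, 0, 1), which corresponds to "incorrect because the speaker is female", and the certainty is 0.9 × 0.8 × 0.7 × 0.8.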

When the label vector estimate ŷ^→ is obtained under the assumption that the labels y_i are independent, the estimate may turn out to be one of the values described in Embodiment 1 as impossible combinations, for example y^→ = (0, 1, 0, 0). Output of such impossible estimates ŷ^→ can easily be prohibited in software or hardware. Another option is to combine this method with the method described later (Embodiment 3), in which correctness is re-estimated after the conditional probabilities P_MEi(y_i = j | x^→) have once been obtained, and thereby suppress the output of impossible estimates ŷ^→. Either way, the problem is easily solved.
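One possible way to prohibit impossible estimates in software is sketched below: instead of taking the per-label maximum, it searches only the allowed combinations for the one with the largest product of per-label probabilities. This is an illustrative choice, not a procedure stated in the patent, and the rule that "correct and out of vocabulary" is the only impossible pattern is likewise an assumption.

```python
from itertools import product


def best_valid(p_one):
    """p_one[i] is P_MEi(y_i = 1 | x); search allowed label vectors only."""
    best, best_p = None, -1.0
    for y in product((0, 1), repeat=len(p_one)):
        if y[0] == 0 and y[1] == 1:
            continue  # assumed impossible: correct yet out of vocabulary
        p = 1.0
        for yi, p1 in zip(y, p_one):
            p *= p1 if yi == 1 else 1.0 - p1
        if p > best_p:
            best, best_p = y, p
    return best, best_p


# The per-label maxima here would give the impossible vector (0, 1, 0, 0).
y_hat, certainty = best_valid([0.3, 0.6, 0.1, 0.1])
print(y_hat, certainty)
```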

Because this method, which assumes independence among the correctness/error cause labels y_i, obtains the conditional probability of each label y_i with a maximum entropy model ME_i dedicated to that label, it can estimate the correctness/error cause label vector more accurately than the method of Embodiment 1.

Embodiment 1 and its Modification 1 always estimate both correctness and the error cause for the utterance feature vector x^→ output by the speech recognition unit 30. However, when the speech recognition apparatus 100 is used properly and the recognition rate is high, always estimating correctness and the error cause needlessly increases the computational load. Embodiment 2, described next, therefore estimates the error cause only when the utterance feature vector x^→ suggests that a recognition error may have occurred.

FIG. 7 shows an example of the functional configuration of the correctness/error cause estimation unit 70 of Embodiment 2, and FIG. 8 shows its operation flow. The unit 70 includes a correctness conditional probability calculation unit 71, a correctness determination unit 72, an error cause conditional probability calculation unit 73, an error cause selection unit 74, and a model parameter recording unit 42′. The model parameter recording unit 42′ is the same as that of Modification 1 shown in FIG. 6.

The correctness conditional probability calculation unit 71 calculates, from the utterance feature vector x^→, only the conditional probabilities P_ME0(y_0 = j | x^→) that the speech recognition result is correct and that it is incorrect (step S71). This is the calculation performed in the correctness/error cause conditional probability calculation unit 41′ of FIG. 6 with the maximum entropy model ME_0 dedicated to correctness, restricted to P_ME0(y_0 = j | x^→), j = 0, 1. Instead of this conditional probability, a confidence measure (a correctness decision and its certainty) obtained by another conventional estimation technique, such as the one disclosed in Non-Patent Document 3, may be used.

The correctness determination unit 72 decides correctness using the conditional probabilities P_ME0(y_0 = j | x^→), j = 0, 1, obtained by the correctness conditional probability calculation unit 71, and a threshold TH. If the conditional probability of being correct, P_ME0(y_0 = 0 | x^→), is greater than the conditional probability of being incorrect, P_ME0(y_0 = 1 | x^→), and is also greater than the preset threshold TH (Y in step S72), the speech recognition result is regarded as likely to be correct, and the error cause conditional probability calculation step (step S73) and the error cause selection step (step S74) are skipped. That is, the error cause conditional probability calculation unit 73 and the error cause selection unit 74 do not operate. The computational load can thus be reduced when the recognition rate is high. Conversely, if P_ME0(y_0 = 1 | x^→) is greater than or equal to P_ME0(y_0 = 0 | x^→), or if P_ME0(y_0 = 0 | x^→) is less than or equal to the preset threshold TH (N in step S72), the speech recognition result is regarded as likely to be incorrect, and steps S73 and S74 are executed. The correctness determination unit 72 outputs the determination result y_0 = y_0^ (correct (0) or incorrect (1)) and its certainty P_ME0(y_0 = y_0^ | x^→).
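The decision made in step S72 can be sketched as follows. This is a minimal illustration under the conditions stated in the paragraph above. The function name and the returned tuple layout are assumptions made for the example, not part of the described apparatus.

```python
from typing import Tuple


def correctness_decision(
    p_correct: float, p_incorrect: float, threshold: float
) -> Tuple[int, float, bool]:
    """Sketch of step S72.

    p_correct   : P_ME0(y0 = 0 | x)
    p_incorrect : P_ME0(y0 = 1 | x)
    threshold   : preset threshold TH

    Returns (y0_hat, certainty, run_error_cause_steps), where y0_hat is
    0 for correct and 1 for incorrect.
    """
    if p_correct > p_incorrect and p_correct > threshold:
        # Likely correct: steps S73 and S74 are skipped.
        return 0, p_correct, False
    # Likely incorrect: steps S73 and S74 are executed.
    return 1, p_incorrect, True
```

Note that a result whose correct probability exceeds the incorrect probability but not the threshold is still routed to error cause estimation, which is what lets TH trade computation against caution.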

正誤判定部７２において音声認識結果が不正解である可能性が高いと判定した場合は、誤り原因条件付確率計算部７３と、誤り原因選択部７４は動作し、誤り原因の推定を行う。この誤り原因の推定では、正誤判定部７２から既に正誤判定結果とその確からしさを出力しているので、正解不正解の推定を行う必要がない。 When the correctness determination unit 72 determines that there is a high possibility that the speech recognition result is incorrect, the error cause conditional probability calculation unit 73 and the error cause selection unit 74 operate to estimate the cause of the error. In this error cause estimation, the correctness / incorrectness determination section 72 has already output the correctness / incorrectness determination result and its certainty, so there is no need to estimate the correct / incorrect answer.

Accordingly, the error cause conditional probability calculation unit 73 and the error cause selection unit 74 work with the error cause label vector z^→ = (y_1, y_2, y_3), obtained by removing the correctness label y_0 from the correctness/error cause label vector y^→ = (y_0, y_1, y_2, y_3). From the possible values of z^→ they obtain the estimated error cause z^→^ = (y_1^, y_2^, y_3^) and its certainty Π_{i=1}^{3} P_MEi(y_i = y_i^ | x^→). FIG. 9 shows the possible values of z^→; in this example there are eight states. Assuming independence among the error cause labels y_i, i = 1, 2, 3, the units 73 and 74 perform the same processing as the correctness/error cause conditional probability calculation unit 41′ and the correctness/error cause selection unit 43′ of Modification 1 shown in FIG. 6, respectively.

For example, suppose that for the label y_1, which indicates whether the word is in or out of the vocabulary, the conditional probability of in-vocabulary is P_ME1(y_1 = 0 | x^→) = 0.8 and that of out-of-vocabulary is P_ME1(y_1 = 1 | x^→) = 0.2, so that Σ_{j=0}^{1} P_ME1(y_1 = j | x^→) = 1.0 (the same holds for y_2 and y_3). Then 0 (in vocabulary) is selected as the estimate y_1^. Similarly, suppose that for the noise label y_2 the conditional probability of no noise is P_ME2(y_2 = 0 | x^→) = 0.7 and that of noise is P_ME2(y_2 = 1 | x^→) = 0.3; then 0 (no noise) is selected as y_2^. Likewise, suppose that for the gender label y_3 the conditional probability of male (gender match) is P_ME3(y_3 = 0 | x^→) = 0.1 and that of female (gender mismatch) is P_ME3(y_3 = 1 | x^→) = 0.9; then 1 (female) is selected as y_3^. Combining (concatenating) the per-label selections gives the estimated error cause z^→^ = (y_1^, y_2^, y_3^) = (0, 0, 1), that is, "female (gender mismatch)", with certainty P(z^→^ | x^→) = Π_{i=1}^{3} P_MEi(y_i = y_i^ | x^→) = 0.8 × 0.7 × 0.9 = 0.504.
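The worked example above can be reproduced with a few lines of code. The sketch below picks the more probable value for each error cause label independently and multiplies the chosen probabilities. The function name is hypothetical, and breaking ties toward 0 is an assumption, since the description does not say how equal probabilities are handled.

```python
from math import prod
from typing import Sequence, Tuple


def select_error_cause(
    probs: Sequence[Tuple[float, float]],
) -> Tuple[Tuple[int, ...], float]:
    """probs[i] holds (P(y = 0 | x), P(y = 1 | x)) for one error cause label.

    Each label is chosen independently as the more probable value, and the
    certainty is the product of the chosen probabilities.
    """
    # Ties are resolved toward 0 (an assumption of this sketch).
    labels = tuple(0 if p0 >= p1 else 1 for p0, p1 in probs)
    certainty = prod(p[label] for p, label in zip(probs, labels))
    return labels, certainty


# Values from the example: vocabulary (y1), noise (y2), gender (y3).
z_hat, certainty = select_error_cause([(0.8, 0.2), (0.7, 0.3), (0.1, 0.9)])
```

With these inputs `z_hat` is `(0, 0, 1)` and `certainty` is 0.504 up to floating-point rounding.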

The maximum entropy models ME_i, i = 0, 1, 2, 3, of Modification 1 can be used as they are as the dedicated models for the error cause labels y_i, i = 1, 2, 3. In that case, however, the feature functions f_k^0(x^→, y_0) corresponding to the correctness label y_0 and their weight parameters λ_k^0 are not used. The possible values of the error cause label vector z^→ may be recorded in an error cause label recording unit 75 provided outside the error cause conditional probability calculation unit 73, as shown in FIG. 7.

[Modification 2]
Alternatively, an error cause conditional probability calculation unit 73′ and an error cause selection unit 74′ may obtain the estimated error cause z^→^ and its certainty P_MEz(z^→^ | x^→) using a maximum entropy model MEz. The maximum entropy model MEz models the relationship between the error cause label vector z^→ and the utterance feature vector x^→, and is estimated by training with, for example, a quasi-Newton method.

FIG. 10 shows an example of the functional configuration of a correctness/error cause estimation unit 70′ that uses the maximum entropy model MEz. The unit 70′ includes the correctness conditional probability calculation unit 71, the correctness determination unit 72, an error cause conditional probability calculation unit 73′, a model parameter recording unit 101, and an error cause selection unit 74′. The correctness conditional probability calculation unit 71 and the correctness determination unit 72 are the same as those of the correctness/error cause estimation unit 70 in FIG. 7.

The model parameter recording unit 101 records the feature functions of the maximum entropy model MEz and their weight parameters. The error cause conditional probability calculation unit 73′ refers to the feature functions f_k^z(x^→, z^→) and the weight parameters λ_k^z recorded in the model parameter recording unit 101 and calculates the error conditional probability P_MEz(z^→ | x^→) for each error cause label vector z^→.

The error cause selection unit 74′ selects, as the estimated error cause z^→^, the error cause label vector with the largest error conditional probability P_MEz(z^→ | x^→), and outputs it together with that probability P_MEz(z^→^ | x^→). As in the correctness/error cause estimation unit 70, the error cause conditional probability calculation unit 73′ and the error cause selection unit 74′ do not operate when the speech recognition result is likely to be correct.
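The joint selection in Modification 2 can be sketched as follows, assuming the standard maximum entropy (log-linear) form in which P(z | x) is proportional to exp(Σ_k λ_k f_k(x, z)). The feature functions and weights passed in are placeholders supplied by the caller; the function names are hypothetical and nothing here reflects the actual trained model MEz.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

ErrorCause = Tuple[int, int, int]  # (y1, y2, y3)
Feature = Callable[[Sequence[float], ErrorCause], float]


def error_cause_posteriors(
    x: Sequence[float], features: List[Feature], weights: List[float]
) -> Dict[ErrorCause, float]:
    """Compute P(z | x) for all eight error cause label vectors z using a
    log-linear model: P(z | x) = exp(sum_k w_k * f_k(x, z)) / Z(x)."""
    candidates = list(product((0, 1), repeat=3))
    scores = [
        sum(w * f(x, z) for w, f in zip(weights, features)) for z in candidates
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {z: e / total for z, e in zip(candidates, exps)}


def select_error_cause_joint(
    x: Sequence[float], features: List[Feature], weights: List[float]
) -> Tuple[ErrorCause, float]:
    """Return the most probable error cause label vector and its probability."""
    posteriors = error_cause_posteriors(x, features, weights)
    z_hat = max(posteriors, key=posteriors.get)
    return z_hat, posteriors[z_hat]
```

Unlike the per-label selection of Embodiment 2, this scores each of the eight combinations as a whole, so no independence assumption among the labels is needed.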

Next, Embodiment 3, which can further improve the accuracy of the correctness estimation, is described.

FIG. 11 shows an example of the functional configuration of the correctness/error cause estimation unit 110 of Embodiment 3, and FIG. 12 shows its operation flow. The unit 110 includes the correctness/error cause conditional probability calculation unit 41′, the model parameter recording unit 42′, the error cause selection unit 74, a correctness conditional probability recalculation unit 111, a correctness reselection unit 112, and an error cause-correctness relation information recording unit 113. The correctness/error cause conditional probability calculation unit 41′ and the model parameter recording unit 42′ are the same as those of the correctness/error cause estimation unit 40′ of Modification 1 (FIG. 6). The error cause selection unit 74 is the same as that of the correctness/error cause estimation unit 70 of Embodiment 2 (FIG. 7).

The error cause-correctness relation information recording unit 113 records error cause-correctness relation information probabilities P_R(y_0 = j | y_i = s), i = 0, 1, 2, 3, j = 0, 1, s = 0, 1, which represent the relationship between the recognition error cause corresponding to each correctness/error cause label y_i and whether the recognition result is correct or incorrect under that cause. These probabilities are obtained by prior training.

The error cause-correctness relation information probabilities P_R(y_0 = j | y_i = s), j = 0, 1, s = 0, 1, are explained here using the noise label y_2 as an example. They are: the probability P_R(y_0 = 0 | y_2 = 0) that the recognition result is correct (y_0 = 0) when there is no noise (y_2 = 0); the probability P_R(y_0 = 1 | y_2 = 0) that it is incorrect (y_0 = 1) when there is no noise; the probability P_R(y_0 = 0 | y_2 = 1) that it is correct when there is noise (y_2 = 1); and the probability P_R(y_0 = 1 | y_2 = 1) that it is incorrect when there is noise. Together these are the probabilities P_R(y_0 = j | y_2 = s), j = 0, 1, s = 0, 1, relating the absence or presence of noise to correctness.
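The description states only that these probabilities are obtained by prior training. One simple possibility, shown here purely as an assumption and not as the method of the invention, is relative-frequency estimation from labeled data in which each sample records the correctness label y_0 and one cause label y_i. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def estimate_relation_probs(
    samples: Iterable[Tuple[int, int]],
) -> Dict[Tuple[int, int], float]:
    """Estimate P_R(y0 = j | yi = s) by counting.

    samples : (y0, yi) pairs taken from labeled recognition results.
    Returns a dict keyed by (j, s). Conditions s that never occur in the
    data are omitted rather than guessed.
    """
    pairs = list(samples)
    joint = Counter(pairs)                 # counts of (y0 = j, yi = s)
    cond = Counter(s for _, s in pairs)    # counts of yi = s
    return {
        (j, s): joint[(j, s)] / cond[s]
        for j in (0, 1)
        for s in (0, 1)
        if cond[s] > 0
    }
```

For each observed s the two estimates P_R(y_0 = 0 | y_i = s) and P_R(y_0 = 1 | y_i = s) sum to 1 by construction.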

Using the error cause-correctness relation information probabilities P_R(y_0 = j | y_i = s), i = 0, 1, 2, 3, j = 0, 1, s = 0, 1, recorded in the error cause-correctness relation information recording unit 113, the correctness conditional probability recalculation unit 111 corrects, by Expression (6), the conditional probabilities P_ME0(y_0 = j | x^→), j = 0, 1, which the correctness/error cause conditional probability calculation unit 41′ outputs as the certainty of correct and incorrect, and outputs the corrected conditional probabilities P(y_0 = j | x^→), j = 0, 1 (step S111 in FIG. 12).

正誤再選択部１１２は、補正された条件付確率Ｐ（ｙ_０＝ｊ｜ｘ^→），ｊ＝０，１を入力として、式（７）に示すように正解か不正解を選択し、その選択結果である正誤判定結果ｙ_０＾とともにその補正された条件付確率値Ｐ（ｙ_０＝ｙ_０＾｜ｘ^→）を出力する（ステップＳ１１２）。 The correct / incorrect reselection unit 112 inputs the corrected conditional probability P (y ₀ = j | x ^→ ), j = 0, 1 and selects a correct answer or an incorrect answer as shown in Expression (7). The corrected conditional probability value P (y ₀ = y ₀ ^ | x ^→ ) is output together with the correct / incorrect determination result y ₀ ^ which is the selection result (step S112).

Introducing the error cause-correctness relation information probabilities P_R(y_0 = j | y_i = s) as additional knowledge in this way makes it possible to improve the accuracy of the correctness decision. Meanwhile, as in Embodiment 2 shown in FIG. 7, the error cause selection unit 74 outputs the estimated error cause z^→^ = (y_1^, y_2^, y_3^) and its certainty Π_{i=1}^{3} P_MEi(y_i = y_i^ | x^→). The portion of the correctness/error cause estimation unit 110 consisting of the correctness/error cause conditional probability calculation unit 41′, the error cause selection unit 74, the correctness/error cause label vector recording unit 44, and the model parameter recording unit 42′ may be replaced with the correctness/error cause estimation unit 70 or the correctness/error cause estimation unit 70′. In that case, the conditional probability P_ME0(y_0 = j | x^→) output by the correctness conditional probability calculation unit 71 is input to the correctness conditional probability recalculation unit 111.

The speech recognition apparatus and method of the present invention described above are not limited to the embodiments described, and may be modified as appropriate without departing from the spirit of the invention. For example, the processes described for the apparatus and method need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the apparatus executing them or as needed. In addition, a support vector machine (SVM) or conditional random fields (CRF), for example, may be used as the identification model in place of the maximum entropy model.

Because the apparatus described above outputs the estimated value y^→^ of the correctness/error cause label vector, the user can learn how to respond by checking that value. A speech recognition apparatus 130 that further improves this convenience is also conceivable. The speech recognition apparatus 130 generates a correctness/error cause message from the estimated value y^→^. FIG. 13 shows an example of its functional configuration. The apparatus 130 includes a correctness/error cause message generation unit 131, which takes as input the estimated value y^→^ output by the correctness/error cause estimation unit 40 and generates a correctness/error cause message. The unit 131 outputs a message associated with the estimated value y^→^ and, as shown in FIG. 18 for example, makes it possible to present the user with a more easily understood way of responding. FIG. 14 shows examples of messages corresponding to estimated values y^→^. As the figure shows, a message need not be output only when the speech recognition result is estimated to be incorrect (y^→^ takes the values 8, 9, 10, 11, 12, 13, 14, 15); in some cases a message can also be output when the result is estimated to be correct (y^→^ takes the values 1, 2, 3), to show the user the correct way to use the apparatus.
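A message generation step of this kind can be sketched as a table lookup. The numbering below treats (y_0, y_1, y_2, y_3) as a 4-bit number with y_0 as the most significant bit, which is an assumption chosen because it makes the values 8 to 15 correspond to incorrect results and 1 to 3 to correct results, as stated above. The message texts are hypothetical and are not taken from FIG. 14.

```python
from typing import Dict, Optional, Tuple

LabelVector = Tuple[int, int, int, int]  # (y0, y1, y2, y3)


def label_index(y: LabelVector) -> int:
    """Encode the label vector as an integer, y0 being the top bit
    (assumed encoding: incorrect results map to 8..15)."""
    y0, y1, y2, y3 = y
    return 8 * y0 + 4 * y1 + 2 * y2 + y3


# Hypothetical example messages; the real ones are those of FIG. 14.
EXAMPLE_MESSAGES: Dict[int, str] = {
    2: "Recognized, but background noise was detected.",
    10: "Recognition may have failed because of background noise.",
    12: "The word may not be registered in the vocabulary.",
}


def generate_message(
    y_hat: LabelVector, messages: Dict[int, str] = EXAMPLE_MESSAGES
) -> Optional[str]:
    """Return the message for the estimated label vector, or None if no
    message is registered for it."""
    return messages.get(label_index(y_hat))
```

Keeping the messages in a table separate from the encoding lets entries for correct results (values 1 to 3) be added or omitted without touching the lookup logic.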

上記装置における処理手段をコンピュータによって実現する場合、各装置が有すべき機能の処理内容はプログラムによって記述される。そして、このプログラムをコンピュータで実行することにより、各装置における処理手段がコンピュータ上で実現される。 When the processing means in the above apparatus is realized by a computer, the processing contents of the functions that each apparatus should have are described by a program. Then, by executing this program on the computer, the processing means in each apparatus is realized on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk device, a flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and flash memory as the semiconductor memory.

また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ−ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記録装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 The program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Further, the program may be distributed by storing the program in a recording device of a server computer and transferring the program from the server computer to another computer via a network.

また、各手段は、コンピュータ上で所定のプログラムを実行させることにより構成することにしてもよいし、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 Each means may be configured by executing a predetermined program on a computer, or at least a part of these processing contents may be realized by hardware.

Claims

A speech recognition apparatus comprising:
a speech recognition unit that outputs a word sequence obtained by speech recognition of input speech, and an utterance feature vector of each word that represents features of each word constituting the word sequence by a plurality of parameters; and
a correctness/error cause estimation unit that takes the utterance feature vector of each word as input and estimates, for each word, an estimated value of correctness and error cause together with its certainty, using a conditional probability based on an identification model representing the relationship between the utterance feature vector and the correctness and error cause of a word of the speech recognition result.

The speech recognition apparatus according to claim 1, wherein
the correctness/error cause estimation unit comprises:
a model parameter recording unit that records model parameters necessary to calculate a conditional probability based on an identification model representing the relationship between the utterance feature vector and a correctness/error cause label vector;
a correctness/error cause conditional probability calculation unit that takes the utterance feature vector of each word as input and calculates, using the model parameters, a conditional probability based on the identification model for each preset possible value of the correctness/error cause label vector; and
a correctness/error cause selection unit that selects, from the possible values of the correctness/error cause label vector, the estimated value of correctness and error cause having the maximum conditional probability, and outputs the selected estimated value together with the conditional probability representing its certainty.

The speech recognition apparatus according to claim 1, wherein
the correctness/error cause estimation unit comprises:
a model parameter recording unit that records model parameters necessary to calculate a conditional probability based on an identification model representing the relationship between the utterance feature vector and each correctness/error cause label that is an element of a correctness/error cause label vector;
a correctness/error cause conditional probability calculation unit that takes the utterance feature vector of each word as input and calculates, using the model parameters, a conditional probability based on the identification model for each correctness/error cause label; and
a correctness/error cause selection unit that selects, for each correctness/error cause label, whichever of its two possible values has the larger conditional probability as the estimated value of correctness and error cause, and outputs the product of those larger conditional probabilities over the labels as the certainty of the estimated value.

The speech recognition apparatus according to claim 1, wherein
the correctness/error cause estimation unit comprises:
a correctness conditional probability calculation unit that calculates, from the utterance feature vector of each word, conditional probabilities that the word is correct and that it is incorrect;
a correctness determination unit that determines whether the word is correct by comparing the conditional probability of being correct with the conditional probability of being incorrect and by comparing the conditional probability of being correct with a predetermined threshold, and outputs the determination result (correct or incorrect) together with its certainty;
a model parameter recording unit that records model parameters necessary to calculate a conditional probability based on an identification model representing the relationship between the utterance feature vector and each correctness/error cause label that is an element of a correctness/error cause label vector;
an error cause conditional probability calculation unit that takes the utterance feature vector of each word as input and calculates, using the model parameters, a conditional probability based on the identification model for each error cause label that is an element of an error cause label vector, the error cause label vector being the vector obtained by removing the correctness label from the correctness/error cause label vector; and
an error cause selection unit that selects, for each error cause label, the possible value having the larger conditional probability as the estimated error cause, and outputs the product of those larger conditional probabilities as the certainty of the estimated error cause,
and wherein
the error cause conditional probability calculation unit and the error cause selection unit operate when the correctness determination unit determines that the recognized word is incorrect.

The speech recognition apparatus according to claim 1, wherein
the correctness/error cause estimation unit comprises:
a correctness conditional probability calculation unit that calculates, from the utterance feature vector of each word, conditional probabilities that the word is correct and that it is incorrect;
a correctness determination unit that determines whether the word is correct by comparing the conditional probability of being correct with the conditional probability of being incorrect and by comparing the conditional probability of being correct with a predetermined threshold, and outputs the determination result (correct or incorrect) together with its certainty;
a model parameter recording unit that records model parameters necessary to calculate a conditional probability based on an identification model representing the relationship between the utterance feature vector and an error cause label vector, the error cause label vector being the vector obtained by removing the correctness label from the correctness/error cause label vector;
an error cause conditional probability calculation unit that takes the utterance feature vector of each word as input and calculates, using the model parameters, an error conditional probability based on the identification model for each error cause label vector; and
an error cause selection unit that selects the estimated value of the error cause label vector having the maximum error conditional probability and outputs it together with that error conditional probability,
and wherein
the error cause conditional probability calculation unit and the error cause selection unit operate when the correctness determination unit determines that the recognized word is incorrect.

The speech recognition apparatus according to claim 1,
wherein the correct / incorrect / error cause estimation unit comprises:
a model parameter recording unit that records the model parameters needed to calculate conditional probabilities based on an identification model expressing the relationship between the utterance feature vector and each correct / incorrect / error cause label, the labels being the elements of the correct / incorrect / error cause label vector;
a correct / incorrect / error cause conditional probability calculation unit that takes the utterance feature vector of each word as input and uses the model parameters to calculate, for each correct / incorrect / error cause label, a conditional probability based on the identification model;
an error cause estimation unit that, for each error cause label that is an element of the error cause label vector (the vector obtained by removing the correct / incorrect label concerning correct or incorrect answers from the correct / incorrect / error cause label vector), selects the possible value having the larger conditional probability as the error cause estimate, and outputs the product of the larger conditional probability values as the probability of the error cause estimate;
an error cause-correct / incorrect relation information recording unit that records error cause-correct / incorrect relation information probabilities representing the relationship between the values of the correct / incorrect / error cause labels and correct or incorrect answers;
a correct / incorrect conditional probability recalculation unit that corrects the conditional probability of each of the correct answer and the incorrect answer by multiplying the conditional probability for each correct / incorrect / error cause label, output by the correct / incorrect / error cause conditional probability calculation unit, by the error cause-correct / incorrect relation information probability corresponding to the possible value of that label; and
a correct / incorrect reselection unit that takes the corrected conditional probabilities as input, selects the correct answer or the incorrect answer, and outputs the corrected conditional probability value together with the correct / incorrect determination result that is the selection result.
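The recalculation and reselection described in this claim can be sketched as follows. The claim does not spell out the exact formula, so this is only one possible reading: the `relation` table layout, the use of the already selected error cause values, and the final normalization are all assumptions introduced for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical layout: relation[k][v][outcome] is the relation information
# probability linking error cause label k taking value v (0 or 1) to the
# outcome "correct" or "incorrect".
Relation = Sequence[Dict[int, Dict[str, float]]]


def recalculate_correctness(
    p_correct: float,
    cause_values: Sequence[int],
    relation: Relation,
) -> Tuple[str, float]:
    """Correct P(correct) and P(incorrect) with the relation probabilities,
    then reselect the more probable outcome."""
    scores = {"correct": p_correct, "incorrect": 1.0 - p_correct}
    for k, v in enumerate(cause_values):
        for outcome in scores:
            # Multiply by the relation probability for this label value.
            scores[outcome] *= relation[k][v][outcome]
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all corrected scores are zero")
    # Assumption: renormalize so the two corrected values sum to one.
    corrected = {outcome: s / total for outcome, s in scores.items()}
    best = max(corrected, key=corrected.get)
    return best, corrected[best]
```

With a single error cause label whose selected value is 1, an initial P(correct) of 0.5 and relation probabilities of 0.25 (correct) and 0.75 (incorrect), the sketch reselects "incorrect" with a corrected probability of 0.75.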

The speech recognition apparatus according to any one of claims 2 to 6,
further comprising a correct / incorrect / error cause message generation unit that receives the correct / incorrect / error cause label vector or the error cause label vector as input and generates a correct / incorrect / error cause message corresponding to that label vector.
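A message generation unit of this kind could be as simple as a lookup from label values to text. The sketch below is illustrative only: the cause descriptions, the message wording, and the convention that the first element is the correct / incorrect label (1 = correct) are hypothetical and are not taken from the claims. It handles only the full correct / incorrect / error cause label vector, not the error cause label vector alone.

```python
from typing import Sequence

# Hypothetical cause descriptions; the claims do not enumerate the causes.
CAUSE_MESSAGES = (
    "the word may have been spoken too quickly",
    "background noise may have been present",
)


def generate_message(label_vector: Sequence[int]) -> str:
    """Build a message from a binary label vector.

    Assumed layout: label_vector[0] is the correct/incorrect label
    (1 = correct); the remaining elements are error cause flags.
    """
    if label_vector[0] == 1:
        return "Recognized correctly."
    causes = [msg for flag, msg in zip(label_vector[1:], CAUSE_MESSAGES) if flag == 1]
    if not causes:
        return "Recognition error (cause not identified)."
    return "Recognition error: " + "; ".join(causes) + "."
```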

The speech recognition apparatus according to any one of claims 1 to 7,
The speech recognition apparatus, wherein the identification model is a maximum entropy model.
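For reference, a maximum entropy model in its standard form gives the conditional probability of a label y given a feature vector x as exp(w_y · x) divided by the sum of the same quantity over all labels. The sketch below assumes one weight vector per label, held in a plain dictionary; the label names and weight values in any call are illustrative and are not parameters from this patent.

```python
import math
from typing import Dict, Sequence


def maxent_conditional(
    x: Sequence[float],
    weights: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Return P(y | x) for every label y under a maximum entropy model."""
    # Linear score w_y . x for each label.
    scores = {y: sum(w_i * x_i for w_i, x_i in zip(w, x)) for y, w in weights.items()}
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores.values())
    exp_scores = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exp_scores.values())  # normalization term Z(x)
    return {y: e / z for y, e in exp_scores.items()}
```

In this picture, the model parameters recorded in the model parameter recording unit correspond to the `weights` argument, and the utterance feature vector of a word corresponds to `x`.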

A speech recognition method comprising:
a speech recognition process in which a speech recognition unit outputs a word sequence obtained by speech recognition of input speech, together with an utterance feature vector for each word that represents the features of each word constituting the word sequence by a plurality of parameters; and
a correct / incorrect / error cause estimation process in which a correct / incorrect / error cause estimation unit takes the utterance feature vector of each word as input and estimates, for each word, whether it is correct or incorrect, the cause of any error, and the probability of these estimates, using conditional probabilities based on an identification model that represents the relationship between the utterance feature vector and the correct / incorrect answers and error causes of speech recognition result words.

The speech recognition method according to claim 9,
wherein the correct / incorrect / error cause estimation process comprises:
a correct / incorrect / error cause conditional probability calculation step in which a correct / incorrect / error cause conditional probability calculation unit takes the utterance feature vector of each word as input and calculates, for each possible value of a preset correct / incorrect / error cause label vector, a conditional probability based on the identification model, using the model parameters recorded in a model parameter recording unit; and
a correct / incorrect / error cause selection step in which a correct / incorrect / error cause selection unit selects, from the possible values of the correct / incorrect / error cause label vector, the one with the maximum conditional probability as the estimated correct / incorrect answer and error cause, and outputs the estimated correct / incorrect answer and error cause together with the conditional probability representing their probability.
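Selecting over whole label vectors, as in this claim, amounts to enumerating every possible vector and taking the one with the highest conditional probability. The sketch below assumes binary labels and takes the probability function as a parameter; both the function name and the toy probability table used with it are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

LabelVector = Tuple[int, ...]


def select_label_vector(
    prob: Callable[[LabelVector], float],
    num_labels: int,
) -> Tuple[LabelVector, float]:
    """Enumerate all binary label vectors and return the most probable one."""
    candidates = list(product((0, 1), repeat=num_labels))
    best = max(candidates, key=prob)
    return best, prob(best)
```

Because the number of candidate vectors doubles with every added label, this joint selection is practical only for a small number of labels.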

The speech recognition method according to claim 9,
wherein the correct / incorrect / error cause estimation process comprises:
a correct / incorrect / error cause conditional probability calculation step in which a correct / incorrect / error cause conditional probability calculation unit takes the utterance feature vector of each word as input and uses the model parameters to calculate, for each correct / incorrect / error cause label, a conditional probability based on the identification model; and
a correct / incorrect / error cause selection step in which a correct / incorrect / error cause selection unit selects, for each correct / incorrect / error cause label, whichever of the label's two possible values has the larger conditional probability as the estimated correct / incorrect answer and error cause, and outputs the product of the larger conditional probability values over the labels as the probability of the estimated correct / incorrect answer and error cause.
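The per-label selection of this claim treats each label independently: take the more probable of its two values, then multiply the winning probabilities. A minimal sketch follows, assuming each label is binary and is described by the probability that it takes the value 1; the function name and the tie-breaking toward 1 are assumptions.

```python
from typing import Sequence, Tuple


def select_per_label(p_one: Sequence[float]) -> Tuple[Tuple[int, ...], float]:
    """For each binary label, pick the more probable value and return the
    chosen values together with the product of their probabilities."""
    values = []
    prob = 1.0
    for p in p_one:
        if p >= 0.5:  # value 1 is at least as probable as value 0
            values.append(1)
            prob *= p
        else:
            values.append(0)
            prob *= 1.0 - p
    return tuple(values), prob
```

For example, `select_per_label([0.75, 0.25, 0.5])` returns `((1, 0, 1), 0.28125)`, since 0.75 × 0.75 × 0.5 = 0.28125.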

The speech recognition method according to claim 9,
wherein the correct / incorrect / error cause estimation process comprises:
a correct / incorrect conditional probability calculation step in which a correct / incorrect conditional probability calculation unit calculates, from the utterance feature vector of each word, a correct / incorrect conditional probability indicating whether the word is correct;
a correct / incorrect determination step in which a correct / incorrect determination unit determines whether the word is correct or incorrect from the correct / incorrect conditional probability, by comparing the conditional probability of being correct with the conditional probability of being incorrect and by comparing the conditional probability of being correct with a predetermined threshold, and outputs the correct or incorrect determination result together with its certainty;
an error cause conditional probability calculation step in which an error cause conditional probability calculation unit takes the utterance feature vector of each word as input and calculates, for each error cause label that is an element of the error cause label vector (the vector obtained by removing the correct / incorrect label from the correct / incorrect / error cause label vector), a conditional probability based on the identification model, using the model parameters recorded in a model parameter recording unit; and
an error cause selection step in which an error cause selection unit selects, for each error cause label, the possible value having the larger conditional probability as the error cause estimate, and outputs the product of the larger conditional probability values as the probability of the error cause estimate,
wherein the error cause conditional probability calculation step and the error cause selection step are executed when the correct / incorrect determination step determines that the speech-recognized word is an error.

The speech recognition method according to claim 9,
wherein the correct / incorrect / error cause estimation process comprises:
a correct / incorrect conditional probability calculation step in which a correct / incorrect conditional probability calculation unit calculates, from the utterance feature vector of each word, a correct / incorrect conditional probability indicating whether the word is correct;
a correct / incorrect determination step in which a correct / incorrect determination unit determines whether the word is correct or incorrect from the correct / incorrect conditional probability, by comparing the conditional probability of being correct with the conditional probability of being incorrect and by comparing the conditional probability of being correct with a predetermined threshold, and outputs the correct or incorrect determination result together with its certainty;
an error cause conditional probability calculation step in which an error cause conditional probability calculation unit takes the utterance feature vector of each word as input and calculates, for each error cause label vector, an error conditional probability based on the identification model, using the model parameters recorded in a model parameter recording unit; and
an error cause selection step in which an error cause selection unit selects the error cause estimate having the maximum error conditional probability and outputs it together with that error conditional probability,
wherein the error cause conditional probability calculation step and the error cause selection step are executed when the correct / incorrect determination step determines that the speech-recognized word is an error.

The speech recognition method according to claim 9,
wherein the correct / incorrect / error cause estimation process comprises:
a correct / incorrect / error cause conditional probability calculation step in which a correct / incorrect / error cause conditional probability calculation unit takes the utterance feature vector of each word as input and uses the model parameters to calculate, for each correct / incorrect / error cause label, a conditional probability based on the identification model;
an error cause selection step of selecting, for each error cause label that is an element of the error cause label vector (the vector obtained by removing the correct / incorrect label concerning correct or incorrect answers from the correct / incorrect / error cause label vector), the possible value having the larger conditional probability as the error cause estimate, and outputting the product of the larger conditional probability values as the probability of the error cause estimate;
a correct / incorrect conditional probability recalculation step in which a correct / incorrect conditional probability recalculation unit corrects the conditional probability of each of the correct answer and the incorrect answer by multiplying the conditional probability for each correct / incorrect / error cause label, output by the correct / incorrect / error cause conditional probability calculation unit, by the error cause-correct / incorrect relation information probability that is recorded in a relation information recording unit and corresponds to the possible value of that label; and
a correct / incorrect reselection step in which a correct / incorrect reselection unit takes the corrected conditional probabilities as input, selects the correct answer or the incorrect answer, and outputs the corrected conditional probability value together with the correct / incorrect determination result that is the selection result.

The speech recognition method according to any one of claims 10 to 14,
wherein the correct / incorrect / error cause estimation process further comprises a correct / incorrect / error cause message generation step of generating, with the correct / incorrect / error cause label vector or the error cause label vector as input, a correct / incorrect / error cause message corresponding to that label vector.

The speech recognition method according to any one of claims 9 to 15,
The speech recognition method, wherein the identification model is a maximum entropy model.

A program for causing a computer to function as the speech recognition apparatus according to claim 1.