JP5113797B2 - Dissimilarity utilization type discriminative learning apparatus and method, and program thereof


Info

Publication number: JP5113797B2 (application JP2009100865A; also published as JP2010250161A)
Inventors: Atsushi Nakamura, Erik McDermott
Assignee (original and current): Nippon Telegraph and Telephone Corp
Legal status: Active


Description

The present invention relates to a discriminative learning method, apparatus, and program for pattern recognition that identifies, from a feature sequence extracted by some method from a signal that varies dynamically along the time axis, the spatial axis, or both — such as speech, still images, or video — and that expresses some conceptual information, a symbol sequence representing a predetermined signal category as discrete values.

Many pattern recognition errors arise from confusion between patterns located near the boundary with adjacent symbols in feature space. To suppress such errors, it is effective to draw information at the training stage from the training data of both the correct symbol and its adjacent symbols, and to estimate the model parameters so as to reduce the confusion. Frameworks that improve inter-symbol discrimination in this way are collectively called discriminative training.

As an example, consider applying Minimum Classification Error (MCE) training, one of the representative realizations of discriminative training, to pattern recognition that identifies symbol sequences. FIG. 7 shows an example functional configuration of a discriminative learning device 700 that applies MCE training to continuous word speech recognition, a kind of pattern recognition.

The discriminative learning device 700 comprises an acoustic model recording unit 70, a positive example language model recording unit 71, a positive example discriminant function value generation unit 73, a positive example speech recognition unit 74, a negative example discriminant function value generation unit 75, a speech recognition unit 76, a positive/negative example comparison unit 77, and a model parameter optimization unit 78. The acoustic model recording unit 70 holds an acoustic model 701 and a language model 702. The acoustic model 701 is realized, for example, by a hidden Markov model (HMM), which is widely used in continuous word speech recognition. The language model 702 is a word N-gram probability model and also includes a word pronunciation dictionary holding part-of-speech attribute information, pronunciation information, and the like.

The positive example language model recording unit 71 records a positive example language model, i.e., the correct language symbol sequence corresponding to the feature sequence X of the input speech signal. Multiple feature sequences form the feature sequence group Z = {X₁, X₂, X₃, …}. The positive example discriminant function value generation unit 73 takes the feature sequence X and its correct answer R(X) as input, refers to the positive example language model recording unit 71, and outputs a discriminant function value G (Equation (1)) for evaluating whether the feature sequence X corresponds to the correct symbol sequence R(X) to which it belongs (hereinafter, the positive example symbol sequence R(X)).

$$G = \log\bigl(P_{\Lambda_A}(X \mid R(X))\; P_{\Lambda_L}(R(X))^{\eta}\bigr) \tag{1}$$

Here, Λ is the set of model parameters of the symbols recorded in the acoustic model recording unit 70.

The positive example speech recognition unit 74 rearranges the positive example symbol sequences R(X) according to their discriminant function values G and outputs the speech recognition result estimated to be correct.

The speech recognition unit 76 takes the feature sequence X and its correct answer R(X) as input, generates symbol sequences S other than the positive example symbol sequence R(X), i.e., other than the correct answer (hereinafter, negative example symbol sequences S (S ≠ R(X))), and outputs the feature sequence X and the negative example symbol sequences S (S ≠ R(X)) to the negative example discriminant function value generation unit 75.

The negative example discriminant function value generation unit 75 refers to the acoustic model recording unit 70 and outputs a discriminant function value Ḡ (Equation (2)) for evaluating whether the feature sequence X of the input speech signal corresponds to a negative example symbol sequence S (S ≠ R(X)).

$$\bar{G} = \log\bigl(P_{\Lambda_A}(X \mid S)\; P_{\Lambda_L}(S)^{\eta}\bigr),\qquad S \in W,\; S \neq R(X) \tag{2}$$

Here, W is the set of all assumed symbol sequences. P_{Λ_A}(X|S) is the probability, computed with the acoustic model, that the feature sequence of speech uttered with the intention of the negative example symbol sequence S (S ≠ R(X)) is X. P_{Λ_L}(S) is the prior probability, computed with the language model, of the appearance of the negative example symbol sequence S (S ≠ R(X)). η is a coefficient, which can be set manually, that controls the effect of the prior probability P_{Λ_L}(S); the larger η is, the larger the contribution of P_{Λ_L}(S) in Equation (2).

The positive/negative example comparison unit 77 takes as input the discriminant function value G evaluating the positive example symbol sequence and the discriminant function values Ḡ evaluating the negative example symbol sequences S and, using all of these, outputs the misclassification measure d(X; Λ) of Equation (3), a measure of misclassification for the feature sequence X of the input speech signal.

$$d(X;\Lambda) = -\,G + \max_{S \in W,\, S \neq R(X)} \bar{G} \tag{3}$$

The misclassification measure d(X; Λ) is the difference between the maximum of the discriminant function values Ḡ given by the several negative example symbol sequences S (S ≠ R(X)) and the discriminant function value G given by the positive example symbol sequence. If it is positive, the discriminant function value Ḡ of at least one negative example symbol sequence S (S ≠ R(X)) exceeds the discriminant function value G of the positive example symbol sequence R(X), meaning the feature sequence X of the input speech signal has been misclassified.
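As a minimal numeric sketch (the scores below are made up, not from the patent), the measure of Equation (3) is a one-liner:

```python
# Hypothetical discriminant scores for one utterance (illustrative values only).
G = 2.0                    # discriminant value of the positive example sequence
G_bar = [1.2, 2.3, 0.7]    # discriminant values of negative example sequences

# Misclassification measure of Eq. (3): positive iff some competitor beats G.
d = max(G_bar) - G
print(d)  # 0.3 > 0, so the feature sequence X would be misclassified
```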

More generally, the misclassification measure d(X; Λ) can be expressed as Equation (4).

$$d(X;\Lambda) = -\,G + h^{-1}\!\Bigl(\,\sum_{S \in W,\, S \neq R(X)} h(\bar{G})\Bigr) \tag{4}$$

In this way, the second term on the right side of Equation (4) allows a misclassification measure that takes the influence of more negative example symbol sequences into account. Here h(·) is an arbitrary monotonically increasing invertible function and h⁻¹(·) is its inverse. In continuous word speech recognition, the misclassification measure d(X; Λ) is defined, for example, by Equation (5).

$$d(X;\Lambda) = -\,g(X, R(X); \Lambda) + \frac{1}{\varphi}\,\log \sum_{S \in W,\, S \neq R(X)} \exp\bigl(\varphi\, g(X, S; \Lambda)\bigr) \tag{5}$$

Here φ is a positive constant; the larger φ is, the more the term with the maximum exp(φ g(X, S; Λ)) among those satisfying S ≠ R(X) dominates the second term on the right side.
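The following sketch, again with illustrative scores, shows how the φ-smoothed second term of Equation (5) approaches the hard maximum of Equation (3) as φ grows:

```python
import math

def smooth_max(scores, phi):
    """phi-smoothed maximum: the second term of Eq. (5) (sketch)."""
    return math.log(sum(math.exp(phi * g) for g in scores)) / phi

competitors = [1.2, 2.3, 0.7]   # hypothetical g(X, S; Lambda) for S != R(X)
for phi in (1.0, 5.0, 50.0):
    print(phi, smooth_max(competitors, phi))
# As phi grows the value approaches max(competitors) = 2.3, i.e., Eq. (3)
```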

The model parameter optimization unit 78 takes the misclassification measure d(X; Λ) as input, defines a loss function loss(d) representing the magnitude of the loss incurred according to d(X; Λ), and finds the model parameters Λ that minimize the total loss. As the loss function, a continuous nonlinear one such as Equation (6) can be considered, for example.

$$\mathrm{loss}(d) = \frac{1}{1 + \exp(-\alpha d)},\qquad \alpha > 0 \tag{6}$$

The loss function loss(d) of Equation (6) takes a value between 0 and 1 according to the misclassification measure d(X; Λ) in a narrow region around the symbol sequence boundary d = 0, asymptotically approaching 0 for d < 0 and 1 for d > 0. As the simplest linear loss function loss(d), Equation (7) is conceivable.

$$\mathrm{loss}(d) = d \tag{7}$$

With the loss function of Equation (7), the loss value coincides with the misclassification measure d(X; Λ).
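A small sketch of the two loss functions; the slope parameter alpha in the sigmoid is an assumption, since Equation (6) is rendered only as an image in the original:

```python
import math

def sigmoid_loss(d, alpha=1.0):
    """Smoothed 0-1 loss of Eq. (6); alpha (slope) is an assumed parameter."""
    return 1.0 / (1.0 + math.exp(-alpha * d))

def linear_loss(d):
    """Linear loss of Eq. (7): the loss equals the measure d itself."""
    return d

for d in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(d, round(sigmoid_loss(d, alpha=5.0), 4), linear_loss(d))
# sigmoid_loss approaches 0 for d < 0 and 1 for d > 0
```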

Now, when a group of feature sequences Z = {X₁, X₂, X₃, …} and the positive example symbol sequences {R(X₁), R(X₂), R(X₃), …} for each member of Z are given as training data, the total loss L(Z; Λ) over the whole feature sequence group Z is obtained by Equation (8).

$$L(Z;\Lambda) = \sum_{X \in Z} \mathrm{loss}\bigl(d(X;\Lambda)\bigr) \tag{8}$$

Finding the model parameters Λ that minimize the total loss L(Z; Λ) with an optimization method corresponds to raising the discriminative power of the discriminative training method. As optimization methods, the Probabilistic Descent (PD) method, the Quickprop method, and the like can be used.
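A toy sketch of minimizing the total loss of Equation (8) by a gradient-descent update in the spirit of the PD method; the scalar parameterization of d is purely illustrative:

```python
import math

def sigmoid_loss(d, alpha=5.0):
    return 1.0 / (1.0 + math.exp(-alpha * d))

# Toy setup: d(X; lam) = x - lam for a scalar "model parameter" lam.
data = [0.5, -0.2, 1.0]

def total_loss(lam):
    """Total loss of Eq. (8) over the training group Z."""
    return sum(sigmoid_loss(x - lam) for x in data)

# Descend the total loss with a central-difference gradient.
lam, step, eps = 0.0, 0.5, 1e-6
for _ in range(200):
    grad = (total_loss(lam + eps) - total_loss(lam - eps)) / (2 * eps)
    lam -= step * grad
print(lam, total_loss(lam))
```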

In the Maximum Mutual Information (MMI) training method, another representative example of discriminative training, the model parameters Λ that maximize the MMI objective function F_MMI(Z; Λ) defined by Equation (9) are found with an optimization method.

$$F_{\mathrm{MMI}}(Z;\Lambda) = \sum_{X \in Z} f_{\mathrm{MMI}}(X;\Lambda),\qquad f_{\mathrm{MMI}}(X;\Lambda) = \log \frac{P_{\Lambda_A}(X \mid R(X))\; P_{\Lambda_L}(R(X))^{\eta}}{\sum_{S \in W'} P_{\Lambda_A}(X \mid S)\; P_{\Lambda_L}(S)^{\eta}} \tag{9}$$

Here, W′ (W′ ⊂ W) is the set of symbol sequences obtained as the result of continuous word speech recognition over the whole assumed symbol sequence set W. Transforming Equation (5) yields Equation (10).

$$-\,d(X;\Lambda) = \frac{1}{\varphi}\,\log \frac{\exp\bigl(\varphi\, g(X, R(X); \Lambda)\bigr)}{\sum_{S \neq R(X)} \exp\bigl(\varphi\, g(X, S; \Lambda)\bigr)} \tag{10}$$

Thus f_MMI(X; Λ) and the misclassification measure d(X; Λ) of the MCE training method can be computed by almost the same procedure. In particular, minimizing the total loss when the linear loss function (Equation (7)) is applied in MCE training follows almost the same procedure as maximizing the MMI objective function.

In the computations of Equations (9) and (10), multiple competing positive example and negative example word sequences obtained by continuous word speech recognition of the feature sequence X are used. In discriminative training it is important to use a sufficiently large variety of positive and negative recognized word sequences so as to take more diverse recognition errors into account. For this reason, Equations (9) and (10) are computed using a word lattice or the like, which can efficiently represent a group of word sequences consisting of many words as a network structure of words. The model parameters Λ are then optimized so that the total loss is minimized, using the positive example word sequences as supervision.

[Juang & Katagiri 92] Biing-Hwang Juang and Shigeru Katagiri: Discriminative Learning for Minimum Error Classification, IEEE Trans. on Signal Processing, Vol. 40, No. 12, pp. 3043-3054 (1992). [Katagiri et al. 98] Shigeru Katagiri, Biing-Hwang Juang and Chin-Hui Lee: Pattern Recognition Using a Family of Design Algorithms Based Upon the Generalized Probabilistic Descent Method, Proc. IEEE, Vol. 86, No. 11, pp. 2345-2373 (1998). [McDermott & Katagiri 97] Erik McDermott and Shigeru Katagiri: String-Level MCE for Continuous Phoneme Recognition, Proc. Eurospeech 97, Vol. 1, pp. 123-126 (1997). [McDermott et al. 07] E. McDermott, T. Hazen, J. Le Roux, A. Nakamura, and S. Katagiri: Discriminative Training for Large Vocabulary Speech Recognition Using Minimum Classification Error, IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 1, pp. 203-223 (2007). [Macherey et al. 05] W. Macherey, L. Haferkamp, R. Schlueter, and H. Ney: Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition, in Proc. Interspeech '05 - Eurospeech, pp. 2133-2136 (2005).

Conventional discriminative training methods compute the positive example symbol sequences and the negative example symbol sequences separately, and seek symbol sequences that minimize the respective misclassification measures or maximize the difference between the respective discriminant function values. This causes the following problems. First, the required computational resources are large, because the discriminant function values of both the positive example and the negative example symbol sequences must be computed.

Second, the selection of symbol sequences treated as quasi-positive examples has been arbitrary and manual. Negative example symbol sequences {R′₁(X), R′₂(X), R′₃(X), …} that differ from the positive example symbol sequence R(X) of the feature sequence X only in ways that barely affect the meaning of the sentence are also treated as quasi-positive examples, increasing the amount of knowledge about the positive example. By increasing this amount of knowledge, a discriminative learning device can be realized that generates model parameters more robust to feature sequences not contained in the training data. However, the selection of these negative example symbol sequences {R′₁(X), R′₂(X), R′₃(X), …} has been performed arbitrarily.

The present invention has been made in view of these points. Its object is to provide a dissimilarity utilization type discriminative learning device, method, and program that reduce the amount of computation by generalizing the positive/negative distinction for individual symbol sequences into a degree of dissimilarity from the positive example, and that automatically reflect quasi-positive symbol sequences in the objective function on the basis of an objective criterion, eliminating the need to select them manually.

The dissimilarity utilization type discriminative learning device of the present invention comprises a model parameter recording unit, a pattern recognition unit, a discriminant function value generation unit, a dissimilarity calculation unit, a positive example recognition comparison unit, and a model parameter optimization unit. The model parameter recording unit records model parameters. The pattern recognition unit generates recognized symbol sequences by pattern recognition of the training data. The discriminant function value generation unit refers to the model parameter recording unit and outputs a discriminant function value evaluating whether a recognized symbol sequence corresponds to the feature values of the training data. The dissimilarity calculation unit calculates the dissimilarity between a recognized symbol sequence and the positive example. The positive example recognition comparison unit takes N (N ≥ 2) attenuation coefficients, the discriminant function values, and the dissimilarities as input, uses the N attenuation coefficients to obtain a positive-side integrated value and an integrated value for correcting it, and outputs an objective function in which the positive-side integrated value has been corrected. The model parameter optimization unit optimizes the model parameters corresponding to the recognized symbol sequences using the objective function.

According to the dissimilarity utilization type discriminative learning device of this invention, the dissimilarity estimation unit estimates the dissimilarity between a recognized symbol sequence and the positive example, generalizing the positive/negative distinction into a degree of dissimilarity from the positive example. The model parameters are then optimized with an objective function that uses this dissimilarity. Unlike conventional discriminative learning devices, therefore, no pattern recognition pass for generating negative example symbol sequences is required. Moreover, the discriminant function values of negative example symbol sequences no longer need to be computed, so the required computational resources can be reduced.

In addition, since the positive example and the recognized symbol sequences close to it are automatically weighted by the attenuation coefficients and reflected in the objective function, there is no longer any need to select quasi-positive symbol sequences arbitrarily and by hand.

FIG. 1 shows an example functional configuration of the dissimilarity utilization type discriminative learning device 100 of this invention. FIG. 2 shows the operation flow of the dissimilarity utilization type discriminative learning device 100. FIG. 3 shows an example functional configuration of the positive example recognition comparison unit 14 for the maximum mutual information training method. FIG. 4 shows the operation flow of the positive example recognition comparison unit 14. FIG. 5 shows an example functional configuration of the positive example recognition comparison unit 50 for the minimum classification error training method. FIG. 6 shows the operation flow of the positive example recognition comparison unit 50. FIG. 7 shows an example functional configuration of a conventional discriminative learning device.

The dissimilarity utilization type discriminative learning device of this invention is new in that it requires no functional components corresponding to the positive example speech recognition unit 74 and the positive example discriminant function value generation unit 73 of the conventional discriminative learning device 700, which perform pattern recognition of the positive example symbol sequence. Before describing embodiments of the invention, its basic idea is explained.

[Basic idea]
The dissimilarity utilization type discriminative training method of this invention abstracts the positive/negative distinction for each recognized symbol sequence into a degree of dissimilarity from the positive example, and learns using a weighted sum of discriminant function values based on that dissimilarity.

First, a function Δ(V, S) representing the degree of dissimilarity between two arbitrary symbol sequences V and S is introduced. Δ(V, S) can be realized, for example, by the edit distance between V and S. When V and S are both associated with a common feature sequence X, a dissimilarity measure based on the correspondence between the symbols of each sequence and the feature values of the feature sequence can also be used (reference: J. Zheng and A. Stolcke: Improved Discriminative Training Using Phone Lattices, in Proc. Interspeech, pp. 2125-2128 (2005)).
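As a sketch, the edit-distance realization of Δ(V, S) can be computed by the standard dynamic program over the two symbol sequences:

```python
def edit_distance(v, s):
    """Levenshtein distance between symbol sequences v and s,
    one possible realization of the dissimilarity Delta(V, S)."""
    m, n = len(v), len(s)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if v[i - 1] == s[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

print(edit_distance("recognize speech".split(), "wreck a nice beach".split()))
```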

The dissimilarity Δ(R(X), S) between the positive example symbol sequence R(X) and an arbitrary recognized symbol sequence S (S ∈ W′) can be regarded as an error measure of the recognized symbol sequence S (S ∈ W′). Using this dissimilarity Δ(R(X), S), a new objective function F⁺(Z; Λ) can be defined as in Equation (11). The superscript + of F⁺ indicates a quantity proposed in this invention.

$$F^{+}(Z;\Lambda) = \sum_{X \in Z} f^{+}(X;\Lambda),\qquad f^{+}(X;\Lambda) = \log \sum_{S \in W'} e^{-\sigma_1 \Delta(R(X),S)}\, h\bigl(g(X,S;\Lambda)\bigr) \;-\; \log \sum_{S^{*} \in W'} e^{-\sigma_2 \Delta(R(X),S^{*})}\, h\bigl(g(X,S^{*};\Lambda)\bigr) \tag{11}$$

Each term of f⁺(X; Λ) passes the discriminant function value g(X, S; Λ) of each recognized symbol sequence S (S ∈ W′) through an arbitrary monotonically increasing function h(·), and then multiplies that value by the exponential of the dissimilarity Δ(R(X), S) from the positive example times an attenuation coefficient σ. That is, f⁺ is built from a weighted sum with exponentially decaying weights. By setting the attenuation coefficients σ₁ and σ₂ appropriately, discriminative training can be performed by maximizing the objective function F⁺(Z; Λ).

For example, by making the attenuation coefficient σ₁ large, the first term on the right side of Equation (11) becomes larger the smaller the dissimilarity Δ(R(X), S) is. The discriminant function values of the positive example (Δ(R(X), S) = 0) and of recognized symbol sequences S (S ∈ W′) with very small dissimilarity Δ(R(X), S), i.e., quasi-positive examples, therefore dominate. With σ₂ = 0, the weights in the second term on the right side of Equation (11) are all 1, so all discriminant function values in the recognized symbol sequence group (S* ∈ W′) are treated equally. In other words, the first term on the right side of Equation (11) is a weighted sum of the discriminant function values of the positive example and of symbol sequences very close to it, while the second term accumulates the discriminant function values of what are mostly negative example symbol sequences.

Thus, according to this invention, the objective function can be generated with a single pattern recognition pass, so the amount of computation can be reduced compared with conventional methods. Moreover, the exponential attenuation coefficient σ₁ in the first term on the right side of the second expression of Equation (11) automatically weights the positive example and the quasi-positive recognized symbol sequences S (S ∈ W′) according to their dissimilarity and reflects them in the objective function. As a result, there is no need to select quasi-positive symbol sequences by hand as in conventional methods.
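A small demonstration (with made-up Δ values) of the exponential weights e^(−σΔ) that appear in Equation (11):

```python
import math

deltas = [0, 1, 2, 5, 10]   # illustrative dissimilarities Delta(R(X), S)

for sigma in (0.0, 1.0, 4.0):
    weights = [math.exp(-sigma * d) for d in deltas]
    print(sigma, [round(w, 4) for w in weights])
# sigma = 0 gives uniform weights (second term of Eq. (11)); a large sigma
# concentrates weight on the positive and near-positive sequences (first term).
```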

Embodiments of the invention are described below with reference to the drawings. Identical elements in different drawings carry the same reference numerals, and their description is not repeated.

FIG. 1 shows an example functional configuration of the dissimilarity utilization type discriminative learning device 100 of this invention; its operation flow is shown in FIG. 2. The dissimilarity utilization type discriminative learning device 100 comprises a model parameter recording unit 12, a discriminant function value generation unit 11, a pattern recognition unit 10, a dissimilarity calculation unit 13, a positive example recognition comparison unit 14, and a model parameter optimization unit 15. The device 100 is realized by loading a predetermined program into a computer composed of, for example, ROM, RAM, and a CPU, and having the CPU execute the program.

The dissimilarity utilization type discriminative learning device 100 takes the feature sequence X of training data and its correct positive example symbol sequence R(X) as input signals and outputs optimized model parameters Λ_m. In the input-signal notation of FIGS. 1, 3, and 5, the many feature sequences {X₁, X₂, X₃, …} and the many positive example symbol sequences {R(X₁), R(X₂), R(X₃), …} are written X* and R(X*); this notation is not used in the text.

The model parameter recording unit 12 records the model parameters corresponding to the recognition target symbol sequences, consisting of an acoustic model and a language model. The pattern recognition unit 10 generates recognized symbol sequences S by pattern recognition of the externally input feature sequence X of the training data (step S10). The discriminant function value generation unit 11 takes a recognized symbol sequence S as input, refers to the model parameter recording unit 12, and outputs a discriminant function value g(X, S; Λ) evaluating whether the recognized symbol sequence S corresponds to the feature sequence X of the training data (step S11).

The discriminant function value g(X, S; Λ) is input to the positive example recognition comparison unit 14 via the pattern recognition unit 10. The dissimilarity calculation unit 13 takes the positive example symbol sequence R(X) corresponding to the feature sequence X of the training data and the recognized symbol sequence S as input and calculates the dissimilarity Δ(R(X), S) between them (step S13).

The positive example recognition comparison unit 14 takes N (N ≥ 2) predetermined attenuation coefficients σ₁, σ₂, the discriminant function values g(X, S; Λ), and the dissimilarities Δ(R(X), S) as input, uses the N attenuation coefficients to obtain the positive-side integrated value and the integrated value for correcting it, and outputs the objective function f⁺(X; Λ) in which the positive-side integrated value has been corrected (step S14).

The model parameter optimization unit 15 uses the objective function f⁺(X; Λ) to optimize the model parameters in the parameter set Λ corresponding to the recognized symbol sequences so as to increase f⁺(X; Λ) (step S15). The model parameter optimization unit 15 continues optimizing the model parameters until the increment of f⁺(X; Λ) becomes smaller than a predetermined convergence threshold.
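A minimal sketch of the optimization loop of step S15; the optimizer, step size, and numeric-gradient helper are assumptions, since the patent only requires that iteration stop once the increment of f⁺(X; Λ) falls below a convergence threshold:

```python
def numeric_grad(f, params, eps=1e-6):
    """Central-difference gradient of f at params (illustrative helper)."""
    grad = []
    for i in range(len(params)):
        hi = params[:]; hi[i] += eps
        lo = params[:]; lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def train(objective, params, step=0.01, threshold=1e-4, max_iters=1000):
    """Ascend the objective f+(X; Lambda) until its increment falls below
    the convergence threshold, as in step S15 (a sketch, not the patent's code)."""
    prev = objective(params)
    for _ in range(max_iters):
        grad = numeric_grad(objective, params)
        params = [p + step * g for p, g in zip(params, grad)]
        cur = objective(params)
        if cur - prev < threshold:   # convergence condition of step S15
            break
        prev = cur
    return params

# Toy usage: maximize f(lam) = -(lam - 1)^2 as a stand-in objective.
print(train(lambda p: -(p[0] - 1.0) ** 2, [0.0]))
```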

As described above, the dissimilarity utilization type discriminative learning device 100 of Embodiment 1 has no functional components corresponding to the positive example side pattern recognition unit (the positive example speech recognition unit 74) and the positive example discriminant function value generation unit 73 required by the conventional discriminative learning device 700. The objective function f⁺(X; Λ) is generated by a single pattern recognition pass of the pattern recognition unit 10. The amount of computation can therefore be reduced compared with the conventional discriminative learning device 700.

Note that the pattern recognition unit 10, the discriminant function value generation unit 11, the model parameter recording unit 12, and the model parameter optimization unit 15 of Embodiment 1 correspond respectively to the speech recognition unit 76, the negative example discriminant function value generation unit 75, the acoustic model recording unit 70, and the model parameter optimization unit 78 of the conventional discriminative learning device 700, and operate in the same way.

What is new in the dissimilarity utilization type discriminative learning device 100 is the functional configuration of the dissimilarity calculation unit 13 and the positive example recognition comparison unit 14; only these new components are described below. The device 100 was described above with the correct symbol sequence R(X) corresponding to the feature sequence X of the training data given as input, but the dissimilarity utilization type discriminative learning device of this invention can also be realized without input of the correct symbol sequence R(X).

[Modification]
The broken lines in FIG. 1 show an example functional configuration of a dissimilarity utilization type discriminative learning device 100′, a modification of Embodiment 1 that does not require input of the correct symbol sequence R(X). The modification differs only in the input signals and operation of the dissimilarity estimation unit 13′. The dissimilarity estimation unit 13′ takes the feature sequence X of the training data, the recognized symbol sequences S (S ∈ W′) obtained by pattern recognition of X, and the discriminant function values g(X, S; Λ) as input, and produces an estimate Δ̂(S) of the dissimilarity Δ(R(X), S) between each recognized symbol sequence S (S ∈ W′) and the positive example. The estimate Δ̂(S) can be computed, for example, by estimating a confidence θ(S) of the recognition result with the method of [Wessel et al., 01]. (Reference: F. Wessel, R. Schlueter, K. Macherey, and H. Ney: Confidence Measures for Large Vocabulary Continuous Speech Recognition, IEEE Transactions on Speech and Audio Processing, vol. 9, no. 3, pp. 288-298 (2001))

The confidence θ(S) is a measure of the degree to which a recognition result can be trusted as correct. For example, consider the confidence of S₁ among multiple recognized symbol sequences S₁, S₂, S₃, … obtained by continuous pattern recognition of the feature sequence X. If the discriminant function value g(X, S₁; Λ) is markedly larger in comparison with g(X, S₂; Λ), g(X, S₃; Λ), …, then the confidence that S₁ is correct can be regarded as high.

Conversely, if many of g(X, S₂; Λ), g(X, S₃; Λ), … have values comparable to g(X, S₁; Λ), the confidence that S₁ is correct becomes small. In addition, taking into account the plausibility of the lengths of the partial feature sequences corresponding to the symbols of S₁, and the plausibility of those symbols occurring together in the same symbol sequence, the degree to which S₁ can be trusted as correct is expressed by a number θ(S₁) between 0 and 1.

The dissimilarity estimate Δ̂(S₁) between S₁ and the positive example symbol sequence is defined so that it approaches 0 the more trustworthy S₁ is as the correct answer, i.e., the closer θ(S₁) is to 1, and grows the less trustworthy S₁ is. This dissimilarity estimate Δ̂(S₁) can be computed, for example, by Equation (12).

[Equation (12): the dissimilarity estimate Δ̂(S₁) computed from the confidence θ(S₁); rendered only as an image in the original.]

By using this dissimilarity estimate Δ̂(S), discriminative training can be carried out even when the positive example symbol sequence R(X) is not explicitly given. Conventional discriminative training cannot be performed unless positive example symbol sequences are given, so they had to be attached to all of a large amount of data. Providing the dissimilarity estimation unit 13′, which computes the dissimilarity estimate Δ̂(S), removes the need to prepare positive example symbol sequences by hand.
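Since Equation (12) is rendered only as an image, the following is a hypothetical realization assuming the mapping Δ̂(S) = −log θ(S), which satisfies the stated requirements (0 when θ(S) = 1, growing as θ(S) decreases):

```python
import math

def dissimilarity_estimate(theta):
    """Hypothetical Eq. (12): maps a confidence theta(S) in (0, 1] to a
    dissimilarity estimate that is 0 at theta = 1 and grows as theta shrinks.
    The exact form of Eq. (12) is not recoverable from the text."""
    return -math.log(theta)

for theta in (1.0, 0.9, 0.5, 0.1):
    print(theta, round(dissimilarity_estimate(theta), 3))
```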

FIG. 3 shows a more concrete example functional configuration of the positive example recognition comparison unit 14, describing Embodiment 1 in more detail; its operation flow is shown in FIG. 4. FIG. 3 shows an implementation of the computation of the objective function f⁺(X; Λ) for the MMI training method.

The positive example recognition comparison unit 14 comprises discriminant function smoothing/antilogarithm means 140, positive-side weighting means 141, recognition-side weighting means 142, positive-side integration/logarithm means 143, recognition-side integration/logarithm means 144, and integrated value comparison means 145. Although one each of the means 140, 141, and 142 is shown here, one set of means 140, 141, 142 may instead be provided for each of the many input recognized symbol sequences S (S ∈ W′) so that the discriminant function values g(X, S; Λ) and the dissimilarities Δ(R(X), S) are processed simultaneously. The example shown in FIG. 3 is a functional configuration that processes the many input recognized symbol sequences S (S ∈ W′) one at a time.

The discriminant function smoothing/antilogarithm means 140 computes the discriminant function smoothed value A by Equation (13) (step S140).

$$A = \exp\bigl(\varphi\, g(X, S; \Lambda)\bigr) \tag{13}$$

The discriminant function smoothed value A is the exponential of the discriminant function value g(X, S; Λ) multiplied by a predetermined positive constant φ.

The positive-side weighting means 141 computes the positive-side weighted value B by Equation (14) (step S141).

$$B = \exp\bigl(-\sigma_1\, \Delta(R(X), S)\bigr)\cdot A \tag{14}$$

The positive-side weighted value B is the discriminant function smoothed value A multiplied by the exponential of the dissimilarity Δ(R(X), S) times the first attenuation coefficient −σ₁.

The recognition-side weighting means 142 computes the recognition-side weighted value C by Equation (15) (step S142).

$$C = \exp\bigl(-\sigma_2\, \Delta(R(X), S)\bigr)\cdot A \tag{15}$$

The recognition-side weighted value C is the discriminant function smoothed value A multiplied by the exponential of the dissimilarity Δ(R(X), S) times the second attenuation coefficient −σ₂.

The positive-side integration/logarithm means 143 computes the positive-side integrated value D by Equation (16) (step S143).

$$D = \log \sum_{S \in W'} B \tag{16}$$

The positive-side integrated value D is the logarithm of the sum of the positive-side weighted values B over all recognized symbol sequences S (S ∈ W′).

The recognition-side integration/logarithm means 144 computes the recognition-side integrated value E by Equation (17) (step S144).

$$E = \log \sum_{S \in W'} C \tag{17}$$

The recognition-side integrated value E is the logarithm of the accumulated recognition-side weighted values C over all recognized symbol sequences S (S ∈ W′); it is the integrated value for correcting the positive-side integrated value.

The integrated value comparison means 145 takes the positive-side integrated value D and the recognition-side integrated value E as input and outputs the objective function f⁺(X; Λ) of Equation (18) (step S145).

$$f^{+}(X;\Lambda) = D - E \tag{18}$$
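A minimal sketch of the computation chain of Equations (13)-(18); the scores and dissimilarities are illustrative, and the first entry plays the role of the positive example (Δ = 0):

```python
import math

def f_plus(scores, deltas, phi=1.0, sigma1=4.0, sigma2=0.0):
    """Objective f+(X; Lambda) = D - E of Eq. (18), built from Eqs. (13)-(17).
    scores: g(X, S; Lambda) for each S in W'; deltas: Delta(R(X), S)."""
    A = [math.exp(phi * g) for g in scores]                    # Eq. (13)
    B = [math.exp(-sigma1 * d) * a for d, a in zip(deltas, A)] # Eq. (14)
    C = [math.exp(-sigma2 * d) * a for d, a in zip(deltas, A)] # Eq. (15)
    D = math.log(sum(B))                                       # Eq. (16)
    E = math.log(sum(C))                                       # Eq. (17)
    return D - E                                               # Eq. (18)

# Hypothetical recognition results: the first sequence is the positive example.
print(f_plus(scores=[2.0, 1.5, 1.2], deltas=[0, 2, 5]))
```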

In Equation (18), for example, by taking the attenuation coefficient σ₁ sufficiently large, setting σ₂ = 0, and maximizing with one of various optimization methods, recognition accuracy can be improved to the same degree as with the conventional MMI training method while requiring fewer computational resources than the conventional MMI training method. The way to reduce the required computational resources is not limited to the configuration of Embodiment 1; another method is described as Embodiment 2.

FIG. 5 shows an example functional configuration of the positive example recognition comparison unit 50 of Embodiment 2; its operation flow is shown in FIG. 6. The positive example recognition comparison unit 50 comprises the discriminant function smoothing/antilogarithm means 140, the positive-side weighting means 141, the positive-side integration/logarithm means 143, negative-side weighting means 242, negative-side integration/logarithm means 244, and integrated value comparison means 245. The discriminant function smoothing/antilogarithm means 140, the positive-side weighting means 141, and the positive-side integration/logarithm means 143 are the same as in the positive example recognition comparison unit 14 of Embodiment 1, and the positive-side integration/logarithm means 143 outputs the positive-side integrated value D (Equation (16)); their description is therefore omitted.

The negative-side weighting means 242 computes the negative-side weighted value K by Equation (19) (step S242).

$$K = \Bigl(\exp\bigl(-\sigma_2\, \Delta(R(X), S)\bigr) - \exp\bigl(-\sigma_3\, \Delta(R(X), S)\bigr)\Bigr)\cdot A \tag{19}$$

The negative-side weighted value K is computed as follows: a first negative-side weighted value is the discriminant function smoothed value A multiplied by the exponential of the dissimilarity Δ(R(X), S) times a second attenuation coefficient −σ₂, where σ₂ is smaller than the first attenuation coefficient σ₁; a second negative-side weighted value is A multiplied by the exponential of Δ(R(X), S) times a third attenuation coefficient −σ₃, where σ₃ is larger than the second attenuation coefficient; subtracting the second negative-side weighted value from the first gives K, which is accumulated into the integrated value for correcting the positive-side integrated value D.

The negative-side integration/logarithm means 244 computes the negative-side integrated value L by Equation (20) (step S244).

$$L = \log \sum_{S \in W'} K \tag{20}$$

The negative-side integrated value L is the logarithm of the sum of the negative-side weighted values K over all recognized symbol sequences S (S ∈ W′).

The integrated value comparison means 245 computes the measure d⁺(X; Λ) of Equation (21) by subtracting the negative-side integrated value L from the positive-side integrated value D, and outputs as the objective function loss(d⁺(X; Λ)) the result of passing d⁺(X; Λ) through a loss function, for example one like Equation (6) above.

$$d^{+}(X;\Lambda) = D - L \tag{21}$$

By taking the attenuation coefficient σ₁ in the positive-side integrated value D large and setting the attenuation coefficient σ₂ = 0 in the negative-side integrated value L, with for example σ₃ ≈ σ₁, only negative example symbol sequences are used as the rival symbol sequences in training. That is, setting σ₃ to a value close to the attenuation coefficient σ₁ removes the positive example and the negative example symbol sequences extremely close to it from the negative-side symbol sequences.
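A corresponding sketch for Embodiment 2, building the negative-side quantities of Equations (19)-(21) on top of Equations (13), (14), and (16); the sign convention d⁺ = D − L follows the text as written:

```python
import math

def d_measure(scores, deltas, phi=1.0, sigma1=4.0, sigma2=0.0, sigma3=4.0):
    """Sketch of Eqs. (19)-(21): negative-side weights K, integrated value L,
    and the measure d+ = D - L (illustrative inputs, assumed coefficients)."""
    A = [math.exp(phi * g) for g in scores]                    # Eq. (13)
    B = [math.exp(-sigma1 * d) * a for d, a in zip(deltas, A)] # Eq. (14)
    D = math.log(sum(B))                                       # Eq. (16)
    K = [(math.exp(-sigma2 * d) - math.exp(-sigma3 * d)) * a   # Eq. (19)
         for d, a in zip(deltas, A)]
    L = math.log(sum(K))                                       # Eq. (20)
    return D - L                                               # Eq. (21)

# The first sequence (Delta = 0) is the positive example; with sigma3 close to
# sigma1 its contribution to the negative side vanishes, as described above.
print(d_measure(scores=[2.0, 1.5, 1.2], deltas=[0, 2, 5]))
```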

As described above, a dissimilarity utilization type discriminative learning device equipped with the positive example recognition comparison unit 50 can improve recognition accuracy to the same degree as the MCE training method with a smaller amount of required computational resources than the MCE training method.

[Evaluation experiment]
An experiment to obtain model parameters was carried out with the dissimilarity utilization type discriminative learning device 100 of Embodiment 1. About 230 hours of Japanese academic lecture speech were used as training data. First, an initial model was trained by the maximum likelihood training method, an existing technique; operating a continuous word speech recognizer with this initial model as-is gave a word error rate of 21.2%.

Against this 21.2%, the word error rate using model parameters obtained by the existing MMI training method on the same training data was 18.6%. With this word error rate as the point of comparison, Table 1 shows the word error rates of continuous word speech recognition using model parameters obtained by the dissimilarity utilization type discriminative learning device 100 with the second attenuation coefficient σ₂ = −4 and the first attenuation coefficient σ₁ set to 1, 2, 3, and 4.

[Table 1: word error rates for σ₁ = 1, 2, 3, 4 (σ₂ = −4); rendered only as an image in the original. Per the text below, the values range from 18.5% to 18.7%.]

The word error rate when a continuous word speech recognizer was operated with model parameters from the dissimilarity utilization type discriminative learning device 100 of this invention was 18.5% to 18.7%, a result equivalent to the word error rate of the conventional MMI training method. A comparison experiment between Embodiment 2 of this invention and the conventional MCE training method has not been conducted, but a word error rate at the same level is predicted, as with the MMI comparison result.

The dissimilarity utilization type discriminative learning device described above can be used, for example, in a speech recognition device. It can also be applied to recognition devices that take as pattern recognition targets feature sequences that vary along the time axis, the spatial axis, or both — such as still images and video — and that express some conceptual information. As a concrete example, it can be used for pattern recognition of image information of handwritten characters.

Although the positive constant φ and the attenuation coefficients σ₁ to σ₃ were described as being preset in the positive example recognition comparison units 14 and 50, these values may also be supplied externally. Moreover, the processing described for the above method and device need not be executed in time sequence in the order described; it may be executed in parallel or individually according to the processing capability of the executing device or as needed.

When the processing means of the above device are realized by a computer, the processing content of the functions each device should have is described by a program. Executing this program on a computer realizes the processing means of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. The program may also be distributed by storing it in a recording device of a server computer and transferring it from the server computer to other computers via a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of its processing contents may be realized in hardware.

Claims (12)

1. A dissimilarity utilization type discriminative learning device comprising:
a model parameter recording unit that records model parameters;
a pattern recognition unit that generates recognition symbol sequences by pattern recognition of a feature amount sequence of learning data;
a discriminant function value generation unit that refers to the model parameter recording unit and outputs a discriminant function value evaluating whether each recognition symbol sequence corresponds to the feature amount sequence of the learning data;
a dissimilarity calculation unit that calculates the dissimilarity between each recognition symbol sequence and the positive example;
a positive example recognition comparison unit that receives N (N ≧ 2) predetermined attenuation coefficients, the discriminant function value, and the dissimilarity, obtains, using the N attenuation coefficients, a positive example side integrated value and an integrated value for correcting the positive example side integrated value, and outputs an objective function in which the positive example side integrated value has been corrected; and
a model parameter optimization unit that optimizes the model parameters corresponding to the recognition symbol sequences using the objective function.
2. The dissimilarity utilization type discriminative learning device according to claim 1, wherein the positive example recognition comparison unit has two attenuation coefficients, and the positive example recognition comparison unit:
obtains a discriminant function smoothing value, which is the exponential function value of the discriminant function value multiplied by a predetermined positive constant;
multiplies the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value to obtain a positive example side load value, and passes the sum of the positive example side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain the positive example side integrated value;
multiplies the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value to obtain a recognition side load value, and passes the sum of the recognition side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain a recognition side integrated value, which is the integrated value for correcting the positive example side integrated value; and
outputs, as the objective function, the positive example side integrated value minus the recognition side integrated value.
3. The dissimilarity utilization type discriminative learning device according to claim 1 or 2, wherein the positive example recognition comparison unit comprises:
discriminant function smoothing/antilogarithm means that multiplies the discriminant function value by a predetermined positive constant and computes the exponential function value of the product as the discriminant function smoothing value;
positive example side load means that computes a positive example side load value by multiplying the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value;
positive example side integration/logarithm means that computes, as the positive example side integrated value, the logarithmic function value of the sum of the positive example side load values over all recognition symbol sequences;
recognition side load means that computes a recognition side load value by multiplying the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value;
recognition side integration/logarithm means that totals the recognition side load values over all recognition symbol sequences and outputs the logarithmic function value of the total as a recognition side integrated value, which is the integrated value for correcting the positive example side integrated value; and
integrated value comparison means that outputs, as the objective function, the positive example side integrated value minus the recognition side integrated value.
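
For illustration only, and not as part of the claims, the two-coefficient computation of claims 2 and 3 can be sketched in Python as follows, taking the logarithm as the monotonically increasing invertible function, as claim 3 does. The function and variable names (objective_two_coefficients, g, d, phi, sigma1, sigma2) are hypothetical, and the decaying form exp(-sigma * d) is an assumption chosen so that each load value shrinks as the dissimilarity from the positive example grows; the claims state only that the exponential function value of the product of the dissimilarity and the attenuation coefficient is taken.

    import numpy as np

    def objective_two_coefficients(g, d, phi, sigma1, sigma2):
        # g:      discriminant function values, one per recognition symbol sequence
        # d:      dissimilarities between each sequence and the positive example
        # phi:    predetermined positive smoothing constant
        # sigma1: first attenuation coefficient; sigma2 < sigma1
        g = np.asarray(g, dtype=float)
        d = np.asarray(d, dtype=float)
        # Discriminant function smoothing value: exponential of phi times the discriminant value.
        s = np.exp(phi * g)
        # Positive example side integrated value: log of the sum of the positive example side load values.
        f_pos = np.log(np.sum(np.exp(-sigma1 * d) * s))
        # Recognition side integrated value: log of the sum of the recognition side load values.
        f_rec = np.log(np.sum(np.exp(-sigma2 * d) * s))
        # Objective function: positive example side integrated value minus recognition side integrated value.
        return f_pos - f_rec

Under this assumed sign convention, the larger coefficient sigma1 makes the positive example side concentrate on recognition symbol sequences close to the positive example, while the smaller sigma2 lets the recognition side integrate over the recognized sequences more broadly, so increasing the objective favors model parameters that score low-dissimilarity sequences highly.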
4. The dissimilarity utilization type discriminative learning device according to claim 1, wherein the positive example recognition comparison unit has three attenuation coefficients, the objective function is a misclassification measure passed through a loss function, and the positive example recognition comparison unit:
obtains a discriminant function smoothing value, which is the exponential function value of the discriminant function value multiplied by a preset positive constant;
multiplies the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value to obtain a positive example side load value, and passes the sum of the positive example side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain the positive example side integrated value;
computes a first negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value, and a second negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a third attenuation coefficient larger than the second attenuation coefficient by the discriminant function smoothing value, subtracts the second negative example side load value from the first negative example side load value to obtain a negative example side load value, and passes the sum of the negative example side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain a negative example side integrated value, which is the integrated value for correcting the positive example side integrated value; and
subtracts the positive example side integrated value from the negative example side integrated value to obtain the misclassification measure.
5. The dissimilarity utilization type discriminative learning device according to claim 1 or 4, wherein the positive example recognition comparison unit comprises:
discriminant function smoothing/antilogarithm means that multiplies the discriminant function value by a predetermined positive constant and computes the exponential function value of the product as the discriminant function smoothing value;
positive example side load means that computes a positive example side load value by multiplying the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value;
positive example side integration/logarithm means that computes, as the positive example side integrated value, the logarithmic function value of the sum of the positive example side load values over all recognition symbol sequences;
negative example side load means that computes a first negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value, and a second negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a third attenuation coefficient larger than the second attenuation coefficient by the discriminant function smoothing value, and subtracts the second negative example side load value from the first negative example side load value;
negative side integration/logarithm means that totals the negative example side load values over all recognition symbol sequences and outputs the logarithmic function value of the total as a negative example side integrated value, which is the integrated value for correcting the positive example side integrated value; and
integrated value comparison means that subtracts the positive example side integrated value from the negative example side integrated value to obtain the misclassification measure, and outputs the misclassification measure passed through a loss function as the objective function.
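
Purely as a further hedged illustration, the three-coefficient variant of claims 4 and 5 can be sketched in the same style. The sigmoid used here is only one common choice of loss function in MCE-style learning and alpha is an assumed slope parameter; neither is fixed by the claims. The same exp(-sigma * d) convention is assumed, which keeps each negative example side load value non-negative when sigma3 > sigma2.

    import numpy as np

    def objective_three_coefficients(g, d, phi, sigma1, sigma2, sigma3, alpha=1.0):
        # Requires sigma2 < sigma1 and sigma3 > sigma2, as stated in claim 4.
        g = np.asarray(g, dtype=float)
        d = np.asarray(d, dtype=float)
        s = np.exp(phi * g)  # discriminant function smoothing value
        # Positive example side integrated value, as in the two-coefficient case.
        f_pos = np.log(np.sum(np.exp(-sigma1 * d) * s))
        # Negative example side load value: first negative example side load value minus
        # second negative example side load value (zero for a sequence with d == 0).
        neg_load = (np.exp(-sigma2 * d) - np.exp(-sigma3 * d)) * s
        # Negative example side integrated value, the integrated value correcting the positive side.
        f_neg = np.log(np.sum(neg_load))
        # Misclassification measure: negative example side minus positive example side.
        m = f_neg - f_pos
        # Loss function: sigmoid (an assumed choice); small when the positive side dominates.
        return 1.0 / (1.0 + np.exp(-alpha * m))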
6. The dissimilarity utilization type discriminative learning device according to any one of claims 1 to 5, wherein the dissimilarity calculation unit receives the feature amount of the learning data, the recognition symbol sequence, and the discriminant function value as inputs, estimates the dissimilarity as a confidence, a measure representing the degree to which the recognition symbol sequence can be trusted as the correct answer, and outputs the result as a dissimilarity estimate.
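
Claim 6 leaves the form of the confidence measure open. As one plausible, purely illustrative instantiation that the patent does not specify, the dissimilarity estimate could be taken as the complement of a softmax-style posterior computed from the smoothed discriminant function values:

    import numpy as np

    def estimate_dissimilarity(g, phi):
        # Hypothetical sketch for claim 6: treat a softmax posterior over the
        # smoothed discriminant function values as the confidence that each
        # recognition symbol sequence is the correct answer, and use its
        # complement as the estimated dissimilarity.
        z = phi * np.asarray(g, dtype=float)
        p = np.exp(z - np.max(z))  # numerically stable softmax numerator
        p /= p.sum()               # confidence per recognition symbol sequence
        return 1.0 - p             # high confidence gives a low estimated dissimilarity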
7. A dissimilarity utilization type discriminative learning method comprising:
a pattern recognition process in which a pattern recognition unit generates recognition symbol sequences by pattern recognition of a feature amount sequence of learning data;
a discriminant function value generation process in which a discriminant function value generation unit refers to the model parameters in a model parameter recording unit and outputs a discriminant function value evaluating whether each recognition symbol sequence corresponds to the feature amount of the learning data;
a dissimilarity calculation process in which a dissimilarity calculation unit calculates the dissimilarity between each recognition symbol sequence and the positive example;
a positive example recognition comparison process in which a positive example recognition comparison unit receives N (N ≧ 2) attenuation coefficients, the discriminant function value, and the dissimilarity, obtains, using the N attenuation coefficients, a positive example side integrated value and an integrated value for correcting the positive example side integrated value, and outputs an objective function in which the positive example side integrated value has been corrected; and
a model parameter optimization process in which a model parameter optimization unit optimizes the model parameters corresponding to the recognition symbol sequences using the objective function,
wherein the positive example recognition comparison process has two attenuation coefficients and:
obtains a discriminant function smoothing value, which is the exponential function value of the discriminant function value multiplied by a predetermined positive constant;
multiplies the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value to obtain a positive example side load value, and passes the sum of the positive example side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain the positive example side integrated value;
multiplies the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value to obtain a recognition side load value, and passes the sum of the recognition side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain a recognition side integrated value, which is the integrated value for correcting the positive example side integrated value; and
outputs, as the objective function, the positive example side integrated value minus the recognition side integrated value.
8. The dissimilarity utilization type discriminative learning method according to claim 7, wherein the positive example recognition comparison process comprises:
a discriminant function smoothing/antilogarithm step in which discriminant function smoothing/antilogarithm means multiplies the discriminant function value by a positive constant and computes the exponential function value of the product as the discriminant function smoothing value;
a positive example side load step in which positive example side load means computes a positive example side load value by multiplying the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value;
a positive example side integration/logarithm step in which positive example side integration/logarithm means computes, as the positive example side integrated value, the logarithmic function value of the sum of the positive example side load values over all recognition symbol sequences;
a recognition side load step in which recognition side load means computes a recognition side load value by multiplying the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value;
a recognition side integration/logarithm step in which recognition side integration/logarithm means totals the recognition side load values over all recognition symbol sequences and outputs the logarithmic function value of the total as a recognition side integrated value, which is the integrated value for correcting the positive example side integrated value; and
an integrated value comparison step in which integrated value comparison means outputs, as the objective function, the positive example side integrated value minus the recognition side integrated value.
9. The dissimilarity utilization type discriminative learning method according to claim 7, wherein the positive example recognition comparison process has three attenuation coefficients, the objective function is a misclassification measure passed through a loss function, and the positive example recognition comparison process:
obtains a discriminant function smoothing value, which is the exponential function value of the discriminant function value multiplied by a preset positive constant;
multiplies the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value to obtain a positive example side load value, and passes the sum of the positive example side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain the positive example side integrated value;
computes a first negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value, and a second negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a third attenuation coefficient larger than the second attenuation coefficient by the discriminant function smoothing value, subtracts the second negative example side load value from the first negative example side load value to obtain a negative example side load value, and passes the sum of the negative example side load values over all recognition symbol sequences through an arbitrary monotonically increasing invertible function to obtain a negative example side integrated value, which is the integrated value for correcting the positive example side integrated value; and
subtracts the positive example side integrated value from the negative example side integrated value to obtain the misclassification measure.
10. The dissimilarity utilization type discriminative learning method according to claim 7, wherein the positive example recognition comparison process comprises:
a discriminant function smoothing/antilogarithm step in which discriminant function smoothing/antilogarithm means multiplies the discriminant function value by a predetermined positive constant and computes the exponential function value of the product as the discriminant function smoothing value;
a positive example side load step of computing a positive example side load value by multiplying the exponential function value of the dissimilarity multiplied by a first attenuation coefficient by the discriminant function smoothing value;
a positive example side integration/logarithm step in which positive example side integration/logarithm means computes, as the positive example side integrated value, the logarithmic function value of the sum of the positive example side load values over all recognition symbol sequences;
a negative example side load step in which negative example side load means computes a first negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a second attenuation coefficient smaller than the first attenuation coefficient by the discriminant function smoothing value, and a second negative example side load value by multiplying the exponential function value of the dissimilarity multiplied by a third attenuation coefficient larger than the second attenuation coefficient by the discriminant function smoothing value, and computes, as a negative example side load value, the value obtained by subtracting the second negative example side load value from the first negative example side load value;
a negative side integration/logarithm step in which negative side integration/logarithm means totals the negative example side load values over all recognition symbol sequences and outputs the logarithmic function value of the total as a negative example side integrated value, which is the integrated value for correcting the positive example side integrated value; and
an integrated value comparison step in which integrated value comparison means subtracts the positive example side integrated value from the negative example side integrated value to obtain the misclassification measure, and outputs the misclassification measure passed through a loss function as the objective function.
11. The dissimilarity utilization type discriminative learning method according to any one of claims 7 to 10, wherein the dissimilarity calculation process receives the feature amount of the learning data, the recognition symbol sequence, and the discriminant function value as inputs, estimates the dissimilarity as a confidence, a measure representing the degree to which the recognition symbol sequence can be trusted as the correct answer, and outputs the result as a dissimilarity estimate.
12. A program for causing a computer to function as the dissimilarity utilization type discriminative learning device according to any one of claims 1 to 6.
JP2009100865A 2009-04-17 2009-04-17 Dissimilarity utilization type discriminative learning apparatus and method, and program thereof Active JP5113797B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009100865A JP5113797B2 (en) 2009-04-17 2009-04-17 Dissimilarity utilization type discriminative learning apparatus and method, and program thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2009100865A JP5113797B2 (en) 2009-04-17 2009-04-17 Dissimilarity utilization type discriminative learning apparatus and method, and program thereof

Publications (2)

Publication Number Publication Date
JP2010250161A JP2010250161A (en) 2010-11-04
JP5113797B2 true JP5113797B2 (en) 2013-01-09

Family

ID=43312546

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2009100865A Active JP5113797B2 (en) 2009-04-17 2009-04-17 Dissimilarity utilization type discriminative learning apparatus and method, and program thereof

Country Status (1)

Country Link
JP (1) JP5113797B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5264649B2 (en) * 2009-08-18 2013-08-14 日本電信電話株式会社 Information compression model parameter estimation apparatus, method and program
JP5749187B2 (en) * 2012-02-07 2015-07-15 日本電信電話株式会社 Parameter estimation device, parameter estimation method, speech recognition device, speech recognition method and program
EP3690739A1 (en) * 2019-02-01 2020-08-05 Koninklijke Philips N.V. Confidence measure for a deployed machine learning model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4533160B2 (en) * 2005-01-21 2010-09-01 日本電信電話株式会社 Discriminative learning method, apparatus, program, and recording medium on which discriminative learning program is recorded

Also Published As

Publication number Publication date
JP2010250161A (en) 2010-11-04

Similar Documents

Publication Publication Date Title
JP2775140B2 (en) Pattern recognition method, voice recognition method, and voice recognition device
JP5177561B2 (en) Recognizer weight learning device, speech recognition device, and system
WO2008004666A1 (en) Voice recognition device, voice recognition method and voice recognition program
JP5113797B2 (en) Dissimilarity utilization type discriminative learning apparatus and method, and program thereof
US20080189109A1 (en) Segmentation posterior based boundary point determination
JP2005084436A (en) Speech recognition apparatus and computer program
JP7409381B2 (en) Utterance section detection device, utterance section detection method, program
JP5079760B2 (en) Acoustic model parameter learning device, acoustic model parameter learning method, acoustic model parameter learning program
JP4533160B2 (en) Discriminative learning method, apparatus, program, and recording medium on which discriminative learning program is recorded
JP5852550B2 (en) Acoustic model generation apparatus, method and program thereof
JP5738216B2 (en) Feature amount correction parameter estimation device, speech recognition system, feature amount correction parameter estimation method, speech recognition method, and program
JP4981850B2 (en) Voice recognition apparatus and method, program, and recording medium
JP7279800B2 (en) LEARNING APPARATUS, ESTIMATION APPARATUS, THEIR METHOD, AND PROGRAM
JP2008217592A (en) Language analysis model learning device, language analysis model learning method, language analysis model learning program and recording medium
JP5308102B2 (en) Identification score / posterior probability calculation method by number of errors, error number weighted identification learning device using the method, method thereof, speech recognition device using the device, program, and recording medium
JP2014153680A (en) Acoustic model correction parameter estimation device, feature quantity correction parameter estimation device, and methods and programs therefor
JP4843646B2 (en) Voice recognition apparatus and method, program, and recording medium
JP5982265B2 (en) Speech recognition apparatus, speech recognition method, and program
JP5166195B2 (en) Acoustic analysis parameter generation method and apparatus, program, and recording medium
JP5089651B2 (en) Speech recognition device, acoustic model creation device, method thereof, program, and recording medium
JP2007017548A (en) Verification device of voice recognition result and computer program
JP4801108B2 (en) Voice recognition apparatus, method, program, and recording medium thereof
JP2012118441A (en) Method, device, and program for creating acoustic model
JP4801107B2 (en) Voice recognition apparatus, method, program, and recording medium thereof
JP7173327B2 (en) LEARNING APPARATUS, VOICE RECOGNITION APPARATUS, THEIR METHOD, AND PROGRAM

Legal Events

Date Code Title Description
RD03 Notification of appointment of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7423

Effective date: 20110715

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20110825

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20120808

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20120814

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20120903

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20121002


A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20121012

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20151019

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Ref document number: 5113797

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150


S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350