JP2003177785A - Linear transformation matrix calculation device and voice recognition device - Google Patents

Linear transformation matrix calculation device and voice recognition device

Info

Publication number
JP2003177785A
Authority
JP
Japan
Prior art keywords
matrix
weighted
feature vector
covariance matrix
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2001375295A
Other languages
Japanese (ja)
Other versions
JP3876974B2 (en)
Inventor
Tadashi Emori
正 江森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP2001375295A priority Critical patent/JP3876974B2/en
Publication of JP2003177785A publication Critical patent/JP2003177785A/en
Application granted granted Critical
Publication of JP3876974B2 publication Critical patent/JP3876974B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device with higher performance by improving the derivation of the LDA transformation matrix.
SOLUTION: When the linear transformation matrix for linear discriminant analysis is derived, the between-class covariance matrix is computed with confusion matrix coefficients multiplied in. The confusion matrix coefficients are determined from the frequencies with which pairs of phonemes, syllables, and the like are confused in prior speech recognition. The linear transformation matrix computed in this way transforms the feature vectors so that easily confused classes are separated further from one another, and higher recognition performance is therefore obtained.
COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to an LDA transformation matrix calculation device and a speech recognition device: a speaker-independent speech recognition system, a speech recognition method, and a recording medium on which a speech recognition program is recorded. It relates in particular to a method of creating the linear transformation matrix of linear discriminant analysis so as to improve discriminability, to a program in which that method is described, and to a recording medium on which the program is recorded. It also relates to a speech recognition system that uses the linear transformation matrix, to a speech recognition program, and to a recording medium on which that program is recorded.

[0002]

[Prior Art] One known technique for improving speech recognition performance is linear discriminant analysis (hereinafter LDA), as described on pages 115 to 133 of "Pattern Recognition" by Kenichiro Ishii et al., published by Ohmsha (hereinafter Reference 1). LDA is a method that finds a linear transformation which makes the variance of the features within each class small and the variance of the features between classes large, and applies that transformation to the feature vector space (hereinafter such a transformation is called an LDA transformation). In speech recognition the LDA transformation is usually applied to multidimensional features, and in that case the operator to be found for the LDA transformation is a matrix (hereinafter such a matrix is called an LDA transformation matrix). The feature vectors used in speech recognition are mel-cepstra, their derivatives, and the like, as shown on pages 134 to 142 of "Fundamentals of Speech Recognition, Part 1", translated under the supervision of Furui and published by NTT Advanced Technology in 1995 (hereinafter Reference 2). What corresponds to the classes mentioned above is, in speech recognition, usually represented by the states or output distributions of hidden Markov models (hereinafter HMM), as shown on pages 102 to 138 of "Fundamentals of Speech Recognition, Part 2", translated under the supervision of Furui and published by NTT Advanced Technology in 1995 (hereinafter Reference 3). As a result of such an LDA transformation, the degree of separation between classes becomes larger than in the space in which the original feature vectors are distributed. For example, let ω_i denote the i-th class; when a feature vector closest to class ω_i is input, its likelihood relative to the other classes becomes larger (or, with a distance measure, the distance becomes smaller) than when no LDA transformation is applied. As a result, the discriminating ability increases and an improvement in recognition performance can be expected.

[0003] An example of a conventional LDA transformation matrix calculation device that obtains the LDA transformation matrix with the conventional technique, and of a speech recognition device that uses that LDA transformation matrix, is explained below with reference to the figures and equations. FIG. 4 is a diagram for explaining the conventional LDA transformation matrix calculation device 3. The conventional LDA transformation matrix calculation device 3 consists of a between-class covariance matrix calculation unit C002, a within-class covariance matrix calculation unit C003, and a transformation matrix calculation unit C004. The within-class covariance matrix calculation unit C003 computes the within-class covariance matrix Σ_W from the training feature vectors x using the formula shown in Equation 5 below (given in Reference 1), and outputs it. In Equation 5, ω_i denotes the i-th class, c the number of classes, μ_i the mean feature vector of each class, and N_i the number of data items in class ω_i.

[0004]

[Equation 5]

[0005] Here x ∈ ω_i denotes the feature vectors x that belong to ω_i. P(ω_i) is the prior probability that class ω_i appears; as shown on page 128 of Reference 1, it is often set either to a value that depends on the number of data items or to a value that is uniform across classes. Equation 5 computes, for each class, the sum of the mean squared differences between the feature vectors belonging to the class and their class mean μ_i (the within-class variance). The more densely clustered the feature vectors contained in a class are, that is, the smaller the value of Σ_W becomes, the stronger the association with that class, which should work in favor of recognition performance. The between-class covariance matrix calculation unit C002 computes the between-class covariance matrix from the training feature vectors x using the formula shown in Equation 6 below (given in Reference 1), and outputs it.
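Before turning to the between-class term of Equation 6, the within-class computation that Equation 5 describes can be made concrete with a short numpy sketch; the variable names and the uniform-prior default below are assumptions chosen for illustration, not part of the patent.

import numpy as np

def within_class_covariance(X, labels, priors=None):
    # X: (n_samples, dim) training feature vectors; labels: class index per sample
    classes = np.unique(labels)
    dim = X.shape[1]
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}   # uniform prior P(omega_i)
    sigma_w = np.zeros((dim, dim))
    for c in classes:
        Xc = X[labels == c]              # feature vectors x belonging to omega_i
        mu = Xc.mean(axis=0)             # class mean mu_i
        diff = Xc - mu
        sigma_w += priors[c] * (diff.T @ diff) / len(Xc)   # prior-weighted mean of outer products
    return sigma_w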

[0006]

[Equation 6]

[0007] Equation 6 computes the sum of the squared distances between the per-class mean values μ_i of the feature vectors x. The transformation matrix calculation unit C004 obtains the LDA transformation matrix A such that A^T Σ_B A becomes large and A^T Σ_W A becomes small.
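A companion sketch for the between-class term of Equation 6, built from the pairwise squared distances between the class means as described above, is given below; the prior-product weighting is an assumed reading, since the body of Equation 6 is not reproduced in this text.

import numpy as np

def between_class_covariance(X, labels, priors=None):
    # Sigma_B, pairwise form: sum over class pairs of P(w_i) P(w_j) (mu_i - mu_j)(mu_i - mu_j)^T
    classes = np.unique(labels)
    dim = X.shape[1]
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}
    means = {c: X[labels == c].mean(axis=0) for c in classes}
    sigma_b = np.zeros((dim, dim))
    for ci in classes:
        for cj in classes:
            d = means[ci] - means[cj]
            sigma_b += priors[ci] * priors[cj] * np.outer(d, d)   # squared distance between class means
    return sigma_b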

[0008] One conceivable way of obtaining such an LDA transformation matrix A is, for example, to maximize an evaluation criterion of the kind shown in Equation 7 below. Next, the analysis device C001, which computes the training feature vectors input to the conventional LDA transformation matrix calculation device 3 from the training speech signal S, is explained. The analysis device C001 cuts the speech signal S into segments of fixed duration (hereinafter called frames), computes a time series of feature vectors x consisting of mel-cepstra, their derivatives, and the like, and outputs it. Here a speech signal S collected in advance for training is used (hereinafter the feature vectors obtained by analyzing the training speech signal S are called training feature vectors x).
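The framing and feature extraction performed by the analysis device C001 might be sketched as follows; MFCCs stand in for the mel-cepstrum, and the frame length, frame shift, and delta computation are assumed values chosen for illustration, not values taken from the patent.

import numpy as np
import librosa

def analyze(path, sr=16000, frame_len=400, frame_shift=160, n_ceps=12):
    # Cut the speech signal into fixed-length frames and compute a time series of
    # feature vectors (cepstral coefficients plus their frame-to-frame deltas).
    y, sr = librosa.load(path, sr=sr)
    ceps = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_ceps,
                                n_fft=frame_len, hop_length=frame_shift)
    deltas = librosa.feature.delta(ceps)     # change of the cepstra over time
    return np.vstack([ceps, deltas]).T       # one feature vector x per frame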

[0009] Next, the computation performed by the conventional LDA transformation matrix calculation device 3 is explained. The analysis device C001 converts the entire training speech signal S into training feature vectors x, and the training feature vectors x are input to the conventional LDA transformation matrix calculation device 3. Using all of the training feature vectors x, the within-class covariance matrix calculation unit C003 computes the within-class covariance matrix Σ_W according to Equation 5 above. Using all of the training feature vectors x, the between-class covariance matrix calculation unit C002 computes the between-class covariance matrix Σ_B according to Equation 6 above. The transformation matrix calculation unit C004 obtains the LDA transformation matrix A according to Equation 7 below, such that A^T Σ_B A becomes large and A^T Σ_W A becomes small.

[0010]

[Equation 7]
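The body of Equation 7 is not reproduced in this text. A standard criterion consistent with the description, making A^T Σ_B A large while keeping A^T Σ_W A small, is the trace-ratio form used in conventional LDA; the following is therefore an assumed reconstruction rather than the patent's exact formula.

J(A) = \operatorname{tr}\!\left[ \left( A^{\mathsf T} \Sigma_W A \right)^{-1} \left( A^{\mathsf T} \Sigma_B A \right) \right] \rightarrow \max_A , \qquad \Sigma_W^{-1} \Sigma_B \, a_k = \lambda_k \, a_k ,

where the columns a_k of A are the eigenvectors of Σ_W^{-1} Σ_B associated with the largest eigenvalues λ_k.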

[0011] FIG. 5 is a diagram for explaining a conventional speech recognition device that uses the LDA transformation matrix A. The conventional speech recognition device consists of the analysis device C001, an LDA transformation matrix storage unit C005, an HMM storage unit C008, a recognition unit C007, a matrix multiplier for feature vectors C006, and a matrix multiplier for HMMs C009. The LDA transformation matrix storage unit C005 stores the LDA transformation matrix A computed by the conventional LDA transformation matrix calculation device. The HMM storage unit C008 stores the HMMs. Various forms can be chosen for the HMMs, such as state sequences whose recognition units are phonemes, syllables, and the like.

[0012] The distributions may be either discrete or continuous; here continuous normal distributions are used, in which case a class can be represented by a mean vector and the corresponding variances. The matrix multiplier for feature vectors C006 multiplies the feature vector x by the LDA transformation matrix A. The matrix multiplier for HMMs C009 multiplies the mean vectors, variances, and so on of the HMMs by the LDA transformation matrix A. The recognition unit C007 computes the likelihood of the feature vectors x against the HMMs and outputs the hypothesis with the highest likelihood as the recognition result. The HMMs can be matched with, for example, the Viterbi algorithm shown on pages 125 to 128 of Reference 3.

[0013] Next, the operation of the conventional speech recognition device using the LDA transformation matrix A is explained. The analysis device C001 analyzes the input speech signal S at fixed time intervals and outputs feature vectors x. Here the speech signal is not a training speech signal but a speech signal that is to be recognized. The matrix multiplier for feature vectors C006 multiplies the feature vector x of each frame by the LDA transformation matrix A stored in the LDA transformation matrix storage unit C005 (hereinafter this feature vector is called the transformed feature vector x'). The matrix multiplier for HMMs C009 applies the LDA transformation matrix A to the mean vectors of the HMMs stored in the HMM storage unit C008 and to the corresponding variances (hereinafter these HMMs are called transformed HMMs). The recognition unit C007 matches the transformed feature vectors x' against the transformed HMMs and outputs the hypothesis with the highest likelihood as the recognition result.

[0014]

[Problems to Be Solved by the Invention] In the conventional LDA transformation matrix calculation device based on the conventional technique, enlarging the between-class variance does appear to improve the separability of the classes in the feature space. Speech recognition, however, is based on likelihoods, so improving the degree of separation with Equation 6 above does not necessarily improve recognition performance. The prior probability P(ω_i) of class ω_i can be specified in Equations 5 and 6, but these priors are merely the independent occurrence probabilities of the individual classes and do not contribute directly to the recognition result, so there is no guarantee that performance will improve. There are conventional techniques that share the aim of improving speech recognition performance through discriminative linear training, but the degree of improvement in recognition performance has remained a problem.

[0015] The object of the present invention is to provide a higher-performance speech recognition device by obtaining an LDA transformation matrix that is tied more directly to the recognition result: when the between-class variance is computed in the derivation of the LDA transformation matrix, confusion matrix coefficients rooted in recognition results are applied.

[0016]

[Means for Solving the Problems] The LDA transformation matrix calculation device according to the present invention comprises: a confusion matrix coefficient storage unit that stores easily confused phoneme pairs, syllable pairs, and the like as confusion matrix coefficients; a weighted between-class covariance matrix calculation unit that, taking the easily confused phoneme pairs and the like into account, obtains from the training feature vectors and the like a weighted between-class covariance matrix expressing how far apart the classes are; a within-class covariance matrix calculation unit that obtains from the training feature vectors and the like a within-class covariance matrix expressing how tightly each class is clustered; and a weighted transformation matrix calculation unit that obtains the matrix for the linear transformation from the weighted between-class covariance matrix and the within-class covariance matrix. Further, the confusion matrix coefficient storage unit stores, as the confusion matrix coefficients, easily confused phoneme pairs, syllable pairs, and the like created from pairs of previously obtained recognition results and the corresponding teacher data.

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the figures and equations. FIG. 1 is a diagram for explaining the first embodiment of the invention. The LDA transformation matrix calculation device 1 according to the first embodiment of the present invention consists of a confusion matrix coefficient storage unit A001, a weighted between-class covariance matrix calculation unit A002, the within-class covariance matrix calculation unit C003, and a weighted transformation matrix calculation unit A003. The within-class covariance matrix calculation unit C003 is the same as the conventional one.

[0018] The confusion matrix coefficient storage unit A001 stores confusion matrix coefficients P(ω_i, ω_j) that express the degree to which classes ω_i and ω_j are confused with each other. The confusion matrix coefficients P(ω_i, ω_j) are set to values that emphasize the relationships between classes that are easily confused.

[0019] An example of how the confusion matrix coefficients P(ω_i, ω_j) can be set is given here. Assume, for example, three syllable classes such as ω_1 = "i" (い), ω_2 = "ki" (き), and ω_3 = "shi" (し). Normally a constant value such as α is set for every combination, as in P(ω_i, ω_j) = α. If experience suggests that "ki" and "shi" are often confused, then P(ω_3 = "shi", ω_2 = "ki") = α ± α_23 is set.

[0020] Here the sign of ± is chosen appropriately according to the distance measure used for recognition and the like. The value of α_23 may be determined empirically according to the degree to which "ki" and "shi" are confused, or it may be computed from the frequency of confusions in previously obtained recognition results. Either α_23 = α_32 or α_23 ≠ α_32 is acceptable.
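A minimal sketch of such a coefficient table, with a uniform base value and a hand-tuned adjustment for one easily confused pair, could be written as follows; the numeric values and the sign of the adjustment are assumptions, since the patent leaves them to the distance measure in use.

def make_confusion_coefficients(classes, alpha=1.0, adjustments=None):
    # P(w_i, w_j): start from a constant alpha for every pair, then emphasize pairs
    # known from experience (or from earlier recognition runs) to be confused.
    P = {(ci, cj): alpha for ci in classes for cj in classes}
    for (ci, cj), delta in (adjustments or {}).items():
        P[(ci, cj)] = alpha + delta        # sign of delta follows the distance measure
    return P

# e.g. the three syllable classes above, with "ki" and "shi" assumed to be often confused
P = make_confusion_coefficients(["i", "ki", "shi"], alpha=1.0,
                                adjustments={("shi", "ki"): 0.5})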

[0021] The weighted between-class covariance matrix calculation unit A002 computes the weighted between-class covariance matrix Σ_B' from the preset confusion matrix coefficients P(ω_i, ω_j) and the training feature vectors x, according to the formula shown in Equation 1 above. The weighted transformation matrix calculation unit A003 computes the weighted LDA matrix A' from the weighted between-class covariance matrix Σ_B' and the within-class covariance matrix Σ_W, according to the formula shown in Equation 2 above.
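Equations 1 and 2 themselves are not reproduced in this text; the description and claims state only that Σ_B' is built from P(ω_i, ω_j) and the class statistics, and that A' makes A'^T Σ_B' A' large while A'^T Σ_W A' stays small. Under the natural reading that P(ω_i, ω_j) replaces the pairwise prior weighting in the between-class term, a hedged sketch is:

import numpy as np

def weighted_between_class_covariance(X, labels, P):
    # Sigma_B': pairwise between-class covariance in which each class pair
    # (w_i, w_j) is weighted by its confusion matrix coefficient P[(w_i, w_j)].
    classes = np.unique(labels)
    means = {c: X[labels == c].mean(axis=0) for c in classes}
    sigma_b = np.zeros((X.shape[1], X.shape[1]))
    for ci in classes:
        for cj in classes:
            d = means[ci] - means[cj]
            sigma_b += P[(ci, cj)] * np.outer(d, d)
    return sigma_b

def weighted_lda_matrix(sigma_b_weighted, sigma_w, k):
    # A': eigenvectors of Sigma_W^{-1} Sigma_B' with the largest eigenvalues, so that
    # A'^T Sigma_B' A' is large relative to A'^T Sigma_W A'.
    vals, vecs = np.linalg.eig(np.linalg.solve(sigma_w, sigma_b_weighted))
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:k]].real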

[0022] The formula shown in Equation 2 is only an example; any evaluation function may be used from which an A' can be obtained that maximizes A'^T Σ_B' A' and minimizes A'^T Σ_W A'.

[0023] The operation of the first embodiment of the present invention is described below. The feature vectors x obtained by analyzing the training speech signal S with the analysis device C001 or the like are input to the weighted between-class covariance matrix calculation unit A002 and the within-class covariance matrix calculation unit C003. The within-class covariance matrix calculation unit C003 computes the within-class covariance matrix Σ_W from the training feature vectors x according to the formula shown in Equation 5 above.

[0024] The weighted between-class covariance matrix calculation unit A002 computes and outputs the weighted between-class covariance matrix Σ_B' from the training feature vectors x according to the formula shown in Equation 1 above. The weighted transformation matrix calculation unit A003 obtains and outputs the weighted transformation matrix A' such that A'^T Σ_B' A' becomes large and A'^T Σ_W A' becomes small.

[0025] FIG. 2 is a diagram for explaining the second embodiment of the present invention. The second LDA transformation matrix calculation device 2, which is the second embodiment of the present invention, consists of the LDA transformation matrix calculation device 1, a substitution frequency counting unit A004, a recognition result storage unit A005, and a teacher data storage unit A006. The recognition result storage unit A005 stores recognition results R, that is, the results of recognizing speech in advance with an arbitrary speech recognition device or the like.

[0026] The recognition results R may be in any form of expression, such as phoneme units, syllable units, words, or sentences. The teacher data storage unit A006 stores teacher data V, the correct answers corresponding to the recognition results R. The teacher data V must correspond one-to-one to the recognition results R.

[0027] The recognition results R and the teacher data V may be collected from any kind of utterances. Using the recognition results R and the teacher data V, the substitution frequency counting unit A004 computes, for each pair of classes represented by HMM states, distributions, or the like, the degree to which the two classes are confused, and outputs that degree of confusion as the confusion matrix coefficients P(ω_i, ω_j). The LDA transformation matrix calculation device 1 computes and outputs the weighted LDA transformation matrix A' using the P(ω_i, ω_j) output by the substitution frequency counting unit A004 and the training feature vectors x.
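The counting done by the substitution frequency counting unit A004 might be sketched as follows, assuming the recognition results R and the teacher data V have already been aligned one-to-one at the class level; the normalization into coefficients is an assumption, since the patent requires only that frequently confused pairs receive stronger emphasis.

from collections import Counter

def confusion_coefficients_from_results(R, V, classes, alpha=1.0, scale=1.0):
    # R: recognized class labels; V: corresponding correct (teacher) labels, aligned 1:1 with R
    counts = Counter((truth, hyp) for truth, hyp in zip(V, R) if truth != hyp)
    total = sum(counts.values()) or 1
    P = {(ci, cj): alpha for ci in classes for cj in classes}
    for (truth, hyp), n in counts.items():
        P[(truth, hyp)] = alpha + scale * n / total   # emphasize frequently confused pairs
    return P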

[0028] FIG. 3 is a diagram for explaining the third embodiment of the present invention. The speech recognition device according to the third embodiment of the present invention consists of the LDA transformation matrix calculation device 1, the analysis device C001, the HMM storage unit C008, the recognition unit C007, the matrix multiplier for feature vectors C006, and the matrix multiplier for HMMs C009. The analysis device C001, the HMM storage unit C008, the recognition unit C007, the matrix multiplier for feature vectors C006, and the matrix multiplier for HMMs C009 have the same functions as in the conventional device. The LDA transformation matrix calculation device 1 is the same as in the first embodiment.

[0029] The operation of this speech recognition device is described below. The LDA transformation matrix calculation device 1 outputs the weighted LDA transformation matrix A' computed using the training feature vectors. The analysis device C001 analyzes the speech signal S frame by frame and outputs feature vectors x. The matrix multiplier for feature vectors C006 multiplies the feature vectors x by the weighted LDA transformation matrix A'. The matrix multiplier for HMMs C009 multiplies the HMMs stored in the HMM storage unit C008 by the weighted LDA transformation matrix A'. The recognition unit C007 matches the feature vectors x multiplied by the weighted LDA transformation matrix A' against the HMMs multiplied by the weighted LDA transformation matrix A', and outputs the hypothesis with the highest likelihood as the recognition result.

[0030]

[Effects of the Invention] According to the present invention, the confusion matrix coefficients used when computing the between-class distances are given, either empirically or from recognition results, as coefficients that emphasize phoneme pairs and the like that are hard to recognize and easily confused, and they are tied into the computation of the LDA transformation matrix. Easily confused phoneme pairs and the like that are inherently hard to recognize can therefore be emphasized and separated, and an improvement in recognition performance can be expected.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the first embodiment of the present invention.

FIG. 2 is a diagram showing the second embodiment of the present invention.

FIG. 3 is a diagram showing the third embodiment of the present invention.

FIG. 4 is a diagram showing a linear transformation matrix calculation device based on the conventional technique.

FIG. 5 is a diagram for explaining a speech recognition device that uses a linear transformation matrix obtained with the conventional technique.

[Explanation of Symbols]

1: LDA transformation matrix calculation device
2: Second LDA transformation matrix calculation device
3: Conventional LDA transformation matrix calculation device
C001: Analysis device
C002: Between-class covariance matrix calculation unit
C003: Within-class covariance matrix calculation unit
C004: Transformation matrix calculation unit
C005: LDA transformation matrix storage unit
C006: Matrix multiplier for feature vectors
C007: Recognition unit
C008: HMM storage unit
C009: Matrix multiplier for HMMs
A001: Confusion matrix coefficient storage unit
A002: Weighted between-class covariance matrix calculation unit
A003: Weighted transformation matrix calculation unit
A004: Substitution frequency counting unit
A005: Recognition result storage unit
A006: Teacher data storage unit
S: Speech signal
x: Feature vector
A: LDA transformation matrix
A': Weighted LDA transformation matrix
Σ_W: Within-class covariance matrix
Σ_B: Between-class covariance matrix
Σ_B': Weighted between-class covariance matrix
P(ω_i, ω_j): Confusion matrix coefficient

Claims (8)

[Claims]
1. A linear transformation matrix calculation device comprising: a confusion matrix coefficient storage unit that stores easily confused phoneme pairs, syllable pairs, and the like as confusion matrix coefficients; a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors; a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors; and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix.
2. The linear transformation matrix calculation device according to claim 1, wherein the confusion matrix coefficient storage unit stores confusion matrix coefficients that emphasize easily confused phoneme pairs, syllable pairs, and the like determined empirically and intuitively.
3. The linear transformation matrix calculation device according to claim 1, wherein the confusion matrix coefficient storage unit stores, as the confusion matrix coefficients, easily confused phoneme pairs, syllable pairs, and the like created from pairs of previously obtained recognition results and the corresponding teacher data.
4. The linear transformation matrix calculation device according to any one of claims 1 to 3, wherein the weighted between-class covariance matrix calculation unit computes the weighted between-class covariance matrix (Σ_B') from the confusion matrix coefficients P(ω_i, ω_j) and the training feature vectors (x) according to the formula shown in Equation 1:
[Equation 1]
and the weighted transformation matrix calculation unit computes the weighted linear transformation matrix (A') from the weighted between-class covariance matrix (Σ_B') and the within-class covariance matrix (Σ_W) according to the formula shown in Equation 2:
[Equation 2]
5. A speech recognition device comprising: an LDA transformation matrix calculation device comprising a confusion matrix coefficient storage unit that stores easily confused phoneme pairs, syllable pairs, and the like as confusion matrix coefficients, a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors, a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors, and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix; an analysis device that analyzes a speech signal for training and obtains feature vectors; a matrix multiplier for feature vectors that multiplies the feature vectors by the weighted linear transformation matrix and outputs transformed feature vectors; a matrix multiplier for HMMs that multiplies a hidden Markov model (hereinafter HMM) by the weighted linear transformation matrix and outputs a transformed HMM; and a recognition unit that matches the transformed feature vectors against the transformed HMM and outputs the hypothesis with the highest likelihood as the recognition result.
6. A speech recognition device comprising: an LDA transformation matrix calculation device comprising a confusion matrix coefficient storage unit that stores confusion matrix coefficients emphasizing easily confused phoneme pairs, syllable pairs, and the like determined empirically and intuitively, a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors, a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors, and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix; an analysis device that analyzes a speech signal for training and obtains feature vectors; a matrix multiplier for feature vectors that multiplies the feature vectors by the weighted linear transformation matrix and outputs transformed feature vectors; a matrix multiplier for HMMs that multiplies a hidden Markov model (hereinafter HMM) by the weighted linear transformation matrix and outputs a transformed HMM; and a recognition unit that matches the transformed feature vectors against the transformed HMM and outputs the hypothesis with the highest likelihood as the recognition result.
7. A speech recognition device comprising: an LDA transformation matrix calculation device comprising a confusion matrix coefficient storage unit that stores, as confusion matrix coefficients, easily confused phoneme pairs, syllable pairs, and the like created from pairs of previously obtained recognition results and the corresponding teacher data, a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors, a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors, and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix; an analysis device that analyzes a speech signal for training and obtains feature vectors; a matrix multiplier for feature vectors that multiplies the feature vectors by the weighted linear transformation matrix and outputs transformed feature vectors; a matrix multiplier for HMMs that multiplies a hidden Markov model (hereinafter HMM) by the weighted linear transformation matrix and outputs a transformed HMM; and a recognition unit that matches the transformed feature vectors against the transformed HMM and outputs the hypothesis with the highest likelihood as the recognition result.
8. The speech recognition device according to any one of claims 5 to 7, wherein the weighted between-class covariance matrix calculation unit computes the weighted between-class covariance matrix (Σ_B') from the confusion matrix coefficients P(ω_i, ω_j) and the training feature vectors (x) according to the formula shown in Equation 3:
[Equation 3]
and the weighted transformation matrix calculation unit computes the weighted linear transformation matrix (A') from the weighted between-class covariance matrix (Σ_B') and the within-class covariance matrix (Σ_W) according to the formula shown in Equation 4:
[Equation 4]
JP2001375295A 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device Expired - Lifetime JP3876974B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2001375295A JP3876974B2 (en) 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2001375295A JP3876974B2 (en) 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device

Publications (2)

Publication Number Publication Date
JP2003177785A (en) 2003-06-27
JP3876974B2 JP3876974B2 (en) 2007-02-07

Family

ID=19183698

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2001375295A Expired - Lifetime JP3876974B2 (en) 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device

Country Status (1)

Country Link
JP (1) JP3876974B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013205807A (en) * 2012-03-29 2013-10-07 Toshiba Corp Model learning device, model manufacturing method and program
CN103440498A (en) * 2013-08-20 2013-12-11 华南理工大学 Surface electromyogram signal identification method based on LDA algorithm
WO2022009408A1 (en) * 2020-07-10 2022-01-13 日本電気株式会社 Information processing device, information processing method, and recording medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546555B (en) * 2009-04-14 2011-05-11 清华大学 Constraint heteroscedasticity linear discriminant analysis method for language identification


Also Published As

Publication number Publication date
JP3876974B2 (en) 2007-02-07

Similar Documents

Publication Publication Date Title
CN101136199B (en) Voice data processing method and equipment
KR100826875B1 (en) On-line speaker recognition method and apparatus for thereof
Mitra et al. Articulatory features from deep neural networks and their role in speech recognition
EP2888669B1 (en) Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems
JPH05216490A (en) Apparatus and method for speech coding and apparatus and method for speech recognition
US20110123965A1 (en) Speech Processing and Learning
US11335324B2 (en) Synthesized data augmentation using voice conversion and speech recognition models
CN110970036B (en) Voiceprint recognition method and device, computer storage medium and electronic equipment
Das et al. Bangladeshi dialect recognition using Mel frequency cepstral coefficient, delta, delta-delta and Gaussian mixture model
JP2003308090A (en) Device, method and program for recognizing speech
US20030093269A1 (en) Method and apparatus for denoising and deverberation using variational inference and strong speech models
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
Ahsiah et al. Tajweed checking system to support recitation
Stuttle A Gaussian mixture model spectral representation for speech recognition
US8965832B2 (en) Feature estimation in sound sources
Patil et al. Marathi connected word speech recognition system
US10706867B1 (en) Global frequency-warping transformation estimation for voice timbre approximation
JP4716125B2 (en) Pronunciation rating device and program
Yavuz et al. A Phoneme-Based Approach for Eliminating Out-of-vocabulary Problem Turkish Speech Recognition Using Hidden Markov Model.
JP2003177785A (en) Linear transformation matrix calculation device and voice recognition device
Kinnunen Optimizing spectral feature based text-independent speaker recognition
Imperl et al. Clustering of triphones using phoneme similarity estimation for the definition of a multilingual set of triphones
JP2000194392A (en) Noise adaptive type voice recognition device and recording medium recording noise adaptive type voice recognition program
JPH1097285A (en) Speech recognition system
US20100204985A1 (en) Frequency axis warping factor estimation apparatus, system, method and program

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20041014

RD03 Notification of appointment of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7423

Effective date: 20050422

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20061003

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20061011

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20061024

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

Ref document number: 3876974

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091110

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101110

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111110

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121110

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131110

Year of fee payment: 7

EXPY Cancellation because of completion of term