JP2003177785A - Linear transformation matrix calculation device and voice recognition device - Google Patents

Linear transformation matrix calculation device and voice recognition device

Info

Publication number
JP2003177785A
Authority
JP
Japan
Prior art keywords
matrix
weighted
feature vector
covariance matrix
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2001375295A
Other languages
Japanese (ja)
Other versions
JP3876974B2 (en)
Inventor
Tadashi Emori
正 江森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP2001375295A priority Critical patent/JP3876974B2/en
Publication of JP2003177785A publication Critical patent/JP2003177785A/en
Application granted granted Critical
Publication of JP3876974B2 publication Critical patent/JP3876974B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device with higher performance by improving the derivation of the LDA transformation matrix.
SOLUTION: When the linear transformation matrix for linear discriminant analysis is derived, the between-class covariance matrix is computed with confusion matrix coefficients multiplied in. The confusion matrix coefficients are determined from the frequencies with which pairs of phonemes, syllables, and the like are confused in prior speech recognition. The linear transformation matrix computed in this way transforms the feature vectors so that easily confused classes are separated further from one another, and higher recognition performance is therefore obtained.
COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to an LDA transformation matrix calculation device and a speech recognition device: a speaker-independent speech recognition system, a speech recognition method, and a recording medium on which a speech recognition program is recorded. It relates in particular to a method of creating the linear transformation matrix of linear discriminant analysis so as to improve discriminability, to a program in which that method is described, and to a recording medium on which the program is recorded. It also relates to a speech recognition system that uses the linear transformation matrix, to a speech recognition program, and to a recording medium on which that program is recorded.

[0002]

[Prior Art] One known technique for improving speech recognition performance is linear discriminant analysis (hereinafter LDA), as described on pages 115 to 133 of "Pattern Recognition" by Kenichiro Ishii et al., published by Ohmsha (hereinafter Reference 1). LDA is a method that finds a linear transformation which makes the variance of the features within each class small and the variance of the features between classes large, and applies that transformation to the feature vector space (hereinafter such a transformation is called an LDA transformation). In speech recognition the LDA transformation is usually applied to multidimensional features, and in that case the operator to be found for the LDA transformation is a matrix (hereinafter such a matrix is called an LDA transformation matrix). The feature vectors used in speech recognition are mel-cepstra, their derivatives, and the like, as shown on pages 134 to 142 of "Fundamentals of Speech Recognition, Part 1", translated under the supervision of Furui and published by NTT Advanced Technology in 1995 (hereinafter Reference 2). What corresponds to the classes mentioned above is, in speech recognition, usually represented by the states or output distributions of hidden Markov models (hereinafter HMM), as shown on pages 102 to 138 of "Fundamentals of Speech Recognition, Part 2", translated under the supervision of Furui and published by NTT Advanced Technology in 1995 (hereinafter Reference 3). As a result of such an LDA transformation, the degree of separation between classes becomes larger than in the space in which the original feature vectors are distributed. For example, let ω_i denote the i-th class; when a feature vector closest to class ω_i is input, its likelihood relative to the other classes becomes larger (or, with a distance measure, the distance becomes smaller) than when no LDA transformation is applied. As a result, the discriminating ability increases and an improvement in recognition performance can be expected.

[0003] An example of a conventional LDA transformation matrix calculation device that obtains the LDA transformation matrix with the conventional technique, and of a speech recognition device that uses that LDA transformation matrix, is explained below with reference to the figures and equations. FIG. 4 is a diagram for explaining the conventional LDA transformation matrix calculation device 3. The conventional LDA transformation matrix calculation device 3 consists of a between-class covariance matrix calculation unit C002, a within-class covariance matrix calculation unit C003, and a transformation matrix calculation unit C004. The within-class covariance matrix calculation unit C003 computes the within-class covariance matrix Σ_W from the training feature vectors x using the formula shown in Equation 5 below (given in Reference 1), and outputs it. In Equation 5, ω_i denotes the i-th class, c the number of classes, μ_i the mean feature vector of each class, and N_i the number of data items in class ω_i.

[0004]

[Equation 5]

[0005] Here x ∈ ω_i denotes the feature vectors x that belong to ω_i. P(ω_i) is the prior probability that class ω_i appears; as shown on page 128 of Reference 1, it is often set either to a value that depends on the number of data items or to a value that is uniform across classes. Equation 5 computes, for each class, the sum of the mean squared differences between the feature vectors belonging to the class and their class mean μ_i (the within-class variance). The more densely clustered the feature vectors contained in a class are, that is, the smaller the value of Σ_W becomes, the stronger the association with that class, which should work in favor of recognition performance. The between-class covariance matrix calculation unit C002 computes the between-class covariance matrix from the training feature vectors x using the formula shown in Equation 6 below (given in Reference 1), and outputs it.
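Before turning to the between-class term of Equation 6, the within-class computation that Equation 5 describes can be made concrete with a short numpy sketch; the variable names and the uniform-prior default below are assumptions chosen for illustration, not part of the patent.

import numpy as np

def within_class_covariance(X, labels, priors=None):
    # X: (n_samples, dim) training feature vectors; labels: class index per sample
    classes = np.unique(labels)
    dim = X.shape[1]
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}   # uniform prior P(omega_i)
    sigma_w = np.zeros((dim, dim))
    for c in classes:
        Xc = X[labels == c]              # feature vectors x belonging to omega_i
        mu = Xc.mean(axis=0)             # class mean mu_i
        diff = Xc - mu
        sigma_w += priors[c] * (diff.T @ diff) / len(Xc)   # prior-weighted mean of outer products
    return sigma_w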

[0006]

[Equation 6]

[0007] Equation 6 computes the sum of the squared distances between the per-class mean values μ_i of the feature vectors x. The transformation matrix calculation unit C004 obtains the LDA transformation matrix A such that A^T Σ_B A becomes large and A^T Σ_W A becomes small.
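A companion sketch for the between-class term of Equation 6, built from the pairwise squared distances between the class means as described above, is given below; the prior-product weighting is an assumed reading, since the body of Equation 6 is not reproduced in this text.

import numpy as np

def between_class_covariance(X, labels, priors=None):
    # Sigma_B, pairwise form: sum over class pairs of P(w_i) P(w_j) (mu_i - mu_j)(mu_i - mu_j)^T
    classes = np.unique(labels)
    dim = X.shape[1]
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}
    means = {c: X[labels == c].mean(axis=0) for c in classes}
    sigma_b = np.zeros((dim, dim))
    for ci in classes:
        for cj in classes:
            d = means[ci] - means[cj]
            sigma_b += priors[ci] * priors[cj] * np.outer(d, d)   # squared distance between class means
    return sigma_b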

[0008] One conceivable way of obtaining such an LDA transformation matrix A is, for example, to maximize an evaluation criterion of the kind shown in Equation 7 below. Next, the analysis device C001, which computes the training feature vectors input to the conventional LDA transformation matrix calculation device 3 from the training speech signal S, is explained. The analysis device C001 cuts the speech signal S into segments of fixed duration (hereinafter called frames), computes a time series of feature vectors x consisting of mel-cepstra, their derivatives, and the like, and outputs it. Here a speech signal S collected in advance for training is used (hereinafter the feature vectors obtained by analyzing the training speech signal S are called training feature vectors x).
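The framing and feature extraction performed by the analysis device C001 might be sketched as follows; MFCCs stand in for the mel-cepstrum, and the frame length, frame shift, and delta computation are assumed values chosen for illustration, not values taken from the patent.

import numpy as np
import librosa

def analyze(path, sr=16000, frame_len=400, frame_shift=160, n_ceps=12):
    # Cut the speech signal into fixed-length frames and compute a time series of
    # feature vectors (cepstral coefficients plus their frame-to-frame deltas).
    y, sr = librosa.load(path, sr=sr)
    ceps = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_ceps,
                                n_fft=frame_len, hop_length=frame_shift)
    deltas = librosa.feature.delta(ceps)     # change of the cepstra over time
    return np.vstack([ceps, deltas]).T       # one feature vector x per frame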

[0009] Next, the computation performed by the conventional LDA transformation matrix calculation device 3 is explained. The analysis device C001 converts the entire training speech signal S into training feature vectors x, and the training feature vectors x are input to the conventional LDA transformation matrix calculation device 3. Using all of the training feature vectors x, the within-class covariance matrix calculation unit C003 computes the within-class covariance matrix Σ_W according to Equation 5 above. Using all of the training feature vectors x, the between-class covariance matrix calculation unit C002 computes the between-class covariance matrix Σ_B according to Equation 6 above. The transformation matrix calculation unit C004 obtains the LDA transformation matrix A according to Equation 7 below, such that A^T Σ_B A becomes large and A^T Σ_W A becomes small.

[0010]

[Equation 7]
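The body of Equation 7 is not reproduced in this text. A standard criterion consistent with the description, making A^T Σ_B A large while keeping A^T Σ_W A small, is the trace-ratio form used in conventional LDA; the following is therefore an assumed reconstruction rather than the patent's exact formula.

J(A) = \operatorname{tr}\!\left[ \left( A^{\mathsf T} \Sigma_W A \right)^{-1} \left( A^{\mathsf T} \Sigma_B A \right) \right] \rightarrow \max_A , \qquad \Sigma_W^{-1} \Sigma_B \, a_k = \lambda_k \, a_k ,

where the columns a_k of A are the eigenvectors of Σ_W^{-1} Σ_B associated with the largest eigenvalues λ_k.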

[0011] FIG. 5 is a diagram for explaining a conventional speech recognition device that uses the LDA transformation matrix A. The conventional speech recognition device consists of the analysis device C001, an LDA transformation matrix storage unit C005, an HMM storage unit C008, a recognition unit C007, a matrix multiplier for feature vectors C006, and a matrix multiplier for HMMs C009. The LDA transformation matrix storage unit C005 stores the LDA transformation matrix A computed by the conventional LDA transformation matrix calculation device. The HMM storage unit C008 stores the HMMs. Various forms can be chosen for the HMMs, such as state sequences whose recognition units are phonemes, syllables, and the like.

[0012] The distributions may be either discrete or continuous; here continuous normal distributions are used, in which case a class can be represented by a mean vector and the corresponding variances. The matrix multiplier for feature vectors C006 multiplies the feature vector x by the LDA transformation matrix A. The matrix multiplier for HMMs C009 multiplies the mean vectors, variances, and so on of the HMMs by the LDA transformation matrix A. The recognition unit C007 computes the likelihood of the feature vectors x against the HMMs and outputs the hypothesis with the highest likelihood as the recognition result. The HMMs can be matched with, for example, the Viterbi algorithm shown on pages 125 to 128 of Reference 3.

[0013] Next, the operation of the conventional speech recognition device using the LDA transformation matrix A is explained. The analysis device C001 analyzes the input speech signal S at fixed time intervals and outputs feature vectors x. Here the speech signal is not a training speech signal but a speech signal that is to be recognized. The matrix multiplier for feature vectors C006 multiplies the feature vector x of each frame by the LDA transformation matrix A stored in the LDA transformation matrix storage unit C005 (hereinafter this feature vector is called the transformed feature vector x'). The matrix multiplier for HMMs C009 applies the LDA transformation matrix A to the mean vectors of the HMMs stored in the HMM storage unit C008 and to the corresponding variances (hereinafter these HMMs are called transformed HMMs). The recognition unit C007 matches the transformed feature vectors x' against the transformed HMMs and outputs the hypothesis with the highest likelihood as the recognition result.

[0014]

[Problems to Be Solved by the Invention] In the conventional LDA transformation matrix calculation device based on the conventional technique, enlarging the between-class variance does appear to improve the separability of the classes in the feature space. Speech recognition, however, is based on likelihoods, so improving the degree of separation with Equation 6 above does not necessarily improve recognition performance. The prior probability P(ω_i) of class ω_i can be specified in Equations 5 and 6, but these priors are merely the independent occurrence probabilities of the individual classes and do not contribute directly to the recognition result, so there is no guarantee that performance will improve. There are conventional techniques that share the aim of improving speech recognition performance through discriminative linear training, but the degree of improvement in recognition performance has remained a problem.

[0015] The object of the present invention is to provide a higher-performance speech recognition device by obtaining an LDA transformation matrix that is tied more directly to the recognition result: when the between-class variance is computed in the derivation of the LDA transformation matrix, confusion matrix coefficients rooted in recognition results are applied.

[0016]

[Means for Solving the Problems] The LDA transformation matrix calculation device according to the present invention comprises: a confusion matrix coefficient storage unit that stores easily confused phoneme pairs, syllable pairs, and the like as confusion matrix coefficients; a weighted between-class covariance matrix calculation unit that, taking the easily confused phoneme pairs and the like into account, obtains from the training feature vectors and the like a weighted between-class covariance matrix expressing how far apart the classes are; a within-class covariance matrix calculation unit that obtains from the training feature vectors and the like a within-class covariance matrix expressing how tightly each class is clustered; and a weighted transformation matrix calculation unit that obtains the matrix for the linear transformation from the weighted between-class covariance matrix and the within-class covariance matrix. Further, the confusion matrix coefficient storage unit stores, as the confusion matrix coefficients, easily confused phoneme pairs, syllable pairs, and the like created from pairs of previously obtained recognition results and the corresponding teacher data.

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the figures and equations. FIG. 1 is a diagram for explaining the first embodiment of the invention. The LDA transformation matrix calculation device 1 according to the first embodiment of the present invention consists of a confusion matrix coefficient storage unit A001, a weighted between-class covariance matrix calculation unit A002, the within-class covariance matrix calculation unit C003, and a weighted transformation matrix calculation unit A003. The within-class covariance matrix calculation unit C003 is the same as the conventional one.

[0018] The confusion matrix coefficient storage unit A001 stores confusion matrix coefficients P(ω_i, ω_j) that express the degree to which classes ω_i and ω_j are confused with each other. The confusion matrix coefficients P(ω_i, ω_j) are set to values that emphasize the relationships between classes that are easily confused.

[0019] An example of how the confusion matrix coefficients P(ω_i, ω_j) can be set is given here. Assume, for example, three syllable classes such as ω_1 = "i" (い), ω_2 = "ki" (き), and ω_3 = "shi" (し). Normally a constant value such as α is set for every combination, as in P(ω_i, ω_j) = α. If experience suggests that "ki" and "shi" are often confused, then P(ω_3 = "shi", ω_2 = "ki") = α ± α_23 is set.

[0020] Here the sign of ± is chosen appropriately according to the distance measure used for recognition and the like. The value of α_23 may be determined empirically according to the degree to which "ki" and "shi" are confused, or it may be computed from the frequency of confusions in previously obtained recognition results. Either α_23 = α_32 or α_23 ≠ α_32 is acceptable.
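A minimal sketch of such a coefficient table, with a uniform base value and a hand-tuned adjustment for one easily confused pair, could be written as follows; the numeric values and the sign of the adjustment are assumptions, since the patent leaves them to the distance measure in use.

def make_confusion_coefficients(classes, alpha=1.0, adjustments=None):
    # P(w_i, w_j): start from a constant alpha for every pair, then emphasize pairs
    # known from experience (or from earlier recognition runs) to be confused.
    P = {(ci, cj): alpha for ci in classes for cj in classes}
    for (ci, cj), delta in (adjustments or {}).items():
        P[(ci, cj)] = alpha + delta        # sign of delta follows the distance measure
    return P

# e.g. the three syllable classes above, with "ki" and "shi" assumed to be often confused
P = make_confusion_coefficients(["i", "ki", "shi"], alpha=1.0,
                                adjustments={("shi", "ki"): 0.5})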

[0021] The weighted between-class covariance matrix calculation unit A002 computes the weighted between-class covariance matrix Σ_B' from the preset confusion matrix coefficients P(ω_i, ω_j) and the training feature vectors x, according to the formula shown in Equation 1 above. The weighted transformation matrix calculation unit A003 computes the weighted LDA matrix A' from the weighted between-class covariance matrix Σ_B' and the within-class covariance matrix Σ_W, according to the formula shown in Equation 2 above.
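Equations 1 and 2 themselves are not reproduced in this text; the description and claims state only that Σ_B' is built from P(ω_i, ω_j) and the class statistics, and that A' makes A'^T Σ_B' A' large while A'^T Σ_W A' stays small. Under the natural reading that P(ω_i, ω_j) replaces the pairwise prior weighting in the between-class term, a hedged sketch is:

import numpy as np

def weighted_between_class_covariance(X, labels, P):
    # Sigma_B': pairwise between-class covariance in which each class pair
    # (w_i, w_j) is weighted by its confusion matrix coefficient P[(w_i, w_j)].
    classes = np.unique(labels)
    means = {c: X[labels == c].mean(axis=0) for c in classes}
    sigma_b = np.zeros((X.shape[1], X.shape[1]))
    for ci in classes:
        for cj in classes:
            d = means[ci] - means[cj]
            sigma_b += P[(ci, cj)] * np.outer(d, d)
    return sigma_b

def weighted_lda_matrix(sigma_b_weighted, sigma_w, k):
    # A': eigenvectors of Sigma_W^{-1} Sigma_B' with the largest eigenvalues, so that
    # A'^T Sigma_B' A' is large relative to A'^T Sigma_W A'.
    vals, vecs = np.linalg.eig(np.linalg.solve(sigma_w, sigma_b_weighted))
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:k]].real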

[0022] The formula shown in Equation 2 is only an example; any evaluation function may be used from which an A' can be obtained that maximizes A'^T Σ_B' A' and minimizes A'^T Σ_W A'.

[0023] The operation of the first embodiment of the present invention is described below. The feature vectors x obtained by analyzing the training speech signal S with the analysis device C001 or the like are input to the weighted between-class covariance matrix calculation unit A002 and the within-class covariance matrix calculation unit C003. The within-class covariance matrix calculation unit C003 computes the within-class covariance matrix Σ_W from the training feature vectors x according to the formula shown in Equation 5 above.

[0024] The weighted between-class covariance matrix calculation unit A002 computes and outputs the weighted between-class covariance matrix Σ_B' from the training feature vectors x according to the formula shown in Equation 1 above. The weighted transformation matrix calculation unit A003 obtains and outputs the weighted transformation matrix A' such that A'^T Σ_B' A' becomes large and A'^T Σ_W A' becomes small.

[0025] FIG. 2 is a diagram for explaining the second embodiment of the present invention. The second LDA transformation matrix calculation device 2, which is the second embodiment of the present invention, consists of the LDA transformation matrix calculation device 1, a substitution frequency counting unit A004, a recognition result storage unit A005, and a teacher data storage unit A006. The recognition result storage unit A005 stores recognition results R, that is, the results of recognizing speech in advance with an arbitrary speech recognition device or the like.

[0026] The recognition results R may be in any form of expression, such as phoneme units, syllable units, words, or sentences. The teacher data storage unit A006 stores teacher data V, the correct answers corresponding to the recognition results R. The teacher data V must correspond one-to-one to the recognition results R.

[0027] The recognition results R and the teacher data V may be collected from any kind of utterances. Using the recognition results R and the teacher data V, the substitution frequency counting unit A004 computes, for each pair of classes represented by HMM states, distributions, or the like, the degree to which the two classes are confused, and outputs that degree of confusion as the confusion matrix coefficients P(ω_i, ω_j). The LDA transformation matrix calculation device 1 computes and outputs the weighted LDA transformation matrix A' using the P(ω_i, ω_j) output by the substitution frequency counting unit A004 and the training feature vectors x.
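The counting done by the substitution frequency counting unit A004 might be sketched as follows, assuming the recognition results R and the teacher data V have already been aligned one-to-one at the class level; the normalization into coefficients is an assumption, since the patent requires only that frequently confused pairs receive stronger emphasis.

from collections import Counter

def confusion_coefficients_from_results(R, V, classes, alpha=1.0, scale=1.0):
    # R: recognized class labels; V: corresponding correct (teacher) labels, aligned 1:1 with R
    counts = Counter((truth, hyp) for truth, hyp in zip(V, R) if truth != hyp)
    total = sum(counts.values()) or 1
    P = {(ci, cj): alpha for ci in classes for cj in classes}
    for (truth, hyp), n in counts.items():
        P[(truth, hyp)] = alpha + scale * n / total   # emphasize frequently confused pairs
    return P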

[0028] FIG. 3 is a diagram for explaining the third embodiment of the present invention. The speech recognition device according to the third embodiment of the present invention consists of the LDA transformation matrix calculation device 1, the analysis device C001, the HMM storage unit C008, the recognition unit C007, the matrix multiplier for feature vectors C006, and the matrix multiplier for HMMs C009. The analysis device C001, the HMM storage unit C008, the recognition unit C007, the matrix multiplier for feature vectors C006, and the matrix multiplier for HMMs C009 have the same functions as in the conventional device. The LDA transformation matrix calculation device 1 is the same as in the first embodiment.

[0029] The operation of this speech recognition device is described below. The LDA transformation matrix calculation device 1 outputs the weighted LDA transformation matrix A' computed using the training feature vectors. The analysis device C001 analyzes the speech signal S frame by frame and outputs feature vectors x. The matrix multiplier for feature vectors C006 multiplies the feature vectors x by the weighted LDA transformation matrix A'. The matrix multiplier for HMMs C009 multiplies the HMMs stored in the HMM storage unit C008 by the weighted LDA transformation matrix A'. The recognition unit C007 matches the feature vectors x multiplied by the weighted LDA transformation matrix A' against the HMMs multiplied by the weighted LDA transformation matrix A', and outputs the hypothesis with the highest likelihood as the recognition result.

[0030]

[Effects of the Invention] According to the present invention, the confusion matrix coefficients used when computing the between-class distances are given, either empirically or from recognition results, as coefficients that emphasize phoneme pairs and the like that are hard to recognize and easily confused, and they are tied into the computation of the LDA transformation matrix. Easily confused phoneme pairs and the like that are inherently hard to recognize can therefore be emphasized and separated, and an improvement in recognition performance can be expected.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the first embodiment of the present invention.

FIG. 2 is a diagram showing the second embodiment of the present invention.

FIG. 3 is a diagram showing the third embodiment of the present invention.

FIG. 4 is a diagram showing a linear transformation matrix calculation device based on the conventional technique.

FIG. 5 is a diagram for explaining a speech recognition device that uses a linear transformation matrix obtained with the conventional technique.

[Explanation of Symbols]

1: LDA transformation matrix calculation device
2: Second LDA transformation matrix calculation device
3: Conventional LDA transformation matrix calculation device
C001: Analysis device
C002: Between-class covariance matrix calculation unit
C003: Within-class covariance matrix calculation unit
C004: Transformation matrix calculation unit
C005: LDA transformation matrix storage unit
C006: Matrix multiplier for feature vectors
C007: Recognition unit
C008: HMM storage unit
C009: Matrix multiplier for HMMs
A001: Confusion matrix coefficient storage unit
A002: Weighted between-class covariance matrix calculation unit
A003: Weighted transformation matrix calculation unit
A004: Substitution frequency counting unit
A005: Recognition result storage unit
A006: Teacher data storage unit
S: Speech signal
x: Feature vector
A: LDA transformation matrix
A': Weighted LDA transformation matrix
Σ_W: Within-class covariance matrix
Σ_B: Between-class covariance matrix
Σ_B': Weighted between-class covariance matrix
P(ω_i, ω_j): Confusion matrix coefficient

Claims (8)

[Claims]
1. A linear transformation matrix calculation device comprising: a confusion matrix coefficient storage unit that stores easily confused phoneme pairs, syllable pairs, and the like as confusion matrix coefficients; a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors; a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors; and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix.
2. The linear transformation matrix calculation device according to claim 1, wherein the confusion matrix coefficient storage unit stores confusion matrix coefficients that emphasize easily confused phoneme pairs, syllable pairs, and the like determined empirically and intuitively.
3. The linear transformation matrix calculation device according to claim 1, wherein the confusion matrix coefficient storage unit stores, as the confusion matrix coefficients, easily confused phoneme pairs, syllable pairs, and the like created from pairs of previously obtained recognition results and the corresponding teacher data.
4. The linear transformation matrix calculation device according to any one of claims 1 to 3, wherein the weighted between-class covariance matrix calculation unit computes the weighted between-class covariance matrix (Σ_B') from the confusion matrix coefficients P(ω_i, ω_j) and the training feature vectors (x) according to the formula shown in Equation 1:
[Equation 1]
and the weighted transformation matrix calculation unit computes the weighted linear transformation matrix (A') from the weighted between-class covariance matrix (Σ_B') and the within-class covariance matrix (Σ_W) according to the formula shown in Equation 2:
[Equation 2]
5. A speech recognition device comprising: an LDA transformation matrix calculation device comprising a confusion matrix coefficient storage unit that stores easily confused phoneme pairs, syllable pairs, and the like as confusion matrix coefficients, a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors, a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors, and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix; an analysis device that analyzes a speech signal for training and obtains feature vectors; a matrix multiplier for feature vectors that multiplies the feature vectors by the weighted linear transformation matrix and outputs transformed feature vectors; a matrix multiplier for HMMs that multiplies a hidden Markov model (hereinafter HMM) by the weighted linear transformation matrix and outputs a transformed HMM; and a recognition unit that matches the transformed feature vectors against the transformed HMM and outputs the hypothesis with the highest likelihood as the recognition result.
6. A speech recognition device comprising: an LDA transformation matrix calculation device comprising a confusion matrix coefficient storage unit that stores confusion matrix coefficients emphasizing easily confused phoneme pairs, syllable pairs, and the like determined empirically and intuitively, a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors, a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors, and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix; an analysis device that analyzes a speech signal for training and obtains feature vectors; a matrix multiplier for feature vectors that multiplies the feature vectors by the weighted linear transformation matrix and outputs transformed feature vectors; a matrix multiplier for HMMs that multiplies a hidden Markov model (hereinafter HMM) by the weighted linear transformation matrix and outputs a transformed HMM; and a recognition unit that matches the transformed feature vectors against the transformed HMM and outputs the hypothesis with the highest likelihood as the recognition result.
7. A speech recognition device comprising: an LDA transformation matrix calculation device comprising a confusion matrix coefficient storage unit that stores, as confusion matrix coefficients, easily confused phoneme pairs, syllable pairs, and the like created from pairs of previously obtained recognition results and the corresponding teacher data, a weighted between-class covariance matrix calculation unit that obtains a weighted between-class covariance matrix using the confusion matrix coefficients and training feature vectors, a within-class covariance matrix calculation unit that obtains a within-class covariance matrix from the training feature vectors, and a weighted transformation matrix calculation unit that obtains a weighted linear transformation matrix for linear transformation using the weighted between-class covariance matrix and the within-class covariance matrix; an analysis device that analyzes a speech signal for training and obtains feature vectors; a matrix multiplier for feature vectors that multiplies the feature vectors by the weighted linear transformation matrix and outputs transformed feature vectors; a matrix multiplier for HMMs that multiplies a hidden Markov model (hereinafter HMM) by the weighted linear transformation matrix and outputs a transformed HMM; and a recognition unit that matches the transformed feature vectors against the transformed HMM and outputs the hypothesis with the highest likelihood as the recognition result.
8. The speech recognition device according to any one of claims 5 to 7, wherein the weighted between-class covariance matrix calculation unit computes the weighted between-class covariance matrix (Σ_B') from the confusion matrix coefficients P(ω_i, ω_j) and the training feature vectors (x) according to the formula shown in Equation 3:
[Equation 3]
and the weighted transformation matrix calculation unit computes the weighted linear transformation matrix (A') from the weighted between-class covariance matrix (Σ_B') and the within-class covariance matrix (Σ_W) according to the formula shown in Equation 4:
[Equation 4]
JP2001375295A 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device Expired - Lifetime JP3876974B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2001375295A JP3876974B2 (en) 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2001375295A JP3876974B2 (en) 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device

Publications (2)

Publication Number Publication Date
JP2003177785A (en) 2003-06-27
JP3876974B2 JP3876974B2 (en) 2007-02-07

Family

ID=19183698

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2001375295A Expired - Lifetime JP3876974B2 (en) 2001-12-10 2001-12-10 Linear transformation matrix calculation device and speech recognition device

Country Status (1)

Country Link
JP (1) JP3876974B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013205807A (en) * 2012-03-29 2013-10-07 Toshiba Corp Model learning device, model manufacturing method and program
CN103440498A (en) * 2013-08-20 2013-12-11 华南理工大学 Surface electromyogram signal identification method based on LDA algorithm
WO2022009408A1 (en) * 2020-07-10 2022-01-13 日本電気株式会社 Information processing device, information processing method, and recording medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546555B (en) * 2009-04-14 2011-05-11 清华大学 Constraint heteroscedasticity linear discriminant analysis method for language identification


Also Published As

Publication number Publication date
JP3876974B2 (en) 2007-02-07

Similar Documents

Publication Publication Date Title
CN101136199B (en) Voice data processing method and equipment
KR100826875B1 (en) On-line speaker recognition method and apparatus for thereof
Mitra et al. Articulatory features from deep neural networks and their role in speech recognition
EP2888669B1 (en) Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems
JPH05216490A (en) Apparatus and method for speech coding and apparatus and method for speech recognition
US20110123965A1 (en) Speech Processing and Learning
US11335324B2 (en) Synthesized data augmentation using voice conversion and speech recognition models
CN110970036B (en) Voiceprint recognition method and device, computer storage medium and electronic equipment
Das et al. Bangladeshi dialect recognition using Mel frequency cepstral coefficient, delta, delta-delta and Gaussian mixture model
JP2003308090A (en) Device, method and program for recognizing speech
US20030093269A1 (en) Method and apparatus for denoising and deverberation using variational inference and strong speech models
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
Ahsiah et al. Tajweed checking system to support recitation
Stuttle A Gaussian mixture model spectral representation for speech recognition
US8965832B2 (en) Feature estimation in sound sources
Patil et al. Marathi connected word speech recognition system
US10706867B1 (en) Global frequency-warping transformation estimation for voice timbre approximation
JP4716125B2 (en) Pronunciation rating device and program
Yavuz et al. A Phoneme-Based Approach for Eliminating Out-of-vocabulary Problem Turkish Speech Recognition Using Hidden Markov Model.
JP2003177785A (en) Linear transformation matrix calculation device and voice recognition device
Kinnunen Optimizing spectral feature based text-independent speaker recognition
Imperl et al. Clustering of triphones using phoneme similarity estimation for the definition of a multilingual set of triphones
JP2000194392A (en) Noise adaptive type voice recognition device and recording medium recording noise adaptive type voice recognition program
JPH1097285A (en) Speech recognition system
US20100204985A1 (en) Frequency axis warping factor estimation apparatus, system, method and program

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20041014

RD03 Notification of appointment of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7423

Effective date: 20050422

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20061003

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20061011

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20061024

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

Ref document number: 3876974

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091110

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101110

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111110

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121110

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131110

Year of fee payment: 7

EXPY Cancellation because of completion of term