JPH04125599A - Reference pattern generating method - Google Patents
Reference pattern generating methodInfo
- Publication number
- JPH04125599A (application JP2246863A / JP24686390A)
- Authority
- JP
- Japan
- Prior art keywords
- vector output
- states
- output probability
- transition
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Description
DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a method for creating standard patterns used in pattern recognition such as speech recognition.
In the field of pattern recognition, such as speech recognition, methods that use probabilistic models as standard patterns for recognition have attracted attention in recent years. In particular, the hidden Markov model (hereinafter, HMM) is widely used as a model representing standard patterns in the field of speech recognition.
An HMM is defined by a set of states, transition probabilities between the states, and vector output probabilities of the states or transitions; recognition is performed by computing the likelihood of each HMM for the input pattern. Speech recognition using HMMs is described in detail in the book "Speech Recognition by Probabilistic Models" (S. Nakagawa).
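As a minimal sketch of this likelihood computation (hypothetical and illustrative only; the function name and plain list-based model representation are not from the patent), the forward algorithm accumulates the probability of the input frames over all state paths:

```python
def forward_likelihood(A, b, pi):
    """Forward algorithm: P(input pattern | HMM).

    A:  A[i][j] = transition probability from state i to state j
    b:  b[t][i] = output probability of state i for input frame t
    pi: pi[i]   = initial probability of state i
    """
    S = len(pi)
    alpha = [pi[i] * b[0][i] for i in range(S)]          # alpha_1(i)
    for t in range(1, len(b)):                           # fold in each frame
        alpha = [b[t][j] * sum(alpha[i] * A[i][j] for i in range(S))
                 for j in range(S)]
    return sum(alpha)                                    # sum over final states
```

In practice the computation is done with log probabilities to avoid underflow, and recognition selects the model with the highest likelihood.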
As a way to determine the parameters of an HMM in which the vector output probability of each state (or transition) is represented by a mixture of continuous distributions, learning methods such as the Baum-Welch algorithm are known, which start from certain initial values and iteratively update the parameters using training data. In this case, initial values of parameters such as the mean of the output probability distribution must be determined for each of the mixed distributions. Known methods for giving initial values to these parameters include:
(a) giving them as random numbers;
(b) perturbing the parameters obtained for the single-distribution case with random values ("A Study of Japanese Phoneme Recognition Using Continuous Output Distribution HMMs," IEICE Technical Report on Speech, SP89-48).
On the other hand, as a method that determines the parameters directly from the training data, rather than by updating from initial values, the following is known:
(c) after segmenting the training data, clustering is performed to obtain as many clusters as the number of mixture components, and parameters such as the means are computed from the data of each cluster ("High Performance Connected Digit Recognition Using Hidden Markov Models," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 8, pp. 1214-1224, August 1989).
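The clustering-based initialization of method (c) can be illustrated roughly as follows (a hypothetical one-dimensional sketch; real systems cluster multi-dimensional feature vectors, and the function name and return convention here are assumptions, not taken from the cited paper):

```python
import random
import statistics

def cluster_init(data, k, iters=20, seed=0):
    """Split one state's training samples into k clusters with k-means;
    each cluster's mean/variance/size initializes one mixture component."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:                                    # assign to nearest center
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            clusters[nearest].append(x)
        centers = [statistics.fmean(cl) if cl else centers[j]
                   for j, cl in enumerate(clusters)]      # recompute centers
    means = centers
    variances = [statistics.pvariance(cl, mu) if len(cl) > 1 else 1.0
                 for cl, mu in zip(clusters, means)]
    weights = [len(cl) / len(data) for cl in clusters]
    return means, variances, weights
```

The per-cluster means, variances, and relative sizes then serve as the mixture means, covariances, and mixture weights; this is the clustering computation whose cost the invention seeks to avoid.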
The values determined in this way can also be used as initial values and then updated by the Baum-Welch algorithm or the like.
[Problems to Be Solved by the Invention]

When a method that iteratively updates parameters through learning is used, it is known that the setting of the initial values is important for the learning to proceed efficiently. However, if random numbers are used as in (a), or the parameters of the single-distribution case are used as in (b), the learning takes a long time to converge, and the converged values are likely to be local optima rather than the global optimum.
Method (c), on the other hand, does not necessarily require learning for parameter updating, and even when its output is used as initial values for updating, convergence is expected within a small number of iterations; however, it has the drawback of requiring a large amount of computation, since calculations for clustering and the like are needed.
An object of the present invention is to provide a standard pattern creation method that eliminates these drawbacks.
[Means for Solving the Problems]

The first invention is a method for creating a standard pattern defined by a set of states, transition probabilities between the states, and vector output probabilities of the states or transitions, characterized in that a standard pattern is created whose vector output probability for each state or transition is a mixed continuous distribution obtained by mixing, with weights, the vector output probability distributions of the corresponding states or transitions of a plurality of standard patterns whose vector output probabilities are represented by continuous distributions.
The second invention is a method for creating a standard pattern for speech recognition defined by a set of states, transition probabilities between the states, and vector output probabilities of the states or transitions, characterized in that a standard pattern is created whose vector output probability for each state or transition is a mixed continuous distribution obtained by mixing, with weights, the vector output probability distributions of the corresponding states or transitions of standard patterns whose vector output probabilities are represented by continuous distributions, each trained, for each of a plurality of speakers, using that speaker's speech data.
The third invention is a method for creating a standard pattern for speech recognition defined by a set of states, transition probabilities between the states, and vector output probabilities of the states or transitions, characterized in that a standard pattern is created whose vector output probability for each state or transition is a mixed continuous distribution obtained by mixing, with weights, the vector output probability distributions of the corresponding states or transitions of standard patterns whose vector output probabilities are represented by continuous distributions, each trained for one environment using speech data uttered or recorded in different environments.
[Operation]

According to the present invention, the vector output probability distribution represented by a mixed continuous distribution is obtained by synthesizing it from the vector output probability distributions of a plurality of standard patterns that have already been trained, so the parameters of the standard pattern can be determined simply. Moreover, if the standard patterns used for the synthesis are chosen appropriately, then when the result is used as the initial parameters for learning by the Baum-Welch method or the like, convergence is expected in fewer training iterations than when the initial parameters are determined by random numbers, and the probability of converging to a local optimum is also expected to be smaller. The synthesized pattern can also be used as it is, without parameter updating by learning.
If, as in the second invention, standard patterns trained for a plurality of speakers, each using that speaker's speech data, are used for the synthesis, a standard pattern for speaker-independent speech recognition whose vector output probabilities are represented by mixed continuous output distributions can be created simply.
If, as in the third invention, standard patterns trained for each environment, using speech data uttered or recorded in different environments, are used for the synthesis, a standard pattern whose vector output probabilities are represented by mixed continuous output distributions and which is robust against environmental variation can be created simply.
[Embodiments]

Fig. 1 is a block diagram for explaining an embodiment in which the first invention is applied to creating an HMM model for speaker-independent speech recognition. HMM model A (3) is created from the training data of speaker A (1), and HMM model B (4) from the training data of speaker B (2). As speakers A and B, for example, one standard male speaker and one standard female speaker are selected.
The HMM model has the form shown in Fig. 2.
For each state i, state transition probabilities a_ii and a_ii+1 (with a_ii + a_ii+1 = 1) and an output probability distribution b_i(y) for the output vector y are defined. The state transition probabilities and output probability distributions of model A are written a_ii^A, b_i^A(y), and so on. If the output vector probability distributions are represented by single Gaussian distributions, they are expressed as

    b_i^A(y) = N(y; μ_i^A, Σ_i^A)
    b_i^B(y) = N(y; μ_i^B, Σ_i^B)

where N(y; μ_i, Σ_i) denotes a multidimensional Gaussian distribution with mean vector μ_i and covariance matrix Σ_i. From model A and model B, an HMM model C (5) for speaker-independent speech recognition is created. Let the state transition probabilities of model C be a_ii^C and a_ii+1^C, and its output probability distribution be b_i^C. Suppose the output probability distribution is represented by the following Gaussian mixture with two components:

    b_i^C(y) = λ^1 N(y; μ_i^1, Σ_i^1) + λ^2 N(y; μ_i^2, Σ_i^2)

The parameters of model C are then determined as follows:

    a_ii^C   = (a_ii^A + a_ii^B) / 2
    a_ii+1^C = (a_ii+1^A + a_ii+1^B) / 2
    μ_i^1 = μ_i^A,  Σ_i^1 = Σ_i^A
    μ_i^2 = μ_i^B,  Σ_i^2 = Σ_i^B
    λ^1 = λ^2 = 1/2

The model C created in this way can be used as it is as an HMM model for speaker-independent speech recognition, or it can be used as an initial model for creating a better model C' (7) by training with the Baum-Welch method or the like on the training data of a larger number of speakers (6).
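The parameter assignment for model C can be sketched as follows (a hypothetical minimal implementation; the diagonal-covariance representation and the names `State`/`merge_states` are illustrative assumptions, not from the patent):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GaussianComponent:
    weight: float            # mixture weight λ
    mean: List[float]        # mean vector μ
    var: List[float]         # diagonal of covariance Σ

@dataclass
class State:
    a_self: float            # a_ii
    a_next: float            # a_ii+1
    components: List[GaussianComponent]  # mixture forming b_i(y)

def merge_states(sA: State, sB: State) -> State:
    """Build model C's state i from the corresponding states of A and B:
    transition probabilities are averaged, and the two output distributions
    become the components of a mixture with weights 1/2 each."""
    return State(
        a_self=(sA.a_self + sB.a_self) / 2,
        a_next=(sA.a_next + sB.a_next) / 2,
        components=[GaussianComponent(0.5 * c.weight, c.mean, c.var)
                    for c in sA.components + sB.components],
    )
```

Because each input component's weight is halved rather than replaced, the same function also covers the case where A and B are themselves mixtures: the resulting component count is the sum of the two input counts, as stated below.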
Model C can be created in the same way when the available models A and B have output probability distributions represented by Gaussian mixtures. In this case, the number of mixture components of the output probability distribution of model C is the sum of the numbers of mixture components of the output probability distributions of models A and B.
Next, an embodiment of the second invention will be described.
Speech data of a small number of words uttered by a large number of speakers is clustered to divide the speakers into M clusters, and from each cluster the speaker at its center is selected, giving M speakers. For each of the M speakers, an HMM model whose output probability distributions are represented by single Gaussian distributions is trained using an amount of speech data sufficient for HMM training. From the M models thus created, an HMM model for speaker-independent speech recognition is obtained by creating, as in the embodiment of the first invention, an HMM model whose output probability distributions are Gaussian mixtures with M components. Since only a small amount of data is needed for the clustering used to select the M speakers, the amount of computation is smaller than in the conventional method (c).
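Generalizing the two-model synthesis to the M per-speaker models, the corresponding states can be combined in one step (a hypothetical sketch; the dict-based representation and the equal default weights of 1/M are illustrative assumptions):

```python
def mix_models(states, weights=None):
    """Combine corresponding states of M trained single-Gaussian models into
    one state whose output distribution is an M-component mixture.

    states:  list of dicts with keys "a_self", "a_next", "mean", "var"
    weights: mixture weights (default: equal weights 1/M)
    """
    M = len(states)
    if weights is None:
        weights = [1.0 / M] * M
    # Weighted average of transition probabilities (equal weights reproduce
    # the simple averaging of the two-model embodiment).
    a_self = sum(w * s["a_self"] for w, s in zip(weights, states))
    a_next = sum(w * s["a_next"] for w, s in zip(weights, states))
    # Each model's Gaussian becomes one mixture component.
    mixture = [(w, s["mean"], s["var"]) for w, s in zip(weights, states)]
    return {"a_self": a_self, "a_next": a_next, "mixture": mixture}
```

Applying this to every state yields the speaker-independent model directly, without the per-frame clustering cost of method (c).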
Finally, an embodiment of the third invention will be described. In the embodiment of the first invention, if models A and B are chosen to be models trained on data uttered by a given speaker under different environments (for example, a quiet environment and a noisy environment), a recognition model that is robust against environmental variation can be created as model C.
As described above, according to the first invention, the parameters of a standard pattern whose vector output probabilities are represented by mixed continuous distributions can be determined simply using a plurality of standard patterns that have already been trained, and the result can be used for pattern recognition either as it is or after a small number of training iterations starting from these values.

Further, according to the second and third inventions, a standard pattern for speaker-independent recognition and a standard pattern robust against environmental variation, respectively, can be created simply.
Fig. 1 is a block diagram for explaining an embodiment in which the first invention is applied to creating an HMM model for speaker-independent speech recognition.
Fig. 2 is a diagram showing the form of the HMM model in the embodiment.
1 ..... training data of speaker A
2 ..... training data of speaker B
3 ..... HMM model A
4 ..... HMM model B
5 ..... HMM model C
6 ..... training data of many speakers
7 ..... HMM model C'
Agent: Patent Attorney Yoshiyuki Iwasa
Claims (3)

(1) A standard pattern creation method for creating a standard pattern defined by a set of states, transition probabilities between the states, and vector output probabilities of the states or transitions, characterized in that a standard pattern is created whose vector output probability for each state or transition is a mixed continuous distribution obtained by mixing, with weights, the vector output probability distributions of the corresponding states or transitions of a plurality of standard patterns whose vector output probabilities are represented by continuous distributions.

(2) A standard pattern creation method for creating a standard pattern for speech recognition defined by a set of states, transition probabilities between the states, and vector output probabilities of the states or transitions, characterized in that a standard pattern is created whose vector output probability for each state or transition is a mixed continuous distribution obtained by mixing, with weights, the vector output probability distributions of the corresponding states or transitions of standard patterns whose vector output probabilities are represented by continuous distributions, each created by learning, for each of a plurality of speakers, using that speaker's speech data.

(3) A standard pattern creation method for creating a standard pattern for speech recognition defined by a set of states, transition probabilities between the states, and vector output probabilities of the states or transitions, characterized in that a standard pattern is created whose vector output probability for each state or transition is a mixed continuous distribution obtained by mixing, with weights, the vector output probability distributions of the corresponding states or transitions of standard patterns whose vector output probabilities are represented by continuous distributions, each created by learning, for each environment, using speech data uttered or recorded in different environments.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP24686390A JP3251005B2 (en) | 1990-09-17 | 1990-09-17 | Standard pattern creation method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH04125599A true JPH04125599A (en) | 1992-04-27 |
JP3251005B2 JP3251005B2 (en) | 2002-01-28 |
Family
ID=17154851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP24686390A Expired - Fee Related JP3251005B2 (en) | 1990-09-17 | 1990-09-17 | Standard pattern creation method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP3251005B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08123468A (en) * | 1994-10-24 | 1996-05-17 | Atr Onsei Honyaku Tsushin Kenkyusho:Kk | Unspecified speaker model generating device and speech recognition device |
US7603276B2 (en) | 2002-11-21 | 2009-10-13 | Panasonic Corporation | Standard-model generation for speech recognition using a reference model |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6367197A (en) * | 1986-09-09 | 1988-03-25 | 松田 健次 | Elliptic trammel |
Also Published As
Publication number | Publication date |
---|---|
JP3251005B2 (en) | 2002-01-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20071116 Year of fee payment: 6 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20081116 Year of fee payment: 7 |
|
FPAY | Renewal fee payment (event date is renewal date of database) |
Free format text: PAYMENT UNTIL: 20091116 Year of fee payment: 8 |
|
LAPS | Cancellation because of no payment of annual fees |