JP2001083986A - Method for forming statistical model

Method for forming statistical model

Info

Publication number
JP2001083986A
Authority
JP
Japan
Prior art keywords
data
learning data
likelihood
learning
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP26193599A
Other languages
Japanese (ja)
Other versions
JP3525082B2 (en)
Inventor
Satoshi Nakagawa
聡 中川
Yoshikazu Yamaguchi
義和 山口
Shoichi Matsunaga
昭一 松永
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP26193599A priority Critical patent/JP3525082B2/en
Publication of JP2001083986A publication Critical patent/JP2001083986A/en
Application granted granted Critical
Publication of JP3525082B2 publication Critical patent/JP3525082B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current


Abstract

PROBLEM TO BE SOLVED: To improve the recognition rate and shorten the HMM creation time by computing the likelihood obtained when each item of learning data is assigned its own correct category, selecting the learning data using that likelihood as a criterion, and training the statistical model with only the selected learning data. SOLUTION: 'Learning data 1', for training the 'acoustic model 1 (HMM1)' used to compute likelihoods, and 'learning data 2', the selection target used to train the 'acoustic model 2 (HMM2)' for recognition, are prepared. A preliminary HMM1 ('acoustic model 1') is first trained on 'learning data 1'. This preliminary HMM1 is then used to compute, for each item of 'learning data 2', the likelihood obtained when the correct answer (for example, the correct phoneme sequence) is assigned to it. A threshold is applied to this likelihood, and the items of 'learning data 2'' whose likelihood exceeds the threshold are selected to generate the 'acoustic model 2' for recognition processing.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to pattern recognition in which an object to be recognized, such as speech, characters, or graphics, is represented by a statistical model, for example a hidden Markov model (hereinafter, HMM; see, e.g., Nakagawa et al., "Speech Recognition with Stochastic Models", IEICE, 1997). More specifically, it relates to a statistical model creation method that builds a robust, high-performance model by using likelihood to select the learning data at model-creation time, thereby aiming to improve the recognition rate when the statistical model is used for recognition and to shorten the HMM creation time.

[0002]

2. Description of the Related Art

Although the present invention is described using speech as an example, it is applicable to the models created for any kind of pattern recognition in which, to recognize which category input data belongs to, the features of the input data are represented by a statistical model (hereinafter written as HMM to make the description concrete, although any other statistical (probabilistic) model, such as a neural network, may be used).

[0003] In speech recognition, an HMM (a phoneme model, syllable model, word model, or the like) obtained from learning speech data is matched against the input speech data, the degree of agreement between the two is computed as a likelihood, and a recognition result is obtained. The parameters of an HMM depend strongly on the conditions under which the learning speech data were recorded (background noise, channel distortion, speaker, and so on). Therefore, when these recording conditions differ from the conditions at actual recognition time, a mismatch arises between the input speech pattern and the HMM, and as a result the recognition rate drops.

[0004]

SUMMARY OF THE INVENTION

A drop in the recognition rate due to a mismatch between the input speech data and the HMM can be prevented by creating the model from speech data recorded under the same conditions as those under which recognition will be performed. However, a model based on a statistical method such as an HMM takes a long time to create (roughly 400 hours or more). Moreover, such speech data may include items whose characteristics differ greatly from the speech data to be evaluated (hereinafter, evaluation data). When such speech is used for learning, the accuracy of the HMM may deteriorate.

[0005] The present invention has been made in view of the above. Its object is to improve the recognition rate and shorten the HMM creation time by providing a method that computes the likelihood of each item of learning data when training an HMM and selects the learning data using that likelihood as a criterion.

[0006]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the invention of claim 1 concerns pattern recognition in which, for the input vector time series corresponding to input data, the likelihood of each recognition category is computed using statistical models that express the features of the categories, and the category expressed by the model with the highest likelihood is output as the recognition result. A preliminary statistical model for computing the likelihood of the learning data used at model creation is first created; using this preliminary model, the likelihood obtained when each item of learning data is assigned its own correct category is computed; the learning data are selected using this likelihood as a criterion; and the statistical model is trained on only the selected learning data.

[0007] In the invention of claim 1, the likelihood, which is the decision criterion at recognition time, is thus used to compute the likelihood of each item of learning data against its correct answer, and a threshold is applied to accept or reject the learning data.

[0008] In the invention of claim 2, a likelihood value is used as the threshold in the selection criterion, and only input data whose likelihood is at or above the threshold are kept.

[0009] In the invention of claim 3, the selection criterion is to sort the input data in descending order of likelihood and use the top-ranked data.

[0010]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1A shows an example of a three-state HMM. Such a model is created for each speech unit (category). Statistical distributions D1 to D3 of the speech feature parameters are assigned to states S1 to S3, respectively. If, for example, this is a phoneme model, the first state represents the statistical distribution of the features near the beginning of the phoneme, the second state near its center, and the third state near its end.

[0011] The feature distribution of each HMM state is often expressed as a combination of several continuous probability distributions (hereinafter, a continuous mixture distribution) in order to represent a complicated distribution shape. Various continuous distributions are possible, but the normal distribution is most commonly used, and each normal component is usually a multidimensional uncorrelated normal distribution with the same number of dimensions as the feature vector.

[0012] FIG. 1B shows an example of a continuous mixture distribution, here expressed with three normal distributions: N(μ1, σ1), with mean vector μ1 and variance σ1, and likewise N(μ2, σ2) and N(μ3, σ3). A distribution such as D1, for example, is given as the combination of three normal components as shown in FIG. 1B.

[0013] The output probability b(X_t), in a state of the continuous-mixture-distribution HMM, for the input feature vector at time t,

X_t = (X_{t,1}, X_{t,2}, ..., X_{t,P})^T  (P is the total number of dimensions),

is computed as

[0014]

(Equation 1)

b(X_t) = \sum_{k} W_k \, P_k(X_t)

[0015] Here, W_k denotes the weight coefficient of the k-th multidimensional normal distribution contained in the state. The probability density P_k(X_t) of the k-th multidimensional normal distribution is computed as

[0016]

(Equation 2)

P_k(X_t) = (2\pi)^{-P/2} \, |\Sigma_k|^{-1/2} \exp\left( -\frac{1}{2} (X_t - \mu_k)^T \Sigma_k^{-1} (X_t - \mu_k) \right)

[0017] Here, μ_k denotes the mean vector of the k-th multidimensional normal distribution of the state, and Σ_k its covariance matrix. If the covariance matrix has only diagonal components, that is, if it is a diagonal covariance matrix, the logarithm of P_k(X_t) can be written as

[0018]

(Equation 3)

\log P_k(X_t) = -\frac{P}{2} \log 2\pi - \frac{1}{2} \sum_{i=1}^{P} \log \sigma_{k,i} - \frac{1}{2} \sum_{i=1}^{P} \frac{(X_{t,i} - \mu_{k,i})^2}{\sigma_{k,i}}

[0019]

[0020] Here, μ_{k,i} is the i-th component of the mean vector of the k-th multidimensional normal distribution of the state, and σ_{k,i} is the i-th diagonal component (variance) of the covariance matrix of that same distribution. The likelihood thus expresses the similarity between the HMM and the speech data.
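
Equations 1 to 3 above can be sketched in a few lines of Python; this is a minimal illustration, and the function names (`log_gauss_diag`, `output_prob`) are not from the patent:

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a multidimensional normal distribution with a
    diagonal covariance matrix (Equation 3); var holds the diagonal
    variances sigma_{k,i}."""
    p = len(x)
    s = -0.5 * p * math.log(2.0 * math.pi)
    for x_i, mu_i, v_i in zip(x, mean, var):
        s += -0.5 * math.log(v_i) - 0.5 * (x_i - mu_i) ** 2 / v_i
    return s

def output_prob(x, weights, means, variances):
    """State output probability b(x) of the continuous mixture HMM
    (Equation 1): a weighted sum of the component densities P_k(x)."""
    return sum(w * math.exp(log_gauss_diag(x, m, v))
               for w, m, v in zip(weights, means, variances))
```

For a single one-dimensional standard-normal component, `output_prob([0.0], [1.0], [[0.0]], [[1.0]])` reduces to the familiar value 1/sqrt(2π) ≈ 0.399. In practice, log probabilities are accumulated directly (as in Equation 3) to avoid underflow.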

[0021] FIG. 2 shows a flowchart of acoustic model creation. The speech data are analyzed by the acoustic analysis unit, and starting from an initial acoustic model, the HMM is trained so as to estimate the above W, μ, and σ using the correct label corresponding to each utterance.

[0022] When speech data such as 「あらゆる現実・・・」 or 「テレビゲームや・・・」 are given, the correct labels (phoneme sequences) "arayuru..." and "telebi...", corresponding to the respective utterances, are prepared. Then, for each of "a", "r", and so on, an initial acoustic model is prepared as a template, so to speak, of the acoustic model to be obtained as the result of learning.

[0023] The illustrated initial acoustic model, corresponding for example to "a", is 26-dimensional with 4 mixture components and 3 states, so a total of 312 normal distributions N(μ1, σ1), N(μ2, σ2), ..., N(μ312, σ312) are specified as

μ1 (mean) 0.0; σ1 (variance) 1.0
μ2 (mean) 0.0; σ2 (variance) 1.0
...

The illustrated learning unit is given (i) the linear predictive coding (LPC) cepstrum, MFCC (mel-frequency cepstrum), or other features obtained by analyzing each utterance in the acoustic analysis unit, (ii) the correct labels, and (iii) the initial acoustic model, and it obtains as the result of learning that, for example, "a" is represented by the combination of several normal distributions such as

μ1 (mean) 0.01; σ1 (variance) 0.2
μ2 (mean) -0.03; σ2 (variance) 0.04
...

[0024] In the subsequent recognition processing, for each sound in the input data to be recognized, for example "a", acoustic features of the same kind as above are obtained, the distance to the acoustic model of the correct "a" is computed, and the "a" in the input data is thereby recognized as corresponding to the correct answer "a".

[0025] The learning processing in the illustrated learning unit can use any of the conventionally known
(i) learning algorithm (discrete HMM),
(ii) learning algorithm (multidimensional normal distribution),
(iii) learning algorithm (mixture normal distribution), or
(iv) learning algorithm (semi-continuous HMM).
These algorithms are explained on pages 74 to 79 of "Digital Signal Processing of Speech and Sound Information (Digital Signal Processing Series 5)" by Kiyohiro Shikano, Satoshi Nakamura, and Shiro Ise (publisher: Kuniaki Ai; publishing house: Shokodo Co., Ltd.; first printing of the first edition, November 10, 1997). In the experiments, the inventors used the above "learning algorithm (mixture normal distribution)".

[0026] FIG. 3 shows a flowchart of recognition. As shown in FIG. 3, likelihood computation is performed for each recognition-candidate model on the feature vector of every frame of the input speech; the log likelihood accumulated over all frames is divided by the number of frames, and the recognition result is output using a word dictionary in which the recognition candidates are described.
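
A minimal sketch of this scoring rule (accumulate the per-frame log likelihoods, divide by the number of frames, keep the best candidate); here `models`, mapping each candidate label to a per-frame log-probability function, is an illustrative stand-in for the acoustic models and word dictionary:

```python
def recognize(frames, models):
    """Return the candidate whose model yields the highest average
    per-frame log likelihood, following the flow of FIG. 3."""
    best_label, best_score = None, float("-inf")
    for label, frame_logprob in models.items():
        # Accumulated log likelihood divided by the number of frames.
        score = sum(frame_logprob(f) for f in frames) / len(frames)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Normalizing by the frame count keeps scores comparable between candidates that align to different numbers of frames.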

[0027] That is, the "acoustic model" obtained in FIG. 2 is used as the "acoustic model" shown in FIG. 3. Evaluation data (that is, input data to be recognized), for example 「でもやる事について男女の差はありません」 or 「日米関係は重要であろう」, are given; the above-mentioned LPC cepstrum is obtained by the acoustic analysis unit and then supplied to the recognition unit.

[0028] As described above, once the acoustic model has been prepared, recognition processing is performed in association with the "word dictionary"; the present invention yields an "acoustic model" preferable to that of the prior art.

[0029] In the recognition unit, the distance between the features obtained from the LPC cepstrum supplied by the acoustic analysis unit and the "acoustic model" shown in FIG. 2 is computed; recognition of, for example, "d", "e", "m", "o", "y", "a", ... is obtained, and the "recognition result" is produced using the "word dictionary".

[0030] As described above, recognition processing is performed with the prepared acoustic model in association with the "word dictionary"; with the present invention, a more preferable "acoustic model" is obtained.

[0031] That is, in the present invention, the likelihood, which is the decision criterion at recognition time, is used to compute the likelihood of each item of learning data against its correct answer, and a threshold is applied to accept or reject the learning data. Using the likelihood as the criterion in this way makes it possible to exclude data whose correct label is wrong, data containing noise, data in which the speech is cut off partway, and the like.

[0032] FIG. 4 shows a flowchart of learning data selection based on likelihood. "Learning data 1", for training the "acoustic model 1 (HMM1)" used to compute likelihoods, and "learning data 2", the selection target used to train the "acoustic model 2 (HMM2)" for recognition, are prepared. Learning data 2 may or may not include learning data 1.

[0033] First, a preliminary HMM1 ("acoustic model 1") is trained on learning data 1. Using this preliminary HMM1, the likelihood obtained when each item of learning data 2 is assigned its correct answer (for example, the correct phoneme sequence) is computed. Next, a threshold is applied to this likelihood, the items whose likelihood is at or above the threshold are selected as learning data 2', and the acoustic model 2 for recognition processing is generated from them.
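
The two-stage procedure of FIG. 4 can be sketched as follows; `train` and `forced_likelihood` are placeholders for the actual HMM training and likelihood computation, which the patent leaves to standard algorithms:

```python
def build_selected_model(data1, data2, train, forced_likelihood, threshold):
    """FIG. 4: train the preliminary HMM1 on data1, keep only the data2
    utterances whose likelihood against their own correct label is at or
    above the threshold, and train the recognition model HMM2 on them."""
    hmm1 = train(data1)  # preliminary model ("acoustic model 1")
    data2_sel = [(feats, label) for feats, label in data2
                 if forced_likelihood(hmm1, feats, label) >= threshold]
    return train(data2_sel)  # recognition model ("acoustic model 2")
```

Raising or lowering the threshold directly controls how much of learning data 2 survives, and hence the HMM2 training time, as paragraph [0038] notes.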

[0034] FIG. 5 shows a flowchart for selecting learning data by a likelihood threshold. In FIG. 5, utterances such as 「予防や健康管理リハビリテーション・・・」 and 「出口のない・・・」 are given arbitrarily as "learning data 2". The LPC cepstrum is obtained by the "acoustic analysis unit", and the likelihood is computed in the "likelihood computation unit" using "acoustic model 1" (the "acoustic model" of FIG. 2; "acoustic model 1" of FIG. 4) and the "correct labels" (correct labels as in FIG. 2).

[0035] From the result, the learning data selection unit sorts the items input as "learning data 2" in descending order of likelihood. In the illustrated case, when it is found that No. 3, 「わずかな収入をやりくりして」, obtains likelihood 79.6, No. 5 obtains 77.7, No. 2, 「出口のない・・・」, obtains 74.8, and so on, then No. 3, No. 5, No. 2, No. 6, No. 1, and No. 7 are selected as "learning data 2'" ("learning data 2'" of FIG. 4) on the grounds that each has a likelihood of 70 or more.

[0036] FIG. 6 shows a flowchart for selecting learning data by the top x% of likelihood. The top few percent to few tens of percent of the learning data are selected to create learning data 2'. That is, in the case of FIG. 6, the top x% of all the learning data are selected in descending order of likelihood. In the illustrated case, No. 3, No. 5, No. 2, No. 6, and No. 1 fall within the top x% and are therefore selected, that is, chosen as "learning data 2'".
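
Top-x% selection (claim 3) differs from the threshold rule only in how the cut is made after sorting; a minimal sketch, with the helper name `select_top_percent` chosen here for illustration:

```python
def select_top_percent(scored, x):
    """Keep the top x percent of utterances by likelihood (FIG. 6).

    scored: list of (utterance, likelihood) pairs; x: percentage kept.
    """
    ranked = sorted(scored, key=lambda p: p[1], reverse=True)
    n = max(1, int(len(ranked) * x / 100.0))  # keep at least one utterance
    return [utt for utt, _ in ranked[:n]]
```

Unlike the threshold rule, the amount of retained data is fixed directly by x, which in turn fixes the HMM2 training time (cf. paragraph [0038]).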

[0037] "Learning data 2'" is selected by either of the above methods, and "acoustic model 2 (HMM2)" is then created from this learning data 2' as shown in FIG. 4.

[0038] During this learning, the amount of learning data 2' can be adjusted by setting the threshold, which makes it possible to control the time required to create HMM2.

[0039] A speech recognition experiment was carried out to examine the effect of learning data selection. The machine used to create the HMMs was a Sun Ultra Enterprise (450 MHz). News broadcast speech was used as learning data, and 50 sentences of news speech were recognized for evaluation. The vocabulary size was 20,000 words. The experimental results are shown in Table 1.

[0040]

(Table 1)

  Selection             Sentences   Training time   Recognition rate
  none (baseline)       6666        300 hours       93.23%
  likelihood-based      3894        240 hours       93.79%

[0041] For learning data 1, no selection was performed: the model trained on 6666 sentences over 300 hours achieved a recognition rate of 93.23%, while the model trained over 240 hours on the 3894 sentences chosen by the learning data selection of the present invention achieved 93.79%. The learning data selection of the present invention thus shortened the HMM creation time by 60 hours and improved the recognition rate by 0.56 points.

[0042]

EFFECTS OF THE INVENTION

As described above, according to the present invention, selecting the learning data when creating an HMM, in particular by likelihood, improves the recognition rate and shortens the time required to create the HMM. Moreover, the speech of particular speakers and speaking styles is generally used as learning data; when speech recognition is performed with an acoustic model created from such speech, the recognition rate drops markedly for speech whose characteristics differ from those seen at learning time. In the present invention, however, the likelihood against a previously trained model is computed, so that speech whose likelihood takes an abnormal value is kept out of the learning data. A drop in the recognition rate with the created acoustic model can therefore be suppressed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating an HMM.

FIG. 2 shows the flow of creating an acoustic model.

FIG. 3 shows the flow of recognition processing.

FIG. 4 shows the flow of learning data selection according to the present invention.

FIG. 5 shows an example of using a likelihood threshold in selecting learning data.

FIG. 6 shows an example in which the top x% by likelihood are adopted in selecting learning data.

Continuation of front page: (72) Inventor: Shoichi Matsunaga, 2-3-1 Otemachi, Chiyoda-ku, Tokyo, c/o Nippon Telegraph and Telephone Corporation. F-terms (reference): 5D015 GG04 HH03 HH04 HH14

Claims (3)

[Claims]

[Claim 1] A statistical model creation method used in pattern recognition in which, for the input vector time series corresponding to input data, the likelihood of each recognition category is computed using statistical models expressing the features of the categories, and the category expressed by the statistical model with the highest likelihood is output as the recognition result, wherein: a preliminary statistical model for computing the likelihood of the learning data used at model creation is created; with this preliminary statistical model, the likelihood obtained when each item of learning data is assigned its own correct category is computed; the learning data are selected using this likelihood as a criterion; and the statistical model is trained using only the selected learning data.
[Claim 2] The statistical model creation method according to claim 1, wherein a likelihood value is used as a threshold in the criterion for selecting the learning data, and input data whose likelihood is at or above the threshold are used.
[Claim 3] The statistical model creation method according to claim 1, wherein, as the criterion for selecting the learning data, the input data are sorted in descending order of likelihood and the top-ranked data are used.
JP26193599A 1999-09-16 1999-09-16 Statistical model creation method Expired - Lifetime JP3525082B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP26193599A JP3525082B2 (en) 1999-09-16 1999-09-16 Statistical model creation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP26193599A JP3525082B2 (en) 1999-09-16 1999-09-16 Statistical model creation method

Publications (2)

Publication Number Publication Date
JP2001083986A true JP2001083986A (en) 2001-03-30
JP3525082B2 JP3525082B2 (en) 2004-05-10

Family

ID=17368742

Family Applications (1)

Application Number Title Priority Date Filing Date
JP26193599A Expired - Lifetime JP3525082B2 (en) 1999-09-16 1999-09-16 Statistical model creation method

Country Status (1)

Country Link
JP (1) JP3525082B2 (en)


Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007065533A (en) * 2005-09-02 2007-03-15 Advanced Telecommunication Research Institute International Sound model generating device and program
JP4654452B2 (en) * 2005-09-02 2011-03-23 株式会社国際電気通信基礎技術研究所 Acoustic model generation apparatus and program
JP2008241970A (en) * 2007-03-27 2008-10-09 Kddi Corp Speaker adaptation device, speaker adaptation method and speaker adaptation program
JP2009128490A (en) * 2007-11-21 2009-06-11 Nippon Telegr & Teleph Corp <Ntt> Learning data selecting device, learning data selecting method, program and recording medium, and acoustic model generating device, acoustic model generating method, program, and recording medium
JP2009251510A (en) * 2008-04-10 2009-10-29 Nippon Hoso Kyokai <Nhk> Acoustic processor and program
JP2014109698A (en) * 2012-12-03 2014-06-12 Nippon Telegr & Teleph Corp <Ntt> Speaker adaptation device, speaker adaptation method, and program
JP2016161762A (en) * 2015-03-02 2016-09-05 日本電信電話株式会社 Learning data generation device, method, and program
WO2018079840A1 (en) * 2016-10-31 2018-05-03 株式会社Preferred Networks Disease development determination device, disease development determination method, and disease development determination program
JP6280997B1 (en) * 2016-10-31 2018-02-14 株式会社Preferred Networks Disease onset determination device, disease onset determination method, disease feature extraction device, and disease feature extraction method
JP2018077814A (en) * 2016-10-31 2018-05-17 株式会社Preferred Networks Affection determination device for disease, affection determination method for disease, feature extraction device for disease, and feature extraction method for disease
JPWO2018079840A1 (en) * 2016-10-31 2019-09-19 株式会社Preferred Networks Disease determination apparatus, disease determination method, and disease determination program
JP7021097B2 (en) 2016-10-31 2022-02-16 株式会社Preferred Networks Disease morbidity determination device, disease morbidity determination method and disease morbidity determination program
JP2020123198A (en) * 2019-01-31 2020-08-13 中国電力株式会社 Forecast system and forecast method
JP7342369B2 (en) 2019-01-31 2023-09-12 The Chugoku Electric Power Co., Inc. Prediction system, prediction method
JP7406885B2 (en) 2019-08-01 2023-12-28 Canon Inc. Information processing device, information processing method and program
JP2021131528A (en) * 2020-02-19 2021-09-09 Baidu Online Network Technology (Beijing) Co., Ltd. User intention recognition method, apparatus, electronic device, computer-readable storage medium and computer program
US11646016B2 (en) 2020-02-19 2023-05-09 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing user intention, device, and readable storage medium

Also Published As

Publication number Publication date
JP3525082B2 (en) 2004-05-10

Similar Documents

Publication Publication Date Title
EP0831456B1 (en) Speech recognition method and apparatus therefor
Reynolds et al. Robust text-independent speaker identification using Gaussian mixture speaker models
US6868380B2 (en) Speech recognition system and method for generating phonetic estimates
US9972306B2 (en) Method and system for acoustic data selection for training the parameters of an acoustic model
EP0470245B1 (en) Method for spectral estimation to improve noise robustness for speech recognition
JP2002500779A (en) Speech recognition system using discriminatively trained model
JP3299408B2 (en) Speech recognition method and apparatus using dynamic features
CN106847259B (en) Method for screening and optimizing audio keyword template
US5794198A (en) Pattern recognition method
JPH09160584A (en) Voice adaptation device and voice recognition device
JP3298858B2 (en) Partition-based similarity method for low-complexity speech recognizers
JP3130524B2 (en) Speech signal recognition method and apparatus for implementing the method
US7133827B1 (en) Training speech recognition word models from word samples synthesized by Monte Carlo techniques
Lai et al. Phone-aware LSTM-RNN for voice conversion
WO2022148176A1 (en) Method, device, and computer program product for english pronunciation assessment
JP2001083986A (en) Method for forming statistical model
JP5091202B2 (en) Identification method that can identify any language without using samples
Martinčić-Ipšić et al. Croatian large vocabulary automatic speech recognition
JP2982689B2 (en) Standard pattern creation method using information criterion
US7454337B1 (en) Method of modeling single data class from multi-class data
JP2996925B2 (en) Phoneme boundary detection device and speech recognition device
JP2003271185A (en) Device and method for preparing information for voice recognition, device and method for recognizing voice, information preparation program for voice recognition, recording medium recorded with the program, voice recognition program and recording medium recorded with the program
Kovács Noise robust automatic speech recognition based on spectro-temporal techniques
JPH07160287A (en) Standard pattern making device
JP3589508B2 (en) Speaker adaptive speech recognition method and speaker adaptive speech recognizer

Legal Events

Date Code Title Description
TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20040210

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20040216

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

Ref document number: 3525082

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: R3D02

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080220

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090220

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100220

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110220

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120220

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130220

Year of fee payment: 9

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

EXPY Cancellation because of completion of term