JPH08241092A - Speaker adaptation method for acoustic model and device therefor - Google Patents

Speaker adaptation method for acoustic model and device therefor

Info

Publication number
JPH08241092A
JPH08241092A JP7044430A JP4443095A JPH08241092A JP H08241092 A JPH08241092 A JP H08241092A JP 7044430 A JP7044430 A JP 7044430A JP 4443095 A JP4443095 A JP 4443095A JP H08241092 A JPH08241092 A JP H08241092A
Authority
JP
Japan
Prior art keywords
speaker
acoustic
hmm
recognition
speech
Legal status
Pending
Application number
JP7044430A
Other languages
Japanese (ja)
Inventor
Tomoko Matsui
知子 松井
Sadahiro Furui
貞煕 古井
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
1995-03-03
Filing date
1995-03-03
Publication date
1996-09-17
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP7044430A
Publication of JPH08241092A
Legal status: Pending (current)


Abstract

PURPOSE: To realize a high-performance recognition system by minimizing the recognition error rate of an acoustic HMM. CONSTITUTION: The device holds a speaker-independent acoustic HMM (hidden Markov model) 6 trained on the speech of many speakers. It comprises four sections. A feature parameter extraction section 1 extracts feature parameters from the speech data 5 of the target speaker. A likelihood maximization adaptation section 2 optimizes the parameters of the speaker-independent acoustic HMM so that the likelihood of the target speaker's speech is maximized. A discrimination error minimization adaptation section 3 defines a differentiable loss function from the likelihood-maximized HMM parameters and the target speaker's time series of feature parameters, and selects the parameters that minimize this function, yielding the acoustic HMM with the minimum discrimination error. An adapted acoustic HMM storage section 4 stores the result.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition method and apparatus. More particularly, it relates to a speaker adaptation method and apparatus for acoustic HMMs, in which the acoustic features of speech are modeled by HMMs, and speaker-independent acoustic HMMs corresponding to recognition categories such as phonemes or words are adapted to a specific target speaker.

[0002]

[Prior Art] In speaker-independent speech recognition, there have been attempts to improve recognition performance by adapting the recognition system to the target speaker. In these conventional methods, the parameters of the acoustic HMM are estimated so that the likelihood of the speaker-independent acoustic HMM for the target speaker's speech is maximized. The estimation uses, for example, the Baum-Welch algorithm described in Seiichi Nakagawa, "Speech Recognition by Probabilistic Models," IEICE, 1988 (hereinafter Document 1). This adapts the acoustic HMM to the target speaker.

[0003]

[Problems to Be Solved by the Invention] In practice, the conventional speaker adaptation methods for acoustic models described above did not achieve good performance in terms of recognition error rate.

[0004] An object of the present invention is to provide a speaker adaptation method for an acoustic model, and a device therefor, that achieve better performance in terms of recognition error rate.

[0005]

[Means for Solving the Problems] The speaker adaptation method for an acoustic model according to the present invention proceeds as follows. Acoustic features are extracted from speech and statistically modeled to construct acoustic models corresponding to phonemes, words, and other recognition categories. A speaker-independent acoustic model, trained on the speech of many speakers, is represented by a hidden Markov model (HMM). Using the speech of the target speaker, the parameters of this speaker-independent acoustic HMM are optimized so that the likelihood of the target speaker's speech is maximized. The method then includes a further step: the likelihood-maximized parameters of the speaker-independent acoustic HMM are adapted so that recognition errors on the target speaker's speech are minimized.

[0006] The present invention also includes a variant of this adaptation step. In that variant, the step of adapting the parameters of the speaker-independent acoustic HMM to minimize recognition errors on the target speaker's speech defines a differentiable loss function. It then sequentially updates the acoustic HMM parameters so that the value of this function decreases, thereby obtaining the optimum values.

[0007] The speaker adaptation device for an acoustic model according to the present invention operates on the same basis as the method. Acoustic features are extracted from speech and statistically modeled to construct acoustic models corresponding to phonemes, words, and other recognition categories. A speaker-independent acoustic model, trained on the speech of many speakers, is represented by a hidden Markov model (HMM). Using the speech of the target speaker, the parameters of this speaker-independent acoustic HMM are optimized so that the likelihood of the target speaker's speech is maximized. The device comprises adaptation means that adapt these likelihood-maximized parameters so that recognition errors on the target speaker's speech are minimized.

[0008] The speaker adaptation device of the present invention also includes a configuration in which the adaptation means include means for defining a differentiable loss function. These means sequentially update the acoustic HMM parameters so that the value of this function decreases, thereby obtaining the optimum values.

[0009]

[Operation] The parameters of the speaker-independent acoustic HMM, trained on the speech of many speakers, are first optimized to maximize the likelihood of a specific speaker's speech. They are then further adapted to minimize discrimination errors on the target speaker's speech. This two-stage process makes it possible to adapt the acoustic model to a speaker with a low error rate.

[0010]

[Embodiments] Embodiments of the present invention will now be described with reference to the drawings.

[0011] FIG. 1(A) is a flowchart of an embodiment of the acoustic model speaker adaptation method of the present invention. FIG. 1(B) is a flowchart of an embodiment that uses a loss function for the discrimination error rate minimization adaptation in step 13 of FIG. 1(A).

[0012] In this speaker adaptation method, a speaker-independent acoustic model trained on the speech of many speakers is represented by an acoustic HMM (Hidden Markov Model) (step 11). Next, the parameters of the acoustic HMM are optimized using the speech of a specific target speaker, so that the likelihood of that speaker's speech is maximized (step 12). The parameters of the acoustic HMM output by step 12 are then adapted so that discrimination errors on the target speaker's speech are minimized (step 13). Finally, the acoustic HMM output by step 13 is stored (step 14).

[0013] The present invention also includes an embodiment in which the discrimination error minimization adaptation step 13 decreases the loss function

[0014]

[Equation 1: image not reproduced]

by updating the parameters according to

[0015]

[Equation 2: image not reproduced]

Here, ε_t is a coefficient that controls the size of each update and is set experimentally.

[0016] In this embodiment, step 13 sequentially updates Λ_t by Equation (6) to obtain the optimum values.
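The sequential update of step 13 can be illustrated with a toy sketch. The patent's equation images are not reproduced, so the sketch assumes the usual minimum-classification-error choices: a sigmoid loss of the discrimination error d_k, where d_k compares the correct category's score with a soft maximum over the rival categories, minimized by gradient descent. To stay short, every category is a one-state model with a single 1-D Gaussian of fixed variance, and only the means are adapted. In addition, ε_t is held constant, and the gradient is accumulated over all adaptation tokens before each update, echoing the batch updating described in [0036]. All function names and default constants are illustrative assumptions.

```python
import math

def log_gauss(x, mean, var):
    """Log of the 1-D Gaussian density N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discrimination_error(frames, k, means, var, eta):
    """Return d_k and the soft-max weights of the rival categories.
    Category j is a single Gaussian: g_j = sum_t log N(x_t; mean_j, var)."""
    g = [sum(log_gauss(x, mu, var) for x in frames) for mu in means]
    rivals = [j for j in range(len(means)) if j != k]
    top = max(eta * g[j] for j in rivals)              # shift for a stable log-sum-exp
    w = {j: math.exp(eta * g[j] - top) for j in rivals}
    z = sum(w.values())
    d = -g[k] + (top + math.log(z / len(rivals))) / eta
    return d, {j: wj / z for j, wj in w.items()}

def empirical_loss(tokens, means, var=1.0, gamma=1.0, theta=0.0, eta=1.0):
    """Average of l(d_k) = 1 / (1 + exp(-gamma * d_k + theta)) over (frames, k) tokens."""
    return sum(sigmoid(gamma * discrimination_error(f, k, means, var, eta)[0] - theta)
               for f, k in tokens) / len(tokens)

def mce_adapt_means(tokens, means, var=1.0, eps=0.05, gamma=1.0, theta=0.0,
                    eta=1.0, iterations=10):
    """Gradient descent on the summed loss; means are updated once per pass over all tokens."""
    means = list(means)
    for _ in range(iterations):
        grad = [0.0] * len(means)
        for frames, k in tokens:
            d, weights = discrimination_error(frames, k, means, var, eta)
            loss = sigmoid(gamma * d - theta)
            dl_dd = gamma * loss * (1.0 - loss)                 # derivative of the sigmoid loss
            for j in range(len(means)):
                dd_dg = -1.0 if j == k else weights[j]          # derivative of d_k w.r.t. g_j
                dg_dmu = sum(x - means[j] for x in frames) / var  # derivative of g_j w.r.t. mean_j
                grad[j] += dl_dd * dd_dg * dg_dmu
        means = [mu - eps * gj for mu, gj in zip(means, grad)]
    return means
```

Consider two tokens, where the class-0 token [0.7, 0.9, 0.8] is initially misrecognized because its frames lie closer to the class-1 mean 1.0 than to the class-0 mean 0.0. Ten passes move the class-0 mean upward toward its data and lower the average loss.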

[0017] FIG. 2 is a block diagram of an apparatus to which the acoustic model speaker adaptation method of the present invention is applied.

[0018] This acoustic model speaker adaptation apparatus comprises the following elements:

- A speaker-independent acoustic model 6, represented as a hidden Markov model (hereinafter acoustic HMM). It is built by recording the speech of many speakers and statistically modeling the speech features.
- A feature parameter extraction unit 1. On receiving the speech data 5 of the target speaker, it converts the speech into a representation based on feature parameters such as the cepstrum.
- A likelihood maximization adaptation unit 2. Given the output of the feature parameter extraction unit 1 and the HMM 6, it uses the target speaker's time-series speech data to produce an acoustic HMM adapted to the target speaker with high likelihood. It outputs this HMM together with the time-series speech data.
- A discrimination error minimization adaptation unit 3. It receives the output of the likelihood maximization adaptation unit 2 and modifies the acoustic HMM parameters so that discrimination errors are minimized.
- An adapted acoustic HMM storage unit 4, which stores the output of the discrimination error minimization adaptation unit 3.

[0019] The likelihood maximization adaptation unit 2 adapts the input acoustic HMM 6 to the target speaker. It adjusts the HMM parameters Λ =

[0020]

[Inline formula 1: image not reproduced; per [0021] to [0025], Λ comprises the mixture weights, means, variances and transition probabilities] so that the likelihood function L_k(·) of Equation (1) is maximized. The adaptation is carried out, for example, by the Baum-Welch algorithm (see Document 1).
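The following is a minimal sketch of maximum-likelihood mean adaptation by Baum-Welch re-estimation, of the kind unit 2 performs. To keep it short, it makes three simplifying assumptions:

- features are one-dimensional;
- each state has a single Gaussian rather than the mixture densities described here;
- the initial distribution `pi` applies to the state that emits the first frame.

Following the experiment in [0036], only the means are re-estimated, starting from the speaker-independent values. All function names are illustrative.

```python
import math

def gauss_pdf(x, mean, var):
    """1-D Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def forward_backward(x, pi, a, means, variances):
    """Scaled forward-backward pass; returns state posteriors gamma[t][s] and log P(X)."""
    n, T = len(pi), len(x)
    b = [[gauss_pdf(x[t], means[s], variances[s]) for s in range(n)] for t in range(T)]
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            raw = [pi[s] * b[0][s] for s in range(n)]
        else:
            raw = [sum(alpha[t - 1][r] * a[r][s] for r in range(n)) * b[t][s]
                   for s in range(n)]
        c = sum(raw)                      # scaling factor c_t; log P(X) = sum_t log c_t
        scale.append(c)
        alpha.append([v / c for v in raw])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(a[s][r] * b[t + 1][r] * beta[t + 1][r] for r in range(n)) / scale[t + 1]
                   for s in range(n)]
    gamma = [[alpha[t][s] * beta[t][s] for s in range(n)] for t in range(T)]
    return gamma, sum(math.log(c) for c in scale)

def adapt_means_ml(x, pi, a, si_means, variances, iterations=5):
    """Baum-Welch (EM) re-estimation of the state means only, starting from the
    speaker-independent means; variances and transitions are left unchanged."""
    means = list(si_means)
    for _ in range(iterations):
        gamma, _ = forward_backward(x, pi, a, means, variances)
        for s in range(len(means)):
            occupancy = sum(g[s] for g in gamma)
            if occupancy > 0.0:
                means[s] = sum(g[s] * xt for g, xt in zip(gamma, x)) / occupancy
    return means
```

For example, take two left-to-right states with these settings:

- speaker-independent means 0.0 and 5.0;
- transition matrix [[0.7, 0.3], [0.0, 1.0]];
- unit variances;
- adaptation frames [1.0, 1.2, 0.8, 6.0, 6.2, 5.8].

The adapted means move to approximately 1.0 and 6.0, and the log-likelihood of the adaptation data increases, as expected of EM re-estimation.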

[0021] Here, α denotes a state and m a mixture component.

[0022]

[Inline formula 2: image not reproduced] is the weight of mixture component m in state α,

[0023]

[Inline formula 3: image not reproduced] is its mean,

[0024]

[Inline formula 4: image not reproduced] is its variance, and

[0025]

[Inline formula 5: image not reproduced] is the transition probability from state α to state β.

[0026]

[Equation 3, giving Equation (1): image not reproduced] Here, X = {x_1, x_2, x_3, ..., x_T} is the speech data converted into a time series of feature parameters, t is the time index, π is the initial state probability, and Q = {q_0, q_1, ..., q_T} is a state transition sequence.

[0027] In addition,

[0028]

[Equation 4: image not reproduced] is the output probability.
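The images for the likelihood function of Equation (1) and the output probability of [0028] are not reproduced. Given the symbols defined in [0021] to [0026], their usual forms are shown below. This is an assumption about their form, not the patent's exact equations. The names w_{αm}, μ_{αm}, σ²_{αm} and a_{αβ} are assumed labels for the weight, mean, variance and transition probability. N(·) denotes a diagonal-covariance Gaussian density, and the first sum runs over all state sequences Q. For the semi-continuous models used in [0034], the means and variances would be shared across states and only the weights would depend on α.

```latex
\begin{aligned}
L_k(X;\Lambda) &= \sum_{Q} \pi_{q_0} \prod_{t=1}^{T} a_{q_{t-1} q_t}\, b_{q_t}(x_t), \\
b_\alpha(x_t) &= \sum_{m} w_{\alpha m}\, \mathcal{N}\bigl(x_t;\ \mu_{\alpha m},\ \sigma^2_{\alpha m}\bigr).
\end{aligned}
```

The Viterbi likelihood referred to in [0030] replaces the sum over Q with a maximum over Q.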

[0029] In the discrimination error minimization adaptation unit 3, the acoustic HMM is modified so that the discrimination error function d_k(·) is minimized. The discriminant function g_k(·) and the discrimination error function d_k(·) are defined by Equations (3) and (4), respectively.

[0030]

[Equation 5, giving Equations (3) and (4): image not reproduced] Here, L_k is the likelihood function computed with the Viterbi algorithm (see Document 1).
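The following is a minimal sketch of Equations (3) and (4) as they are usually written in the minimum classification error framework cited in [0031]. The exact forms in the patent's equation images are not reproduced, so these formulas, the smoothing constant η, and all function names are assumptions.

- The discriminant function is taken as the Viterbi log-likelihood, g_k(X; Λ) = log max_Q P(X, Q | Λ_k).
- The discrimination error function compares the correct category k with a soft maximum over the M - 1 rival categories: d_k = -g_k + (1/η) log[(1/(M-1)) Σ_{j≠k} exp(η g_j)].
- For brevity, each state uses a single 1-D Gaussian.

```python
import math

def log_gauss(x, mean, var):
    """Log of the 1-D Gaussian density N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_score(x, log_pi, log_a, means, variances):
    """g_k(X; Lambda): log-likelihood of the single best state path (log domain,
    use float('-inf') for forbidden initial states or transitions)."""
    n = len(log_pi)
    delta = [log_pi[s] + log_gauss(x[0], means[s], variances[s]) for s in range(n)]
    for xt in x[1:]:
        delta = [max(delta[r] + log_a[r][s] for r in range(n))
                 + log_gauss(xt, means[s], variances[s]) for s in range(n)]
    return max(delta)

def misclassification_measure(scores, k, eta=1.0):
    """d_k from the scores g_j of all categories; d_k > 0 signals a recognition error."""
    rivals = [eta * g for j, g in enumerate(scores) if j != k]
    top = max(rivals)  # shift for a numerically stable log-sum-exp
    soft_max = (top + math.log(sum(math.exp(r - top) for r in rivals) / len(rivals))) / eta
    return -scores[k] + soft_max
```

With two categories, the soft maximum reduces to the rival's score, so d_k = g_j - g_k. Thus d_k > 0 exactly when the rival scores higher, that is, when the token would be misrecognized. In the phoneme experiment of [0035], the scores would come from the 43 phoneme HMMs evaluated on each aligned segment.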

[0031] When modifying the acoustic HMM parameters, the discrimination error minimization adaptation unit 3 may, for example, use the differentiable loss function l(d_k) of Equation (5) in place of the discrimination error function d_k(·). It then sequentially updates the acoustic parameters by Equation (6) so that this function decreases. Equations (5) and (6) follow the discriminative training algorithm of B. H. Juang and S. Katagiri, "Discriminative training," J. Acoust. Soc. Jpn., 13, 6, pp. 333-339, 1992.

[0032]

[Equation 6, giving Equations (5) and (6): image not reproduced] Here, ε_t is a coefficient that controls the size of each update and is set experimentally.
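The images for Equations (5) and (6) are not reproduced. In the Juang-Katagiri discriminative training framework cited in [0031], the loss and the sequential update are commonly written as below. This is offered as an assumption about their form, not as the patent's exact equations. Here γ > 0 and θ are assumed slope and offset constants of the sigmoid, U is an assumed positive-definite scaling matrix (often the identity), and X_t is the adaptation token used at update t.

```latex
\begin{aligned}
\ell(d_k) &= \frac{1}{1 + \exp(-\gamma d_k + \theta)}, \\
\Lambda_{t+1} &= \Lambda_t - \varepsilon_t\, U\, \nabla_{\Lambda}\, \ell\bigl(d_k(X_t;\Lambda)\bigr)\Big|_{\Lambda=\Lambda_t}.
\end{aligned}
```

For a correctly recognized token, d_k < 0, so ℓ approaches 0; a clear error drives ℓ toward 1. ℓ therefore behaves as a smoothed error count that is bounded and, as the description requires, differentiable.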

[0033] Next, the results obtained with this embodiment are described with reference to an experimental example.

[0034] First, speaker-independent acoustic HMMs were created. The acoustic HMMs in this example are semi-continuous HMMs with 256 mixture components, and there are 43 context-independent (phonological-environment-independent) phoneme models. The HMM parameters of the speaker-independent acoustic HMMs were estimated with the Baum-Welch algorithm, using a total of 7,016 sentences from 35 male speakers.
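In a semi-continuous HMM, all states share one codebook of Gaussian densities (256 in this experiment), and each state α keeps only its own mixture weights. The following is a minimal sketch of the resulting output probability, assuming diagonal covariances and computing in the log domain. The function names and the codebook layout (a list of (mean, variance) vector pairs) are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def semicontinuous_log_output(x, state_weights, codebook):
    """log b_alpha(x) = log sum_m w_{alpha,m} N(x; mu_m, Sigma_m), where the Gaussians
    (mu_m, Sigma_m) in the codebook are shared by every state."""
    terms = [math.log(w) + log_gauss_diag(x, mean, var)
             for w, (mean, var) in zip(state_weights, codebook) if w > 0.0]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

The Gaussian log-densities depend only on the frame, not on the state. A recognizer can therefore evaluate the shared codebook once per frame and reuse those values for every state of all 43 phoneme models, which is the main computational attraction of the semi-continuous form.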

[0035] Speaker adaptation was evaluated by the phoneme recognition rate in continuous speech. The adaptation data consisted of 10 or 50 sentences from each of two male speakers who were not among the speakers used to create the speaker-independent acoustic HMMs.

In the phoneme recognition experiment, the transcription of each utterance was given and a Viterbi alignment of the speech was computed. The resulting segments were taken as the correct phoneme segments. In each segment, the phoneme HMM with the maximum likelihood among all phoneme HMMs was taken as the recognition result. The recognition test used 100 sentences that were different from the sentence set used for speaker adaptation.

As feature parameters, cepstra were extracted by LPC (Linear Prediction Coefficient) analysis of order 16, with a sampling frequency of 12 kHz, a frame length of 32 ms, and a frame period of 8 ms.
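At 12 kHz, a 32 ms frame is 384 samples and an 8 ms frame period is 96 samples. The following is a minimal sketch of order-16 LPC cepstrum extraction under those settings. It assumes a Hamming window, autocorrelation LPC solved by the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. The window, the absence of pre-emphasis, and the number of cepstral coefficients kept (16) are assumptions, since the description does not specify them.

```python
import math

FS = 12000                      # sampling frequency (Hz)
FRAME_LEN = FS * 32 // 1000     # 32 ms -> 384 samples
FRAME_SHIFT = FS * 8 // 1000    # 8 ms  -> 96 samples
ORDER = 16                      # LPC analysis order

def frames(signal):
    """Yield Hamming-windowed frames of FRAME_LEN samples taken every FRAME_SHIFT samples."""
    win = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (FRAME_LEN - 1)) for n in range(FRAME_LEN)]
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        yield [s * w for s, w in zip(signal[start:start + FRAME_LEN], win)]

def lpc(frame, order=ORDER):
    """Levinson-Durbin: predictor coefficients a_1..a_p with x[n] ~ sum_k a_k x[n-k]."""
    r = [sum(frame[n] * frame[n + k] for n in range(len(frame) - k)) for k in range(order + 1)]
    if r[0] <= 0.0:             # silent frame: no prediction possible
        return [0.0] * order
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        a_new = a[:]
        a_new[i] = k
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]
        a = a_new
        err *= 1.0 - k * k      # prediction error of the order-i model
    return a[1:]

def lpc_cepstrum(a, n_ceps=ORDER):
    """Cepstrum c_1..c_n of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a 1-second recording (12,000 samples), `frames` yields 1 + (12000 - 384) // 96 = 122 frames. Each frame is then converted with `lpc_cepstrum(lpc(frame))` into a 16-dimensional feature vector.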

[0036] In speaker adaptation by likelihood maximization, training was iterated 5 times, and only the mean of each mixture component was estimated. In speaker adaptation by discrimination error minimization, 10 iterations were performed, and both the mean and the weight of each mixture component were estimated. In both cases, the HMM parameters were updated all at once in each iteration, after all the training data had been processed.

[0037] The experimental results are shown in Table 1. They show that the present method is more effective than the conventional speaker adaptation method for acoustic models.

[0038]

[Table 1: image not reproduced]

[0039]

[Effects of the Invention] As described above, in the present invention the speaker-independent acoustic HMM first undergoes likelihood maximization adaptation. Its HMM parameters then undergo discrimination error minimization adaptation. This improves speaker adaptation of the acoustic model and makes it possible to realize a high-performance speech recognition system.

[Brief Description of the Drawings]

[FIG. 1] (A) is a flowchart of an embodiment of the acoustic model speaker adaptation method of the present invention. (B) is a flowchart of an embodiment that uses a loss function for the discrimination error rate minimization adaptation in step 13 of (A).

[FIG. 2] A block diagram of an embodiment of the acoustic model speaker adaptation device of the present invention.

[Explanation of Symbols]

1 Feature parameter extraction unit
2 Likelihood maximization adaptation unit
3 Discrimination error minimization adaptation unit
4 Adapted acoustic HMM storage unit
5 Speech data of the target speaker
6 Speaker-independent acoustic HMM

Claims (4)

1. A speaker adaptation method for an acoustic model, in which:

- acoustic features of speech are extracted and statistically modeled to construct acoustic models corresponding to phonemes, words, and other recognition categories;
- a speaker-independent acoustic model trained on the speech of many speakers is represented by a hidden Markov model (HMM); and
- the parameters of the speaker-independent acoustic HMM are optimized, using the speech of the target speaker, so that the likelihood of the target speaker's speech is maximized;

the method being characterized by comprising a step of adapting the parameters of the speaker-independent acoustic HMM, optimized to maximize the likelihood of the target speaker's speech, so that recognition errors on the target speaker's speech are minimized.
2. The speaker adaptation method for an acoustic model according to claim 1, wherein the step of adapting the parameters of the speaker-independent acoustic HMM so that recognition errors on the target speaker's speech are minimized is a step of defining a differentiable loss function and sequentially updating the parameters of the acoustic HMM so that the value of this function decreases, thereby obtaining the optimum values.
3. A speaker adaptation device for an acoustic model, in which:

- acoustic features of speech are extracted and statistically modeled to construct acoustic models corresponding to phonemes, words, and other recognition categories;
- a speaker-independent acoustic model trained on the speech of many speakers is represented by a hidden Markov model (HMM); and
- the parameters of the speaker-independent acoustic HMM are optimized, using the speech of the target speaker, so that the likelihood of the target speaker's speech is maximized;

the device being characterized by comprising adaptation means for adapting the parameters of the speaker-independent acoustic HMM, optimized to maximize the likelihood of the target speaker's speech, so that recognition errors on the target speaker's speech are minimized.
4. The speaker adaptation device for an acoustic model according to claim 1, wherein the adaptation means for adapting the parameters of the speaker-independent acoustic HMM so that recognition errors on the target speaker's speech are minimized include means for defining a differentiable loss function and sequentially updating the parameters of the acoustic HMM so that the value of this function decreases, thereby obtaining the optimum values.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7044430A JPH08241092A (en) 1995-03-03 1995-03-03 Speaker adaptation method for acoustic model and device therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP7044430A JPH08241092A (en) 1995-03-03 1995-03-03 Speaker adaptation method for acoustic model and device therefor

Publications (1)

Publication Number Publication Date
JPH08241092A 1996-09-17

Family

ID=12691283

Family Applications (1)

Application Number Title Priority Date Filing Date
JP7044430A Pending JPH08241092A (en) 1995-03-03 1995-03-03 Speaker adaptation method for acoustic model and device therefor

Country Status (1)

Country Link
JP (1) JPH08241092A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9251784B2 (en) 2013-10-23 2016-02-02 International Business Machines Corporation Regularized feature space discrimination adaptation
US10460232B2 (en) 2014-12-03 2019-10-29 Samsung Electronics Co., Ltd. Method and apparatus for classifying data, and method and apparatus for segmenting region of interest (ROI)
CN110706710A (en) * 2018-06-25 2020-01-17 普天信息技术有限公司 Voice recognition method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
JP4141495B2 (en) Method and apparatus for speech recognition using optimized partial probability mixture sharing
JP2986792B2 (en) Speaker normalization processing device and speech recognition device
JP2004109464A (en) Device and method for speech recognition
JPH11272291A (en) Phonetic modeling method using acoustic decision tree
JP3092491B2 (en) Pattern adaptation method using minimum description length criterion
US6546369B1 (en) Text-based speech synthesis method containing synthetic speech comparisons and updates
JP2000099087A (en) Method for adapting language model and voice recognition system
Steinbiss et al. The Philips research system for continuous-speech recognition
JP3088357B2 (en) Unspecified speaker acoustic model generation device and speech recognition device
Sukkar Subword-based minimum verification error (SB-MVE) training for task independent utterance verification
JP4461557B2 (en) Speech recognition method and speech recognition apparatus
JPH08241092A (en) Speaker adaptation method for acoustic model and device therefor
Junqua et al. Robustness in language and speech technology
JP2974621B2 (en) Speech recognition word dictionary creation device and continuous speech recognition device
JP2003330484A (en) Method and device for voice recognition
JP2002182682A (en) Speaker characteristic extractor, speaker characteristic extraction method, speech recognizer, speech synthesizer as well as program recording medium
JP2886118B2 (en) Hidden Markov model learning device and speech recognition device
JPH0895592A (en) Pattern recognition method
JP3532248B2 (en) Speech recognition device using learning speech pattern model
JP3754614B2 (en) Speaker feature extraction device, speaker feature extraction method, speech recognition device, speech synthesis device, and program recording medium
JPH0822296A (en) Pattern recognition method
JP3406672B2 (en) Speaker adaptation device
JP3036706B2 (en) Voice recognition method
JPH0786758B2 (en) Voice recognizer
JP2975540B2 (en) Free speech recognition device