JP2003131687A

JP2003131687A - System, method, and program for voice recognition sound model learning and adaptation

Info

Publication number: JP2003131687A
Application number: JP2001327958A
Authority: JP
Inventors: Yoshifumi Onishi; 祥史大西
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-10-25
Filing date: 2001-10-25
Publication date: 2003-05-09

Abstract

PROBLEM TO BE SOLVED: To provide an HMM estimation method based on a MAP estimation framework whose prior distribution can be extracted and constructed from training data, and to estimate an adapted HMM by estimating, for a new speaker, the weights with which that speaker belongs to the clusters of the prior distribution. SOLUTION: An initial value setting part 111 of a data processor 101 sets initial values of the variables in a cluster weight variable storage part 122, an HMM variable storage part 123, and a cluster variable storage part 124. A cluster weight variable optimization part 112 optimizes the cluster weight variables for each speaker, and an HMM variable optimization part 113 optimizes the HMM variables of each training speaker. A convergence decision part 114 decides whether the cluster weight variables and HMM variables have converged and, if not, instructs that the process be repeated. A cluster variable optimization part 115 optimizes the cluster variables, and a convergence decision part 116 decides whether the cluster variables have converged and, if not, instructs that the process be repeated.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech recognition acoustic model learning/adaptation system, method, and program, and in particular to a speech recognition acoustic model learning/adaptation system, method, and program that construct speaker clusters and perform speaker adaptation.

[0002]

2. Description of the Related Art
Conventionally, one method of creating acoustic models for speech recognition devices is a hidden Markov model (HMM) estimation method based on the framework of maximum a posteriori (MAP) estimation, described in IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 2, pp. 291-298, April 1994 (hereinafter Reference 1).

[0003] The method described in Reference 1 estimates the variables that make up the HMM by using a model of how those variables are distributed a priori, and it also estimates the variables that make up that prior distribution.

[0004] The HMM estimation method based on the MAP estimation framework and the prior distribution construction method shown in Reference 1 are described below using equations.

[0005] FIG. 5 is an explanatory diagram listing equations (1) to (7).

[0006] Equation (1) expresses the estimation of the HMM variable λ based on MAP estimation. Here, the HMM variable λ denotes the model parameters that completely describe the HMM, and G(λ) is the prior distribution, which describes how the HMM variable λ is distributed. The prior distribution G is in turn described by prior distribution variables, which are the model parameters of the prior distribution.

[0007] f(x | λ) is the likelihood of the training data x under the HMM described by the HMM variable λ. Equation (1) states that, given a known prior distribution G, λ is chosen to maximize the (unnormalized) posterior probability f(x | λ)G(λ) of the training data x, and the resulting value is denoted λ̃ (written with argmax).
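FIG. 5 is not reproduced in this text. Reading equation (1) from the definitions above (a reconstruction from the prose, not a transcription of the figure), it can be written as:

```latex
\tilde{\lambda} = \operatorname*{argmax}_{\lambda} \; f(x \mid \lambda)\, G(\lambda)
```

Because the evidence f(x) does not depend on λ, maximizing this product gives the same λ̃ as maximizing the normalized posterior density of λ given x.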

[0008] In other words, this is a method of estimating the HMM variable λ based on MAP estimation. The estimation formulas for an HMM with Gaussian mixture output distributions are given in Section 4 of Reference 1. The prior distribution construction method applies equations (1) and (2) repeatedly.
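For intuition about equation (1), the simplest concrete case is a single Gaussian mean with known variance σ² and a Gaussian prior N(m, σ²/τ) on that mean, where τ acts as a count of virtual prior observations. This is a minimal sketch under those assumptions, not the Gaussian-mixture HMM formulas of Reference 1; `map_mean` and its parameters are hypothetical names.

```python
def map_mean(samples: list[float], prior_mean: float, tau: float) -> float:
    """MAP estimate of a Gaussian mean with known variance (illustrative).

    Assumes the prior on the mean is N(prior_mean, sigma^2 / tau), so tau
    acts like a count of virtual prior observations. The variance sigma^2
    cancels out of the closed-form result, so it is not a parameter.
    """
    n = len(samples)
    # Setting d/dmu [log-likelihood + log-prior] = 0 gives a weighted average
    # of the prior mean (weight tau) and the data (weight 1 per sample).
    return (tau * prior_mean + sum(samples)) / (tau + n)
```

For example, `map_mean([2.0, 4.0], prior_mean=0.0, tau=2.0)` returns 1.5, between the prior mean 0.0 and the sample mean 3.0. With no samples the estimate is the prior mean, and as the sample count grows it approaches the sample mean.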

[0009] Equation (2) states that, with the HMM variable λ held fixed, the prior distribution variable φ is chosen to maximize the probability of λ under the prior, and the resulting value is denoted φ̃. To simplify the mathematics, the prior distribution is taken from the class of distributions described in Section 2 of Reference 1. Starting from some initial value, λ is determined with equation (1); φ is then determined for that λ with equation (2).
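Reconstructed in the same way from the prose, with the prior's dependence on φ written out explicitly (our notation), equation (2) and the alternation it forms with equation (1) read:

```latex
\tilde{\phi} = \operatorname*{argmax}_{\phi} \; G(\lambda \mid \phi)

\lambda^{(i+1)} = \operatorname*{argmax}_{\lambda} \; f(x \mid \lambda)\, G\big(\lambda \mid \phi^{(i)}\big),
\qquad
\phi^{(i+1)} = \operatorname*{argmax}_{\phi} \; G\big(\lambda^{(i+1)} \mid \phi\big)
```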

[0010] This procedure is judged to have converged when φ and λ no longer change from one iteration to the next, or when their change becomes sufficiently small. Together, these steps constitute the prior distribution construction method and the HMM estimation method based on the MAP estimation framework.
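The alternation of [0009] and [0010] can be sketched for a toy case. The setup is an assumption made only for illustration: several training sets each have their own scalar Gaussian mean λ_s with known variance, and all of them share one Gaussian prior N(φ, σ²/τ) whose mean φ plays the role of the prior distribution variable. `build_prior` and its parameters are hypothetical names.

```python
def build_prior(datasets: list[list[float]], tau: float,
                init_prior_mean: float = 0.0,
                tol: float = 1e-9, max_iter: int = 1000) -> tuple[float, list[float]]:
    """Alternate equations (1) and (2) until phi and lambda stop changing.

    Hypothetical setup: each dataset has its own Gaussian mean lambda_s, and
    all of them share one Gaussian prior N(phi, sigma^2 / tau) with fixed tau.
    """
    phi = init_prior_mean
    lambdas = [phi] * len(datasets)
    for _ in range(max_iter):
        # Equation (1): MAP estimate of each lambda_s under the current prior.
        new_lambdas = [(tau * phi + sum(x)) / (tau + len(x)) for x in datasets]
        # Equation (2): with the lambdas fixed, the prior mean maximizing
        # prod_s G(lambda_s | phi) is their average.
        new_phi = sum(new_lambdas) / len(new_lambdas)
        change = max([abs(new_phi - phi)] +
                     [abs(a - b) for a, b in zip(new_lambdas, lambdas)])
        phi, lambdas = new_phi, new_lambdas
        # Convergence test of [0010]: stop once the change is small enough.
        if change < tol:
            break
    return phi, lambdas
```

With `datasets=[[1.0, 3.0], [5.0, 7.0]]` and `tau=2.0`, φ settles at approximately the grand mean 4.0 and the per-set estimates at approximately 3.0 and 5.0, each pulled halfway from its own sample mean toward φ.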

[0011] In Reference 1, the prior distribution consists of a single distribution. Building on the method of Reference 1, a method that extends the prior distribution to a linear combination of several functions is described in Proc. IEEE ICASSP-00, Vol. 2, p. II-993, 2000 (hereinafter Reference 2).

[0012] The prior-art HMM estimation (training) method based on the MAP estimation framework and the prior-art HMM adaptation method, both shown in Reference 2, are described below using figures and equations.

[0013] FIG. 6 is an explanatory diagram of a conventional training method (an HMM estimation method based on the MAP estimation framework).

[0014] Referring to FIG. 6, this prior art consists of a data processing device 501 and a storage device 502.

[0015] The storage device 502 consists of a training data storage unit 521 that stores large-scale training data from many speakers, a cluster weight variable storage unit 522 that holds the weights of membership in the prior distribution clusters, an HMM variable storage unit 523 that holds the HMM variables making up the speaker-independent HMM, and a cluster variable storage unit 524 that holds the variables determining the structure of the prior distribution clusters. When the prior distribution G is composed of several functions Gj, each Gj is called a prior distribution cluster. For example, when the training data consist of different data sets such as mobile phone data and landline phone data, the clusters would be G_mobile, G_landline, and so on.

[0016] The cluster weight variable is the weighting coefficient of each prior distribution cluster Gj within the prior distribution G and is denoted qj in equation (3). The cluster variable is the model parameter of each prior distribution cluster Gj and is denoted φj.

[0017] The full training data are assumed to consist of several clusters, and the cluster weight variables and cluster variables are given in advance so as to follow the weight with which each cluster is present in the full training data. Equation (3) states that the prior distribution G followed by the HMM variable λ consists of J clusters Gj, with weight variable qj for each cluster. Each Gj is determined by its cluster variable φj.
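Reconstructed from the definitions in [0016] and [0017] (a reading of the prose, not a transcription of FIG. 5), equation (3) is the mixture:

```latex
G(\lambda) = \sum_{j=1}^{J} q_j \, G_j(\lambda \mid \phi_j),
\qquad q_j \ge 0, \quad \sum_{j=1}^{J} q_j = 1
```

The nonnegativity and sum-to-one constraints are our assumption; they are what make G a proper density when each Gj is one.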

[0018] The data processing device 501 consists of an HMM variable optimization unit 511 that optimizes the HMM variables and a convergence determination unit 512 that determines whether the HMM variables have converged. The HMM variable optimization unit 511 performs MAP estimation of the HMM variables using the data and variables stored in the storage device 502 and updates them repeatedly; the updating ends when the convergence determination unit 512 determines that the HMM variables have converged. The estimation formulas for an HMM with Gaussian mixture output distributions are given in Section 2 of Reference 2.
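The conventional HMM variable optimization of FIG. 6 can be illustrated with a scalar stand-in for λ. Assumptions for this sketch: one Gaussian mean with known variance, a mixture prior of the form of equation (3) whose components are Gaussians N(m_j, σ²/τ) sharing one τ, and weights q_j supplied in advance as in the prior art. Because a mixture prior has no closed-form MAP solution, the sketch uses an EM-style fixed-point iteration. The function name and parameters are hypothetical, and the actual Gaussian-mixture HMM formulas are those of Reference 2.

```python
import math


def map_mean_mixture_prior(samples, cluster_means, weights, tau, sigma2=1.0,
                           tol=1e-10, max_iter=500):
    """MAP estimate of a scalar Gaussian mean under the mixture prior
    sum_j q_j N(m_j, sigma2 / tau), with the weights q_j held fixed.

    Illustrative assumptions: known variance sigma2 and one tau for all
    clusters. The tolerance test plays the role of convergence unit 512.
    """
    n, total = len(samples), sum(samples)
    # Start from the sample mean, or from the prior's mean if there is no data.
    mu = total / n if n else sum(q * m for q, m in zip(weights, cluster_means))
    for _ in range(max_iter):
        # Responsibility of each prior cluster for the current estimate,
        # computed in the log domain to avoid underflow.
        logs = [math.log(q) - tau * (mu - m) ** 2 / (2.0 * sigma2) if q > 0 else -math.inf
                for q, m in zip(weights, cluster_means)]
        top = max(logs)
        resp = [math.exp(l - top) for l in logs]
        z = sum(resp)
        resp = [r / z for r in resp]
        # Each cluster acts like tau * resp_j pseudo-observations at m_j.
        new_mu = (total + tau * sum(r * m for r, m in zip(resp, cluster_means))) / (n + tau)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

With a single cluster the result reduces to the conjugate case; for example, samples `[2.0, 4.0]`, cluster mean `0.0`, and `tau=2.0` give 1.5. Note that `weights` is a required input here, which is the dependence on known cluster weights that [0027] identifies as a limitation.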

[0019] Next, the prior-art HMM adaptation method based on the MAP estimation framework shown in Reference 2 is described. HMM adaptation means adapting the HMM variables to a new speaker who is not among the training speakers.

[0020] FIG. 7 is a block diagram for explaining the conventional HMM adaptation method based on the MAP estimation framework.

[0021] Referring to FIG. 7, this prior art consists of an input device 600, a data processing device 601, and a storage device 602.

[0022] The input device 600 is a device for inputting the utterances of a new speaker. The storage device 602 consists of a new speaker data storage unit 621 that stores the new speaker's input data, a cluster weight variable storage unit 622, an HMM variable storage unit 623, and a cluster variable storage unit 624. The cluster weight variables are given so as to reflect the weights with which the new speaker belongs to the clusters of the training data used during training. The data processing device 601 consists of an HMM variable optimization unit 611 that optimizes the HMM variables and a convergence determination unit 612 that determines whether the HMM variables have converged. The data processing device 601 of FIG. 7 operates in the same way as the data processing device 501 of FIG. 6. In prior-art HMM adaptation, the same estimation method as in HMM training is applied to the new speaker's data to adapt the HMM variables to the new speaker.

[0023] The invention described in Japanese Patent Application Laid-Open No. 10-207485 is also a speech recognition technique that uses maximum a posteriori probability estimation.

[0024]

PROBLEMS TO BE SOLVED BY THE INVENTION
The first problem with the prior art described above is that the way the cluster weight variables and cluster variables are constructed during training is inflexible.

[0025] This is because the cluster weight variables and cluster variables are determined from the presence weights of clusters that were distinguished in advance within the full training data.

[0026] The second problem is that the HMM adaptation method is inflexible.

[0027] This is because adaptation uses cluster variables determined in the way criticized under the first problem, and because the cluster weight variables are set from the weights with which the new speaker belongs to the clusters of the original training data. When these weights are not known, a sufficient adaptation effect cannot be expected.

[0028] An object of the present invention is to provide an HMM estimation method based on the MAP estimation framework in which a prior distribution that is flexible and effective for HMM adaptation can be extracted and constructed from the training data during training; that is, to provide such a training method.

[0029] Another object of the present invention is to provide, for HMM adaptation, a method that uses the flexible and effective prior distribution constructed during training to estimate the weights with which a new speaker belongs to this prior distribution, and a method of estimating the adapted HMM.

[0030]

MEANS FOR SOLVING THE PROBLEMS
A first speech recognition acoustic model learning/adaptation system of the present invention comprises training data storage means for storing training data, and optimization means for reading the training data from the training data storage unit, optimizing a clustered prior distribution based on the read training data, and optimizing the HMM variables based on the optimized prior distribution.

[0031] A second speech recognition acoustic model learning/adaptation system of the present invention comprises: a training data storage unit that stores training data; a cluster variable storage unit that stores the variables determining a clustered prior distribution; a cluster weight storage unit that stores cluster weight variables; an HMM variable storage unit that stores the HMM variables determining the acoustic model; cluster weight variable optimization means for optimizing the cluster weight variables for each set of training data; HMM variable optimization means for optimizing the HMM variables for each set of training data; and cluster variable optimization means for optimizing the cluster variables over all training data.

[0032] A third speech recognition acoustic model learning/adaptation system of the present invention comprises means for optimizing the cluster weight variables and adapting the HMM variables, using a prior distribution expressed as a linear combination of speaker clusters, when adapting the acoustic model to a new speaker.

[0033] A fourth speech recognition acoustic model learning/adaptation system of the present invention comprises: a new speaker data storage unit that stores adaptation data of a new speaker; a cluster variable storage unit that stores the cluster variables determining a prior distribution expressed as a linear combination of speaker clusters; a cluster weight variable storage unit that stores cluster weight variables; an HMM variable storage unit that stores the HMM variables determining the acoustic model; cluster weight variable optimization means for optimizing the new speaker's weight variables for the clusters based on the new speaker's adaptation data; and HMM variable optimization means for optimizing the HMM variables based on the new speaker's adaptation data.
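The adaptation of [0032] and [0033] can be sketched under simplifying assumptions of our own: K recognition units, each with a scalar Gaussian mean μ_k and known variance, and a prior for unit k of the form Σ_j q_j N(m_jk, σ²/τ). The cluster variables m_jk are fixed from training, while the cluster weights q_j are shared by all units of the new speaker. The sketch alternates an EM update of q_j (standing in for the cluster weight optimization) with a MAP update of μ_k (standing in for the HMM variable optimization); `adapt_new_speaker` and its data layout are hypothetical.

```python
import math


def adapt_new_speaker(data, cluster_means, tau=1.0, sigma2=1.0,
                      tol=1e-9, max_iter=1000):
    """Jointly optimize cluster weights q_j and unit means mu_k for a new speaker.

    Hypothetical model: data[k] lists the adaptation samples for unit k
    (possibly empty); cluster_means[j][k] is cluster j's prior mean for unit k.
    The weights q_j are shared by all units of the speaker.
    """
    J, K = len(cluster_means), len(data)
    q = [1.0 / J] * J
    mu = [sum(q[j] * cluster_means[j][k] for j in range(J)) for k in range(K)]
    for _ in range(max_iter):
        # E-step: responsibility r[k][j] of cluster j for unit k (log domain).
        r = []
        for k in range(K):
            logs = [math.log(q[j]) - tau * (mu[k] - cluster_means[j][k]) ** 2 / (2 * sigma2)
                    if q[j] > 0 else -math.inf for j in range(J)]
            top = max(logs)
            w = [math.exp(l - top) for l in logs]
            s = sum(w)
            r.append([x / s for x in w])
        # Cluster weight update: average responsibility over all units.
        new_q = [sum(r[k][j] for k in range(K)) / K for j in range(J)]
        # HMM variable update: MAP mean with responsibility-weighted prior means.
        new_mu = [(sum(data[k]) + tau * sum(r[k][j] * cluster_means[j][k] for j in range(J)))
                  / (len(data[k]) + tau) for k in range(K)]
        change = max(abs(a - b) for a, b in zip(new_q + new_mu, q + mu))
        q, mu = new_q, new_mu
        if change < tol:
            break
    return q, mu
```

In `adapt_new_speaker([[5.1, 4.9], [5.0], []], [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])`, unit 2 has no adaptation data, yet its mean moves close to 5.0 because the shared weights learned from units 0 and 1 concentrate on cluster 1. This mirrors the behavior [0048] describes for recognition units absent from the adaptation data.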

[0034] A first speech recognition acoustic model learning/adaptation method of the present invention includes an optimization procedure of reading training data from the training data storage unit that stores the training data, optimizing a clustered prior distribution based on the read training data, and optimizing the HMM variables based on the optimized prior distribution.

[0035] A second speech recognition acoustic model learning/adaptation method of the present invention includes: a cluster weight variable optimization procedure of optimizing the cluster weight variables in the cluster weight storage unit for each set of training data in the training data storage unit; an HMM variable optimization procedure of optimizing the HMM variables in the HMM variable storage unit for each set of training data; and a cluster variable optimization procedure of optimizing the cluster variables in the cluster variable storage unit over all training data.

[0036] A third speech recognition acoustic model learning/adaptation method of the present invention includes a procedure of optimizing the cluster weight variables and adapting the HMM variables, using a prior distribution expressed as a linear combination of speaker clusters, when adapting the acoustic model to a new speaker.

[0037] A fourth speech recognition acoustic model learning/adaptation method of the present invention comprises: a cluster weight variable optimization procedure of optimizing the new speaker's weight variables for the clusters based on the new speaker's adaptation data in the new speaker data storage unit; and an HMM variable optimization procedure of optimizing the HMM variables based on the new speaker's adaptation data in the new speaker data storage unit.

[0038] A first speech recognition acoustic model learning/adaptation program of the present invention causes a computer to execute an optimization procedure of reading training data from the training data storage unit that stores the training data, optimizing a clustered prior distribution based on the read training data, and optimizing the HMM variables based on the optimized prior distribution.

[0039] A second speech recognition acoustic model learning/adaptation program of the present invention causes a computer to execute: a cluster weight variable optimization procedure of optimizing the cluster weight variables in the cluster weight storage unit for each set of training data in the training data storage unit; an HMM variable optimization procedure of optimizing the HMM variables in the HMM variable storage unit for each set of training data; and a cluster variable optimization procedure of optimizing the cluster variables in the cluster variable storage unit over all training data.

[0040] A third speech recognition acoustic model learning/adaptation program of the present invention causes a computer to execute a procedure of optimizing the cluster weight variables and adapting the HMM variables, using a prior distribution expressed as a linear combination of speaker clusters, when adapting the acoustic model to a new speaker.

[0041] A fourth speech recognition acoustic model learning/adaptation program of the present invention causes a computer to execute: a cluster weight variable optimization procedure of optimizing the new speaker's weight variables for the clusters based on the new speaker's adaptation data in the new speaker data storage unit; and an HMM variable optimization procedure of optimizing the HMM variables based on the new speaker's adaptation data in the new speaker data storage unit.

[0042] A fifth speech recognition acoustic model learning/adaptation system of the present invention comprises: an initial value setting unit that sets, in a cluster weight variable storage unit, an HMM variable storage unit, and a cluster variable storage unit respectively, initial values of the cluster weight variables for each training speaker, initial values of the HMM variables, and initial values of the cluster variables that determine the structure of the speaker clusters over all training data; a cluster weight variable optimization unit that optimizes the speaker cluster weight variables of each speaker in the cluster weight variable storage unit and updates the contents of that storage unit; an HMM variable optimization unit that optimizes the HMM variables of each speaker in the HMM variable storage unit and updates the contents of that storage unit; a first convergence determination unit that determines whether the cluster weight variables and HMM variables of a training speaker have converged, returns processing to the cluster weight variable optimization unit and the HMM variable optimization unit if they have not, then checks whether all training speakers have been processed and, if not, returns processing to the cluster weight variable optimization unit and the HMM variable optimization unit for the next training speaker; a cluster variable optimization unit that, when the first convergence determination unit determines that all training speakers have been processed, optimizes the cluster variables so that the speaker cluster structure over all speakers in the cluster variable storage unit is optimal, and updates the contents of the cluster variable storage unit; and a second convergence determination unit that determines whether the cluster variables have converged and, if not, returns processing to the cluster weight variable optimization unit and the HMM variable optimization unit so that optimization is performed using the cluster variables stored in the cluster variable storage unit by the cluster variable optimization unit.
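The two-level loop of [0042] can be sketched under assumptions of our own: each speaker's HMM is reduced to one scalar Gaussian mean per recognition unit with known variance, and cluster j's component for unit k is N(m_jk, σ²/τ). The inner loop is the per-speaker alternation checked by the first convergence determination. The outer step re-estimates each cluster mean m_jk as the responsibility-weighted average of the speakers' unit means (an EM-style choice assumed here, not taken from the source) and is checked by the second convergence determination. All names are hypothetical.

```python
import math


def _responsibilities(q, mu_k, means_k, tau, sigma2):
    """Posterior weight of each cluster for one unit mean (log domain)."""
    logs = [math.log(qj) - tau * (mu_k - m) ** 2 / (2 * sigma2) if qj > 0 else -math.inf
            for qj, m in zip(q, means_k)]
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    s = sum(w)
    return [x / s for x in w]


def optimize_speaker(data, q, mu, means, tau, sigma2, tol, max_iter):
    """Alternate cluster weight and HMM variable updates for one speaker
    until both stop changing (the first convergence determination)."""
    J, K = len(means), len(data)
    for _ in range(max_iter):
        r = [_responsibilities(q, mu[k], [means[j][k] for j in range(J)], tau, sigma2)
             for k in range(K)]
        new_q = [sum(r[k][j] for k in range(K)) / K for j in range(J)]
        new_mu = [(sum(data[k]) + tau * sum(r[k][j] * means[j][k] for j in range(J)))
                  / (len(data[k]) + tau) for k in range(K)]
        change = max(abs(a - b) for a, b in zip(new_q + new_mu, q + mu))
        q, mu = new_q, new_mu
        if change < tol:
            break
    return q, mu


def train_cluster_prior(speakers, init_means, tau=1.0, sigma2=1.0,
                        tol=1e-8, max_iter=200):
    """speakers[s][k] holds training samples of speaker s for unit k.
    Returns cluster means (cluster variables), per-speaker weights and means."""
    means = [row[:] for row in init_means]
    J, K = len(means), len(speakers[0])
    # Initial value setting for every training speaker.
    qs = [[1.0 / J] * J for _ in speakers]
    mus = [[sum(means[j][k] for j in range(J)) / J for k in range(K)] for _ in speakers]
    for _ in range(max_iter):
        # Per-speaker optimization, one training speaker at a time.
        for s, data in enumerate(speakers):
            qs[s], mus[s] = optimize_speaker(data, qs[s], mus[s], means,
                                             tau, sigma2, tol, max_iter)
        # Cluster variable optimization over all speakers.
        new_means = [[0.0] * K for _ in range(J)]
        for k in range(K):
            num, den = [0.0] * J, [0.0] * J
            for s in range(len(speakers)):
                r = _responsibilities(qs[s], mus[s][k],
                                      [means[j][k] for j in range(J)], tau, sigma2)
                for j in range(J):
                    num[j] += r[j] * mus[s][k]
                    den[j] += r[j]
            for j in range(J):
                new_means[j][k] = num[j] / den[j] if den[j] > 0 else means[j][k]
        # Second convergence determination.
        change = max(abs(new_means[j][k] - means[j][k]) for j in range(J) for k in range(K))
        means = new_means
        if change < tol:
            break
    return means, qs, mus
```

With two training speakers whose samples lie near 0 and two whose samples lie near 6, starting from cluster means 1.0 and 4.0, the cluster variables converge to about 0 and 6 and each speaker's weight concentrates on its own cluster.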

[0043] A fifth speech recognition acoustic model learning/adaptation method of the present invention includes: an initial value setting procedure of setting, in a cluster weight variable storage unit, an HMM variable storage unit, and a cluster variable storage unit respectively, initial values of the cluster weight variables for each training speaker, initial values of the HMM variables, and initial values of the cluster variables that determine the structure of the speaker clusters over all training data; a cluster weight variable optimization procedure of optimizing the speaker cluster weight variables of each speaker in the cluster weight variable storage unit and updating the contents of that storage unit; an HMM variable optimization procedure of optimizing the HMM variables of each speaker in the HMM variable storage unit and updating the contents of that storage unit; a first convergence determination procedure of determining whether the cluster weight variables and HMM variables of a training speaker have converged, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure if they have not, then checking whether all training speakers have been processed and, if not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure for the next training speaker; a cluster variable optimization procedure of, when the first convergence determination procedure determines that all training speakers have been processed, optimizing the cluster variables so that the speaker cluster structure over all speakers in the cluster variable storage unit is optimal, and updating the contents of the cluster variable storage unit; and a second convergence determination procedure of determining whether the cluster variables have converged and, if not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure so that optimization is performed using the cluster variables stored in the cluster variable storage unit by the cluster variable optimization procedure.

[0044] A fifth speech recognition acoustic model learning/adaptation program of the present invention causes a computer to execute: an initial value setting procedure of setting, in a cluster weight variable storage unit, an HMM variable storage unit, and a cluster variable storage unit respectively, initial values of the cluster weight variables for each training speaker, initial values of the HMM variables, and initial values of the cluster variables that determine the structure of the speaker clusters over all training data; a cluster weight variable optimization procedure of optimizing the speaker cluster weight variables of each speaker in the cluster weight variable storage unit and updating the contents of that storage unit; an HMM variable optimization procedure of optimizing the HMM variables of each speaker in the HMM variable storage unit and updating the contents of that storage unit; a first convergence determination procedure of determining whether the cluster weight variables and HMM variables of a training speaker have converged, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure if they have not, then checking whether all training speakers have been processed and, if not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure for the next training speaker; a cluster variable optimization procedure of, when the first convergence determination procedure determines that all training speakers have been processed, optimizing the cluster variables so that the speaker cluster structure over all speakers in the cluster variable storage unit is optimal, and updating the contents of the cluster variable storage unit; and a second convergence determination procedure of determining whether the cluster variables have converged and, if not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure so that optimization is performed using the cluster variables stored in the cluster variable storage unit by the cluster variable optimization procedure.

[0045]

BEST MODE FOR CARRYING OUT THE INVENTION
Next, the features of the first embodiment of the present invention are described.

[0046] At learning time, a prior distribution clustered by speaker characteristics is constructed from a large-scale database. When the HMM is adapted to a new speaker, an acoustic model adapted with the new speaker's utterance data is created using the prior distribution that was constructed from sufficient data at learning time.

[0047] The clustered prior distribution can be constructed in advance, at learning time, using sufficient learning data. Moreover, this construction method repeats, hierarchically, the variable optimization for each learning speaker and the global cluster variable optimization over all the data. Adaptation to a new speaker is therefore equivalent to the variable optimization for a single learning speaker, and the method constructs prior distribution clusters that are effective for speaker adaptation.

[0048] Therefore, when the HMM is adapted to a new speaker, sufficient speaker adaptation can be expected merely by determining the cluster weight variables and the HMM variables from a small number of utterances. In addition, because the speaker cluster weights are themselves optimized, recognition units of the acoustic model that did not appear during speaker adaptation are also expected to be adapted effectively through the prior distribution weighted by those cluster weights.

[0049] Next, the differences from the prior art will be described.

[0050] In the conventional learning method, the prior distribution clusters (that is, the cluster variables) and the cluster weight variables (the weights on those clusters) directly reflect the clusters of the learning data itself. In the conventional adaptation method, the cluster weight variables are given as known values when adapting to a new speaker.

[0051] In contrast, the learning method of this embodiment repeats, hierarchically, the variable optimization for each learning speaker and the global cluster variable optimization over all the data. It determines the cluster variables automatically, so that the prior distribution, which reflects the speaker characteristics inherent in the data, forms prior distribution clusters that are effective for speaker adaptation. In the adaptation method of this embodiment, the cluster weight variables are optimized, and HMM adaptation to the new speaker is performed using a prior distribution that reflects the relevant speaker characteristics.

[0052] Next, the first embodiment of the present invention will be described in detail with reference to the drawings.

[0053] FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.

[0054] Referring to FIG. 1, the first embodiment of the present invention comprises a data processing device 101 that operates under program control and a storage device 102 that stores information.

[0055] The storage device 102 comprises four units:

- a learning data storage unit 121 that stores large-scale learning data from many speakers;
- a cluster weight variable storage unit 122 that represents the weights of membership in the prior distribution clusters;
- an HMM variable storage unit 123 that holds the HMM of each speaker in the learning data;
- a cluster variable storage unit 124 that represents the speaker clusters over all the learning data as clusters of the prior distribution and determines the structure of each cluster.

Here, a speaker cluster is a set of speakers with similar speaker characteristics. The clusters are not given in advance but are extracted as prior distribution clusters.

[0056] The data processing device 101 includes:

- an initial value setting unit 111 that sets the initial values of the variables;
- a cluster weight variable optimization unit 112 for each learning speaker;
- an HMM variable optimization unit 113 for each learning speaker;
- a convergence determination unit 114 that determines whether the cluster weights and the HMM variables have converged;
- a cluster variable optimization unit 115;
- a convergence determination unit 116 that determines whether the speaker cluster optimization over all learning data has converged.

[0057] Next, the operation of the first embodiment of the present invention will be described with reference to the drawings.

[0058] In particular, this section describes the operation of the learning part. That part constructs a prior distribution clustered by speaker characteristics from a large-scale database of many speakers.

[0059] FIG. 2 is a flowchart showing the operation of the first embodiment of the present invention.

[0060] Referring to FIG. 2, the initial value setting unit 111 first sets the following initial values:

- for each learning speaker, the initial values of the cluster weight variables (q, one value for each cluster);
- for each learning speaker, the initial values of the HMM variables (λ, a set of values);
- the initial values of the cluster variables (φ, a set of values), which determine the structure of the speaker clusters over all the learning data.

The values are entered by the operator (from a keyboard, a recording medium, or the like). They are stored in the cluster weight variable storage unit 122, the HMM variable storage unit 123, and the cluster variable storage unit 124, respectively (step A1 in FIG. 2).

If there is no prior knowledge, the initial value setting unit 111 distributes the cluster weight variables evenly. For the cluster variables, it uses a single prior distribution function over the speaker-independent HMM variables, namely the one determined by the prior distribution determination method of Reference 1.

[0061] For the HMM variables, a speaker-independent HMM is determined from all the learning data within the maximum likelihood estimation framework, rather than the MAP estimation framework, and used as the initial value. For ease of mathematical handling, the functional form of the prior distribution Gj in equation (3) is assumed to belong to the same mathematical family as in References 1 and 2.

The cluster variables may all be left unconstrained. Alternatively, constraints may be imposed to reduce the number of free variables.

If prior knowledge is available, it can be used to set the initial values. For example:

1. form static speaker clusters from the learning data on a likelihood criterion;
2. estimate an HMM by maximum likelihood for each data cluster;
3. use the results as prior knowledge.

[0062] The learning data is assumed to have been stored in the learning data storage unit 121 in advance.

[0063] Steps A2 to A4 are executed for each learning speaker in turn, over all learning speakers. The description below focuses on speaker α.

[0064] Next, the cluster weight variable optimization unit 112 optimizes the weight variables q that give speaker α's membership in each speaker cluster, as stored in the cluster weight variable storage unit 122. It then updates the contents of the cluster weight variable storage unit 122 (step A2 in FIG. 2). This step is expressed by equation (4), where {q} denotes the set over all j.

[0065] Equation (4) expresses that speaker α's cluster weight variables {q^α} for the speaker clusters are chosen to be MAP-optimal when the learning data x^α is observed under speaker α's HMM variables λ^α. For example, the cluster weight variable optimization unit 112 treats {q^α} for speaker α as the variables. It applies a numerical function maximization algorithm such as the steepest descent method or the conjugate gradient method. A general-purpose method suffices, because all other variables are fixed in this step.
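Equation (4) is not reproduced in this text, so the following is only a sketch under an assumed model. It assumes the prior factorizes over K HMM parameters, each with its own mixture over the J clusters, so that the prior of speaker α is Π_k Σ_j q_j^α g_jk(λ_k^α).

Because the likelihood p(x^α | λ^α) does not depend on q, maximizing the posterior with respect to q then reduces to maximizing Σ_k log Σ_j q_j g_jk(λ_k^α) over the probability simplex.

The sketch uses gradient ascent on softmax logits, one simple member of the steepest-descent family mentioned above, applied to the negated objective. The softmax keeps the weights non-negative and summing to one. The name `optimize_cluster_weights`, the learning rate, and the iteration count are hypothetical.

```python
import numpy as np


def optimize_cluster_weights(log_g, n_iter=500, lr=0.1):
    """Sketch of step A2: maximize sum_k log sum_j q_j g_jk over the simplex.

    log_g -- (K, J) array; log_g[k, j] = log g_jk(lambda_k), the log prior density
             of HMM parameter k under cluster j (lambda and phi are held fixed).
    Returns q, a length-J vector of non-negative weights summing to 1.
    """
    K, J = log_g.shape
    z = np.zeros(J)  # softmax logits; z = 0 is the uniform start q_j = 1/J
    for _ in range(n_iter):
        q = np.exp(z - z.max())
        q /= q.sum()
        # Responsibilities r[k, j] = q_j g_jk / sum_i q_i g_ik, computed in the log domain.
        a = log_g + np.log(q)
        a -= a.max(axis=1, keepdims=True)
        r = np.exp(a)
        r /= r.sum(axis=1, keepdims=True)
        # Gradient of the objective with respect to the logits: sum_k r[k, j] - K q_j.
        grad = r.sum(axis=0) - K * q
        z += lr * grad / K
    q = np.exp(z - z.max())
    return q / q.sum()
```

The objective is concave in q, so the EM fixed-point update q_j ← (1/K) Σ_k r_kj should reach the same maximum. The gradient form is shown only because the text names gradient methods.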

[0066] Next, the HMM variable optimization unit 113 optimizes the HMM variables λ of speaker α in the HMM variable storage unit 123. It then updates the contents of the HMM variable storage unit 123 (step A3 in FIG. 2). This step is expressed by equation (5).

[0067] Equation (5) expresses that the HMM variables λ^α of speaker α are MAP-estimated from the observed learning data x^α. The estimate uses the prior distribution whose clusters are weighted for speaker α. For example, the HMM variable optimization unit 113 uses the EM-based estimation formulas of Reference 2. All variables other than the HMM variables are fixed in this step, so the HMM variable optimization formulas in Chapter 2 of Reference 2 can be used as they are.
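Reference 2's EM-based formulas are not reproduced here. As an illustration only, the following sketch shows the familiar MAP re-estimation of a single Gaussian mean in the M-step. It relies on a simplifying assumption that is not from the original: the cluster-weighted prior is collapsed into one conjugate prior, whose mean is the q-weighted combination of the cluster prior means and whose weight is a scalar τ. The occupation probabilities γ_t are assumed to come from the E-step, and `map_mean_update` is a hypothetical name.

```python
import numpy as np


def map_mean_update(frames, gammas, cluster_means, q, tau):
    """Sketch of one M-step mean update under a (collapsed) cluster-weighted prior.

    frames        -- (T, D) feature vectors x_t of speaker alpha
    gammas        -- (T,) occupation probabilities gamma_t from the E-step
    cluster_means -- (J, D) prior means, one per speaker cluster
    q             -- (J,) cluster weights of speaker alpha
    tau           -- prior weight (> 0); a larger tau trusts the prior more
    """
    prior_mean = q @ cluster_means   # q-weighted prior mean (simplifying assumption)
    occupancy = gammas.sum()         # sum_t gamma_t
    first_order = gammas @ frames    # sum_t gamma_t x_t
    return (tau * prior_mean + first_order) / (tau + occupancy)
```

When a state has no data (all γ_t = 0), this estimate falls back to the q-weighted prior mean.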

[0068] Next, the convergence determination unit 114 determines whether the cluster weight variables {q^α} and the HMM variables λ^α of speaker α have converged (step A4 in FIG. 2). If they have not converged (step A4/NO in FIG. 2), steps A2 and A3 are repeated.

[0069] If the convergence determination unit 114 determines that they have converged (step A4/YES in FIG. 2), it then checks whether all speakers have been processed (step A5 in FIG. 2). If not (step A5/NO in FIG. 2), processing starts again from step A2 for the next speaker (for example, α').

[0070] If all speakers have been processed (step A5/YES in FIG. 2), the cluster variable optimization unit 115 optimizes the cluster variables so that the speaker cluster configuration for all speakers is optimal. It then updates the contents of the cluster variable storage unit 124 (step A6 in FIG. 2). This step is expressed by equation (6), where {φ} denotes the set over all j.

[0071] Equation (6) expresses that the cluster variables φ are chosen to maximize the sum of the posterior probabilities over all learning speakers. For example, the cluster variable optimization unit 115 treats {φ} as the variables. It applies a numerical function maximization algorithm such as the steepest descent method or the conjugate gradient method. A general-purpose method suffices, because all other variables are fixed in this step.

The cluster variables {φ} may all be left unconstrained. Alternatively, the number of free variables may be reduced by constraints expressed through a variational function Ψ, as shown in equation (7). For example, when the number of clusters is J, Ψ can be restricted to transformations given by a J × J linear transformation matrix, such as a rotation of two cluster variables by an angle θ:

[0072]

$$\varphi'_1 = \cos\theta\,\varphi_1 + \sin\theta\,\varphi_2, \qquad \varphi'_2 = -\sin\theta\,\varphi_1 + \cos\theta\,\varphi_2$$

The transformation from the cluster variables {φ} to {φ'} is then restricted to the one-parameter family determined by θ.
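Equations (6) and (7) are not reproduced here, so the following sketch illustrates step A6 and the constrained alternative under assumed forms.

It assumes each cluster prior G_j is an isotropic Gaussian over a flattened HMM parameter vector, with mean φ_j and fixed variance σ². With q and λ fixed, the only part of the summed log posterior that depends on φ is then Σ_α log Σ_j q_j^α N(λ^α; φ_j, σ²I). `optimize_cluster_means` increases this quantity by plain gradient ascent.

`rotate_pair` applies the θ-parameterized rotation described above. Under that constraint, only θ would remain to be optimized. All function names and parameters are hypothetical.

```python
import numpy as np


def optimize_cluster_means(lams, Q, phi, sigma2=1.0, n_iter=300, lr=0.1):
    """Sketch of step A6: ascend sum_a log sum_j Q[a, j] N(lams[a]; phi[j], sigma2 I).

    lams -- (A, D) HMM variables of the A learning speakers (held fixed)
    Q    -- (A, J) cluster weights of each speaker (held fixed, rows sum to 1)
    phi  -- (J, D) initial cluster means (the assumed cluster variables)
    """
    phi = np.array(phi, dtype=float)
    for _ in range(n_iter):
        diff = lams[:, None, :] - phi[None, :, :]            # (A, J, D)
        # log Q + log N up to a phi-independent constant (sigma2 is fixed).
        a = np.log(Q) - 0.5 * (diff ** 2).sum(axis=-1) / sigma2
        a -= a.max(axis=1, keepdims=True)
        r = np.exp(a)
        r /= r.sum(axis=1, keepdims=True)                    # responsibilities (A, J)
        grad = (r[:, :, None] * diff).sum(axis=0) / sigma2   # d objective / d phi
        phi += lr * grad / len(lams)
    return phi


def rotate_pair(phi, i, j, theta):
    """Constrained transform of [0072]: rotate cluster variables i and j by theta."""
    phi = np.asarray(phi, dtype=float)
    out = phi.copy()
    c, s = np.cos(theta), np.sin(theta)
    out[i] = c * phi[i] + s * phi[j]
    out[j] = -s * phi[i] + c * phi[j]
    return out
```

The rotation preserves the overall norm of the cluster variables and replaces a 2 × D search by a search over the single scalar θ. This is one concrete reading of how Ψ reduces the number of free variables.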

[0073] Next, the convergence determination unit 116 determines whether the cluster variables have converged (step A7 in FIG. 2). If they have not converged (step A7/NO in FIG. 2), the variable update procedure is repeated from step A2, using the cluster variables {φ} newly stored in the cluster variable storage unit 124. If the convergence determination unit 116 determines that they have converged (step A7/YES in FIG. 2), the process ends.
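The two-level iteration of FIG. 2 can be sketched as a generic driver. The inner loop covers steps A2 to A4 for each speaker, and the outer loop covers steps A2 to A7. The three optimizers are passed in as functions, so the sketch says nothing about equations (4) to (6) themselves. The names, the max-absolute-change convergence test, and the iteration caps are assumptions.

```python
import numpy as np


def _max_change(old, new):
    """Largest absolute element-wise change between two arrays or scalars."""
    return float(np.max(np.abs(np.asarray(new, dtype=float) - np.asarray(old, dtype=float))))


def train_clustered_prior(q, lam, phi, opt_q, opt_lam, opt_phi,
                          tol=1e-6, max_inner=100, max_outer=100):
    """Sketch of the driver for steps A2-A7 of FIG. 2.

    q, lam  -- dicts: speaker id -> cluster weights / HMM variables (updated in place)
    phi     -- cluster variables
    opt_q   -- opt_q(a, lam_a, phi) -> new cluster weights of speaker a   (step A2)
    opt_lam -- opt_lam(a, q_a, phi) -> new HMM variables of speaker a     (step A3)
    opt_phi -- opt_phi(q, lam) -> new cluster variables                   (step A6)
    """
    for _ in range(max_outer):
        for a in q:                                  # step A5: every learning speaker
            for _ in range(max_inner):               # steps A2-A4 for speaker a
                new_q = opt_q(a, lam[a], phi)
                new_lam = opt_lam(a, new_q, phi)
                change = max(_max_change(q[a], new_q), _max_change(lam[a], new_lam))
                q[a], lam[a] = new_q, new_lam
                if change < tol:                     # step A4 / YES
                    break
        new_phi = opt_phi(q, lam)                    # step A6
        change = _max_change(phi, new_phi)
        phi = new_phi
        if change < tol:                             # step A7 / YES
            break
    return q, lam, phi
```

Passing the per-step optimizers as callables keeps the control flow of FIG. 2 separate from the choice of numerical method in [0065], [0067], and [0071].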

[0074] In this way, the optimized adaptive system is constructed.

[0075] Next, adaptation of the optimized adaptive system to a new speaker will be described in detail with reference to the drawings.

[0076] FIG. 3 is a block diagram showing the configuration of the first embodiment of the present invention when the adaptive system is used for a new speaker.

[0077] Referring to FIG. 3, the adaptive system for a new speaker according to the first embodiment of the present invention comprises an input device 300 for inputting the new speaker's utterances, a data processing device 301 that operates under program control, and a storage device 302 that stores information.

[0078] The storage device 302 comprises four units:

- a new speaker data storage unit 321 that stores the new speaker's input data;
- a cluster weight variable storage unit 322 that represents the weights of membership in the prior distribution clusters;
- an HMM variable storage unit 323 that holds the HMM of each speaker in the learning data;
- a cluster variable storage unit 324 that represents the speaker clusters over all learning data as clusters of the prior distribution and determines the structure of each cluster.

[0079] The data processing device 301 includes:

- an initial value setting unit 311 that sets the initial values of the variables for the new speaker;
- a cluster weight variable optimization unit 312 for the new speaker;
- an HMM variable optimization unit 313 for the new speaker;
- a convergence determination unit 314 that determines whether the cluster weights and the HMM variables have converged.

[0080] Next, the operation will be described with reference to the drawings.

[0081] FIG. 4 is a flowchart showing the operation of the first embodiment of the present invention when the adaptive system is used for a new speaker.

[0082] Referring to FIG. 4, the operator first stores new speaker data from the input device 300 in the new speaker data storage unit 321 (step B0 in FIG. 4).

[0083] Next, the initial value setting unit 311 sets the new speaker's initial values for:

- the cluster weight variables (q, one value for each cluster), stored in the cluster weight variable storage unit 322;
- the HMM variables (λ, a set of values), stored in the HMM variable storage unit 323.

The values are entered by the operator (from a keyboard, a recording medium, or the like) (step B1 in FIG. 4).

In step B1, if there is no prior knowledge about the new speaker, the initial value setting unit 311 distributes the cluster weight variables evenly. If prior knowledge is available, it may be used for the cluster weight variables. For the HMM variables, a speaker-independent HMM learned in advance is stored in the HMM variable storage unit 323 and used as the initial value.

[0084] Next, the cluster weight variable optimization unit 312 optimizes the weight variables q that give the new speaker α's membership in each speaker cluster, as stored in the cluster weight variable storage unit 322. It then updates the contents of the cluster weight variable storage unit 322 (step B2 in FIG. 4).

[0085] Next, the HMM variable optimization unit 313 optimizes the HMM variables λ of the new speaker α in the HMM variable storage unit 323. It then updates the contents of the HMM variable storage unit 323 (step B3 in FIG. 4).

[0086] In step B3, the HMM variable optimization unit 313 uses the EM-based estimation formulas of Reference 2. All variables other than the HMM variables are fixed in this step, so the HMM variable optimization formulas in Chapter 2 of Reference 2 can be used as they are.

[0087] Next, the convergence determination unit 314 determines whether the cluster weight variables {q^α} and the HMM variables λ^α of the new speaker α have converged (step B4 in FIG. 4). If they have not converged (step B4/NO in FIG. 4), steps B2 and B3 are repeated.

[0088] If the convergence determination unit 314 determines that they have converged (step B4/YES in FIG. 4), the process ends.
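Steps B1 to B4 can be illustrated end to end with a deliberately small model. None of the following choices come from the original, and `adapt_new_speaker` is a hypothetical name. The sketch assumes:

- K scalar Gaussian means with unit-variance observations, so each unit is summarized by a frame count and a sum;
- cluster priors N(φ_jk, 1/τ);
- a q update by one EM fixed-point step per iteration, swapped in for the gradient methods of [0065];
- the collapsed, q-weighted prior of a standard MAP mean update.

```python
import numpy as np


def adapt_new_speaker(phi, n, s, tau=1.0, tol=1e-8, max_iter=100):
    """Sketch of steps B1-B4 for K scalar means with unit-variance observations.

    phi -- (J, K) cluster prior means phi[j, k] for mean k
    n   -- (K,) frame counts of the new speaker per mean (0 = unit not observed)
    s   -- (K,) sums of the new speaker's observations per mean
    tau -- prior precision, also the prior's weight in the MAP update
    """
    J, K = phi.shape
    q = np.full(J, 1.0 / J)   # B1: uniform cluster weights (no prior knowledge)
    mu = q @ phi              # B1: hypothetical stand-in for the speaker-independent HMM
    for _ in range(max_iter):
        # B2: one EM fixed-point step for q, with component densities N(mu_k; phi_jk, 1/tau).
        log_g = -0.5 * tau * (mu[None, :] - phi) ** 2                 # (J, K)
        a = log_g + np.log(np.clip(q, 1e-300, None))[:, None]
        a -= a.max(axis=0, keepdims=True)
        r = np.exp(a)
        r /= r.sum(axis=0, keepdims=True)                             # (J, K)
        new_q = r.mean(axis=1)
        # B3: MAP mean update with the q-weighted prior mean (simplification).
        new_mu = (tau * (new_q @ phi) + s) / (tau + n)
        # B4: stop once neither q nor mu moves by more than tol.
        change = max(np.abs(new_q - q).max(), np.abs(new_mu - mu).max())
        q, mu = new_q, new_mu
        if change < tol:
            break
    return q, mu
```

A unit with no adaptation data (n_k = 0) ends up at the q-weighted prior mean. Once the observed units pull q toward one cluster, the unobserved unit therefore follows. In toy form, this mirrors the third effect described in [0097].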

[0089] Next, the second embodiment of the present invention will be described in detail with reference to the drawings.

[0090] Referring to FIG. 2, the second embodiment of the present invention is a method comprising the following procedures:

- An initial value setting procedure (step A1 in FIG. 2). It sets, for each learning speaker, the initial values of the cluster weight variables and of the HMM variables. It also sets the initial values of the cluster variables, which determine the structure of the speaker clusters over all learning data. These values go into the cluster weight variable storage unit 122, the HMM variable storage unit 123, and the cluster variable storage unit 124, respectively.
- A cluster weight variable optimization procedure (step A2 in FIG. 2). It optimizes the speaker cluster weight variables of a speaker in the cluster weight variable storage unit 122 and updates the contents of that unit.
- An HMM variable optimization procedure (step A3 in FIG. 2). It optimizes the HMM variables of the speaker in the HMM variable storage unit 123 and updates the contents of that unit.
- A first convergence determination procedure. It determines whether the learning speaker's cluster weight variables and HMM variables have converged (step A4 in FIG. 2). If not, it returns the processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure (step A4/NO in FIG. 2). If so, it checks whether all learning speakers have been processed (step A5 in FIG. 2). If not, it returns the processing to those two procedures for the next learning speaker (step A5/NO in FIG. 2).
- A cluster variable optimization procedure (step A6 in FIG. 2). It runs when the first convergence determination procedure finds that all learning speakers have been processed (step A5/YES in FIG. 2). It optimizes the cluster variables so that the speaker cluster configuration for all speakers in the cluster variable storage unit 124 is optimal, and updates the contents of that unit.
- A second convergence determination procedure. It determines whether the cluster variables have converged (step A7 in FIG. 2). If not, it returns the processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure, so that optimization is performed with the cluster variables stored in the cluster variable storage unit 124 by the cluster variable optimization procedure (step A7/NO in FIG. 2).

[0091] Next, the third embodiment of the present invention will be described in detail with reference to the drawings.

[0092] Referring to FIG. 2, the third embodiment of the present invention is a program that causes a computer (for example, the data processing device 101) to execute the following procedures:

- An initial value setting procedure (step A1 in FIG. 2). It sets, for each learning speaker, the initial values of the cluster weight variables and of the HMM variables. It also sets the initial values of the cluster variables, which determine the structure of the speaker clusters over all learning data. These values go into the cluster weight variable storage unit 122, the HMM variable storage unit 123, and the cluster variable storage unit 124, respectively.
- A cluster weight variable optimization procedure (step A2 in FIG. 2). It optimizes the speaker cluster weight variables of a speaker in the cluster weight variable storage unit 122 and updates the contents of that unit.
- An HMM variable optimization procedure (step A3 in FIG. 2). It optimizes the HMM variables of the speaker in the HMM variable storage unit 123 and updates the contents of that unit.
- A first convergence determination procedure. It determines whether the learning speaker's cluster weight variables and HMM variables have converged (step A4 in FIG. 2). If not, it returns the processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure (step A4/NO in FIG. 2). If so, it checks whether all learning speakers have been processed (step A5 in FIG. 2). If not, it returns the processing to those two procedures for the next learning speaker (step A5/NO in FIG. 2).
- A cluster variable optimization procedure (step A6 in FIG. 2). It runs when the first convergence determination procedure finds that all learning speakers have been processed (step A5/YES in FIG. 2). It optimizes the cluster variables so that the speaker cluster configuration for all speakers in the cluster variable storage unit 124 is optimal, and updates the contents of that unit.
- A second convergence determination procedure. It determines whether the cluster variables have converged (step A7 in FIG. 2). If not, it returns the processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure, so that optimization is performed with the cluster variables stored in the cluster variable storage unit 124 by the cluster variable optimization procedure (step A7/NO in FIG. 2).

[0093]

EFFECTS OF THE INVENTION The first effect is that a prior distribution clustered by speaker characteristics can be constructed from a large-scale database.

[0094] The reason is that the variable optimization for each learning speaker and the global cluster variable optimization over all the data are executed repeatedly and hierarchically.

[0095] The second effect is that, during speaker adaptation, sufficient adaptation can be achieved simply by determining the cluster weight variables and the HMM variables from a small number of utterances.

[0096] There are two reasons. First, a prior distribution clustered by speaker characteristics is constructed from a large-scale database at learning time. Second, HMM adaptation to a new speaker is equivalent to the variable optimization for a single learning speaker. As a result, prior distribution clusters that are effective for adaptation can be used.

[0097] The third effect is that, when adapting to a new speaker, recognition units of the acoustic model that do not appear in the adaptation data can also be adapted flexibly.

[0098] The reason is that the speaker cluster weights are optimized using the recognition units that do appear in the adaptation data. Effective speaker adaptation is therefore possible with the prior distribution clusters weighted by those weights.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the first embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration for adapting an acoustic model for a new speaker according to the first embodiment of the present invention.

FIG. 4 is a flowchart showing the operation of adapting an acoustic model for a new speaker according to the first embodiment of the present invention.

FIG. 5 is an explanatory diagram showing the mathematical expressions.

FIG. 6 is a block diagram showing the configuration of an HMM estimation method based on the conventional MAP estimation framework.

FIG. 7 is a block diagram showing the configuration of an HMM adaptation method based on the conventional MAP estimation framework.

[Explanation of symbols]

101 Data processing device
301 Data processing device
501 Data processing device
601 Data processing device
102 Storage device
302 Storage device
502 Storage device
602 Storage device
300 Input device
600 Input device
121 Learning data storage unit
321 New speaker data storage unit
521 Learning data storage unit
621 New speaker data storage unit
122 Cluster weight variable storage unit
322 Cluster weight variable storage unit
522 Cluster weight variable storage unit
622 Cluster weight variable storage unit
123 HMM variable storage unit
323 HMM variable storage unit
523 HMM variable storage unit
623 HMM variable storage unit
124 Cluster variable storage unit
324 Cluster variable storage unit
524 Cluster variable storage unit
624 Cluster variable storage unit
111 Initial value setting unit
311 Initial value setting unit
112 Cluster weight variable optimization unit
312 Cluster weight variable optimization unit
113 HMM variable optimization unit
313 HMM variable optimization unit
511 HMM variable optimization unit
611 HMM variable optimization unit
115 Cluster variable optimization unit
114 Convergence determination unit
116 Convergence determination unit
314 Convergence determination unit
512 Convergence determination unit
612 Convergence determination unit

Claims


1. A speech recognition acoustic model learning/adaptation system comprising: a learning data storage unit that stores learning data; and optimization means that reads the learning data from the learning data storage unit, optimizes a clustered prior distribution based on the read learning data, and optimizes HMM variables based on the optimized prior distribution.

2. A speech recognition acoustic model learning/adaptation system comprising: a learning data storage unit that stores learning data; a cluster variable storage unit that stores cluster variables determining a clustered prior distribution; a cluster weight storage unit that stores cluster weight variables; an HMM variable storage unit that stores HMM variables determining an acoustic model; cluster weight variable optimization means for optimizing the cluster weight variables for each item of learning data; HMM variable optimization means for optimizing the HMM variables for each item of learning data; and cluster variable optimization means for optimizing the cluster variables over all the learning data.
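The three groups of variables in claim 2 can be made concrete with one illustrative objective. The model here is an assumption for exposition, not taken from the patent: learning speaker s has N_s feature frames x_{s,t} in R^D, the HMM variables are reduced to a single unit-variance Gaussian mean mu_s, the cluster variables are cluster mean vectors m_1 through m_K stacked as the columns of a matrix M, the cluster weights are w_s in R^K, and the prior on mu_s is Gaussian with mean M w_s and covariance tau^{-1} I. J is then the negative log joint posterior up to constants, and each update is the closed-form minimizer of J with the other variables held fixed (assuming M has full column rank and the summed outer products of the weights are invertible), so alternating them cannot increase J. The weight and HMM steps run per speaker; the cluster step runs over all speakers.

```latex
J\bigl(M, \{w_s\}, \{\mu_s\}\bigr)
  = \sum_{s} \Bigl[ \tfrac{1}{2} \sum_{t=1}^{N_s} \lVert x_{s,t} - \mu_s \rVert^2
  + \tfrac{\tau}{2} \lVert \mu_s - M w_s \rVert^2 \Bigr]

% HMM variable step (per speaker s)
\mu_s \leftarrow \frac{\tau M w_s + \sum_{t=1}^{N_s} x_{s,t}}{\tau + N_s}

% Cluster weight step (per speaker s)
w_s \leftarrow \bigl(M^\top M\bigr)^{-1} M^\top \mu_s

% Cluster variable step (over all speakers)
M \leftarrow \Bigl(\sum_{s} \mu_s w_s^\top\Bigr) \Bigl(\sum_{s} w_s w_s^\top\Bigr)^{-1}
```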

3. A speech recognition acoustic model learning/adaptation system comprising means for, when adapting an acoustic model to a new speaker, optimizing cluster weight variables and adapting HMM variables using a prior distribution expressed as a linear combination of speaker clusters.

4. A speech recognition acoustic model learning/adaptation system comprising: a new speaker data storage unit that stores adaptation data of a new speaker; a cluster variable storage unit that stores cluster variables determining a prior distribution expressed as a linear combination of speaker clusters; a cluster weight variable storage unit that stores cluster weight variables; an HMM variable storage unit that stores HMM variables determining an acoustic model; cluster weight variable optimization means for optimizing the weight variables for the clusters of the new speaker based on the new speaker's adaptation data; and HMM variable optimization means for optimizing the HMM variables based on the new speaker's adaptation data.
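A minimal Python sketch of the two optimization means in claim 4, under simplifying assumptions that are illustrative and not taken from the patent: the HMM variables are reduced to one unit-variance Gaussian mean, the cluster variables are K fixed cluster mean vectors, the prior on the new speaker's mean is Gaussian around the weighted sum of those vectors with relative strength `tau`, and the cluster weights are fitted by least squares (which maximizes that Gaussian prior for a fixed mean). The names `solve`, `combine`, and `adapt_speaker` and the uniform initial weights are hypothetical. Each step maximizes the joint posterior with the other variable held fixed.

```python
from typing import List, Tuple

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def combine(w: Vector, clusters: List[Vector]) -> Vector:
    """Prior mean: linear combination sum_k w[k] * clusters[k] of the cluster means."""
    return [sum(wk * c[d] for wk, c in zip(w, clusters)) for d in range(len(clusters[0]))]


def adapt_speaker(frames: List[Vector], clusters: List[Vector], tau: float,
                  tol: float = 1e-10, max_iter: int = 500) -> Tuple[Vector, Vector]:
    """Alternate cluster-weight and HMM-mean updates for one new speaker."""
    k = len(clusters)
    n, dim = len(frames), len(clusters[0])
    # Gram matrix of the cluster means, used by the least-squares weight fit.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in clusters] for ci in clusters]
    w = [1.0 / k] * k  # hypothetical initial weights
    for _ in range(max_iter):
        # HMM variable step: MAP mean under the prior centred on combine(w, clusters).
        prior = combine(w, clusters)
        mu = [(tau * prior[d] + sum(f[d] for f in frames)) / (tau + n) for d in range(dim)]
        # Cluster weight step: least-squares fit of mu by the cluster means.
        new_w = solve(gram, [sum(a * b for a, b in zip(ci, mu)) for ci in clusters])
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    # Recompute the mean once more so that it matches the final weights.
    prior = combine(w, clusters)
    mu = [(tau * prior[d] + sum(f[d] for f in frames)) / (tau + n) for d in range(dim)]
    return mu, w
```

For example, with a single cluster mean `[1.0, 1.0]`, frames `[[4.0, 0.0], [2.0, 2.0]]`, and `tau=2.0`, the weights settle near `[2.0]` and the adapted mean near `[2.5, 1.5]`, halfway between the data mean `[3.0, 1.0]` and the fitted prior mean `[2.0, 2.0]`.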

5. A speech recognition acoustic model learning/adaptation method comprising an optimization procedure of reading learning data from a learning data storage unit that stores the learning data, optimizing a clustered prior distribution based on the read learning data, and optimizing HMM variables based on the optimized prior distribution.

6. A speech recognition acoustic model learning/adaptation method comprising: a cluster weight variable optimization procedure of optimizing the cluster weight variables in a cluster weight storage unit for each item of learning data in a learning data storage unit; an HMM variable optimization procedure of optimizing the HMM variables in an HMM variable storage unit for each item of learning data; and a cluster variable optimization procedure of optimizing the cluster variables in a cluster variable storage unit over all the learning data.

7. A speech recognition acoustic model learning/adaptation method comprising, when adapting an acoustic model to a new speaker, optimizing cluster weight variables and adapting HMM variables using a prior distribution expressed as a linear combination of speaker clusters.

8. A speech recognition acoustic model learning/adaptation method comprising: a cluster weight variable optimization procedure of optimizing the weight variables for the clusters of a new speaker based on the new speaker's adaptation data in a new speaker data storage unit; and an HMM variable optimization procedure of optimizing HMM variables based on the new speaker's adaptation data in the new speaker data storage unit.

9. A speech recognition acoustic model learning/adaptation program that causes a computer to execute an optimization procedure of reading learning data from a learning data storage unit that stores the learning data, optimizing a clustered prior distribution based on the read learning data, and optimizing HMM variables based on the optimized prior distribution.

10. A speech recognition acoustic model learning/adaptation program that causes a computer to execute: a cluster weight variable optimization procedure of optimizing the cluster weight variables in a cluster weight storage unit for each item of learning data in a learning data storage unit; an HMM variable optimization procedure of optimizing the HMM variables in an HMM variable storage unit for each item of learning data; and a cluster variable optimization procedure of optimizing the cluster variables in a cluster variable storage unit over all the learning data.

11. A speech recognition acoustic model learning/adaptation program that causes a computer to execute a procedure of, when adapting an acoustic model to a new speaker, optimizing cluster weight variables and adapting HMM variables using a prior distribution expressed as a linear combination of speaker clusters.

12. A speech recognition acoustic model learning/adaptation program that causes a computer to execute: a cluster weight variable optimization procedure of optimizing the weight variables for the clusters of a new speaker based on the new speaker's adaptation data in a new speaker data storage unit; and an HMM variable optimization procedure of optimizing HMM variables based on the new speaker's adaptation data in the new speaker data storage unit.

13. A speech recognition acoustic model learning/adaptation system comprising: an initial value setting unit that sets initial values of the cluster weight variables for each learning speaker, initial values of the HMM variables for each learning speaker, and initial values of the cluster variables that determine the structure of the speaker clusters over all the learning data in a cluster weight variable storage unit, an HMM variable storage unit, and a cluster variable storage unit, respectively; a cluster weight variable optimization unit that optimizes the speaker cluster weight variables for a speaker in the cluster weight variable storage unit and updates the contents of the cluster weight variable storage unit; an HMM variable optimization unit that optimizes the HMM variables for the speaker in the HMM variable storage unit and updates the contents of the HMM variable storage unit; a first convergence determination unit that determines whether the cluster weight variables and the HMM variables for the learning speaker have converged, returns processing to the cluster weight variable optimization unit and the HMM variable optimization unit if they have not converged, checks whether all the learning speakers have been processed, and, if they have not, returns processing to the cluster weight variable optimization unit and the HMM variable optimization unit for the next learning speaker; a cluster variable optimization unit that, when the first convergence determination unit determines that all the learning speakers have been processed, optimizes the cluster variables in the cluster variable storage unit over all speakers so that the configuration of the speaker clusters is optimized, and updates the contents of the cluster variable storage unit; and a second convergence determination unit that determines whether the cluster variables have converged and, if they have not, returns processing to the cluster weight variable optimization unit and the HMM variable optimization unit so that optimization is performed using the cluster variables stored in the cluster variable storage unit by the cluster variable optimization unit.
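The two nested convergence tests of claim 13 can be sketched in Python as below. The model is an illustrative assumption, not the patent's: one unit-variance Gaussian mean per learning speaker stands in for the HMM variables, the cluster variables are K cluster mean vectors (the columns of a matrix M), each speaker's prior mean is M w_s with strength `tau`, and every step is the closed-form minimizer of the negative log joint posterior with the other variables fixed, so the recorded objective should not increase from one outer pass to the next. `train`, `solve`, `combine`, `objective`, and the frame-mean initialization of the HMM variables are hypothetical; the caller supplies the initial cluster means.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def combine(w: Vector, clusters: List[Vector]) -> Vector:
    """Prior mean M w: linear combination of the cluster mean vectors."""
    return [sum(wk * c[d] for wk, c in zip(w, clusters)) for d in range(len(clusters[0]))]


def objective(data, mus, ws, clusters, tau) -> float:
    """Negative log joint posterior up to constants (unit-variance Gaussians)."""
    total = 0.0
    for frames, mu, w in zip(data, mus, ws):
        total += 0.5 * sum((x - m) ** 2 for f in frames for x, m in zip(f, mu))
        total += 0.5 * tau * sum((m - p) ** 2 for m, p in zip(mu, combine(w, clusters)))
    return total


def train(data: List[List[Vector]], clusters: List[Vector], tau: float,
          tol: float = 1e-10, max_iter: int = 100) -> Tuple:
    k, dim = len(clusters), len(clusters[0])
    # Initial values: uniform weights, and each speaker's frame mean as the HMM variable.
    ws = [[1.0 / k] * k for _ in data]
    sums = [[sum(f[d] for f in frames) for d in range(dim)] for frames in data]
    mus = [[v / len(frames) for v in s] for s, frames in zip(sums, data)]
    history: List[float] = []
    for _ in range(max_iter):
        gram = [[dot(ci, cj) for cj in clusters] for ci in clusters]
        for s, frames in enumerate(data):               # one learning speaker at a time
            n = len(frames)
            for _ in range(max_iter):
                # Cluster weight step: least-squares fit of mu_s by the cluster means.
                new_w = solve(gram, [dot(c, mus[s]) for c in clusters])
                # HMM variable step: MAP mean under the prior centred on M w_s.
                prior = combine(new_w, clusters)
                new_mu = [(tau * prior[d] + sums[s][d]) / (tau + n) for d in range(dim)]
                change = max(abs(a - b) for a, b in zip(new_w + new_mu, ws[s] + mus[s]))
                ws[s], mus[s] = new_w, new_mu
                if change < tol:                        # first convergence test
                    break
        # Cluster variable step over all speakers: M = (sum mu w^T)(sum w w^T)^-1, row by row.
        gram_w = [[sum(w[i] * w[j] for w in ws) for j in range(k)] for i in range(k)]
        new_clusters = [[0.0] * dim for _ in range(k)]
        for d in range(dim):
            row = solve(gram_w, [sum(w[i] * mu[d] for w, mu in zip(ws, mus)) for i in range(k)])
            for i in range(k):
                new_clusters[i][d] = row[i]
        shift = max(abs(a - b) for cn, co in zip(new_clusters, clusters) for a, b in zip(cn, co))
        clusters = new_clusters
        history.append(objective(data, mus, ws, clusters, tau))
        if shift < tol:                                 # second convergence test
            break
    return clusters, mus, ws, history
```

With `tau=1.0`, two speakers whose frames are `[[2.0, 0.0], [4.0, 0.0]]` and `[[0.0, 1.0], [0.0, 3.0]]`, and the 2-by-2 identity as initial cluster means, the sketch stops after one outer pass: each speaker's weights reproduce its frame mean exactly, the refitted cluster means are again the identity, and the objective is 2.0.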

14. A speech recognition acoustic model learning/adaptation method comprising: an initial value setting procedure of setting initial values of the cluster weight variables for each learning speaker, initial values of the HMM variables for each learning speaker, and initial values of the cluster variables that determine the structure of the speaker clusters over all the learning data in a cluster weight variable storage unit, an HMM variable storage unit, and a cluster variable storage unit, respectively; a cluster weight variable optimization procedure of optimizing the speaker cluster weight variables for a speaker in the cluster weight variable storage unit and updating the contents of the cluster weight variable storage unit; an HMM variable optimization procedure of optimizing the HMM variables for the speaker in the HMM variable storage unit and updating the contents of the HMM variable storage unit; a first convergence determination procedure of determining whether the cluster weight variables and the HMM variables for the learning speaker have converged, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure if they have not converged, checking whether all the learning speakers have been processed, and, if they have not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure for the next learning speaker; a cluster variable optimization procedure of, when the first convergence determination procedure determines that all the learning speakers have been processed, optimizing the cluster variables in the cluster variable storage unit over all speakers so that the configuration of the speaker clusters is optimized, and updating the contents of the cluster variable storage unit; and a second convergence determination procedure of determining whether the cluster variables have converged and, if they have not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure so that optimization is performed using the cluster variables stored in the cluster variable storage unit by the cluster variable optimization procedure.

15. A speech recognition acoustic model learning/adaptation program that causes a computer to execute: an initial value setting procedure of setting initial values of the cluster weight variables for each learning speaker, initial values of the HMM variables for each learning speaker, and initial values of the cluster variables that determine the structure of the speaker clusters over all the learning data in a cluster weight variable storage unit, an HMM variable storage unit, and a cluster variable storage unit, respectively; a cluster weight variable optimization procedure of optimizing the speaker cluster weight variables for a speaker in the cluster weight variable storage unit and updating the contents of the cluster weight variable storage unit; an HMM variable optimization procedure of optimizing the HMM variables for the speaker in the HMM variable storage unit and updating the contents of the HMM variable storage unit; a first convergence determination procedure of determining whether the cluster weight variables and the HMM variables for the learning speaker have converged, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure if they have not converged, checking whether all the learning speakers have been processed, and, if they have not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure for the next learning speaker; a cluster variable optimization procedure of, when the first convergence determination procedure determines that all the learning speakers have been processed, optimizing the cluster variables in the cluster variable storage unit over all speakers so that the configuration of the speaker clusters is optimized, and updating the contents of the cluster variable storage unit; and a second convergence determination procedure of determining whether the cluster variables have converged and, if they have not, returning processing to the cluster weight variable optimization procedure and the HMM variable optimization procedure so that optimization is performed using the cluster variables stored in the cluster variable storage unit by the cluster variable optimization procedure.