JPH06266384A

JPH06266384A - Acoustic model adaptation system

Info

Publication number: JPH06266384A
Application number: JP5055332A
Authority: JP
Inventors: Yasunaga Miyazawa; 康永宮沢; Shigeki Sagayama; 茂樹嵯峨山
Original assignee: A T R JIDO HONYAKU DENWA KENKYUSHO KK
Current assignee: A T R JIDO HONYAKU DENWA KENKYUSHO KK
Priority date: 1993-03-16
Filing date: 1993-03-16
Publication date: 1994-09-22
Anticipated expiration: 2010-01-11
Also published as: JPH071435B2

Abstract

PURPOSE: To improve performance, such as the speech recognition rate, of an unsupervised speaker adaptation system. CONSTITUTION: A probabilistic model (an all-phoneme ergodic HMM) is created by ergodically connecting the phoneme HMMs of all phonemes, trained on the input speech of a standard speaker, using phoneme bigram probability values. The mean vectors μ of the output probability distributions, which are parameters of the probabilistic model, are then learned by Baum-Welch maximum likelihood estimation from input speech whose utterance content is unknown, and the mean vectors μ are corrected by the moving vector field smoothing method. The learning and correction of the mean vectors are repeated until the output likelihood for the input speech data converges.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an acoustic model adaptation method, and more particularly to an unsupervised adaptation method that adapts acoustic models to the feature space of input speech according to the speaker, the speaking style, the speaking environment, or the like.

[0002]

[Prior Art] As acoustic model adaptation methods that need no teacher data on the utterance content, techniques that perform speaker adaptation based on the distribution of speech patterns have been disclosed in IEICE Technical Reports SP88-21, SP88-122, SP90-67, and elsewhere. These include methods based on mapping vector quantization codebooks and methods that apply the same principles to continuous-density hidden Markov models (HMMs).

[0003]

[Problems to Be Solved by the Invention] Unsupervised speaker adaptation methods, however, do not use linguistic information about the utterance content, which makes an accurate mapping difficult. As a result, unsupervised speaker adaptation has generally been inferior in performance and efficiency to supervised speaker adaptation, which uses speech data whose utterance content is known.

[0004] The present invention solves these problems. Its object is to improve the performance and efficiency of unsupervised speaker adaptation and similar methods that adapt acoustic models to the feature space of input speech whose utterance content is unknown.

[0005]

[Means for Solving the Problems] The acoustic model adaptation method according to the present invention adapts a plurality of acoustic models, which represent speech features for use in speech recognition and have been trained on the speech of one or more standard speakers, to the feature space of input speech whose utterance content is unknown. Its gist is to create a probabilistic model in which all of the acoustic models are connected to one another by desired transition probabilities and each acoustic model is also self-connected by a desired transition probability, and to retrain all or some of the parameters of that probabilistic model with the input speech.

[0006] In the above acoustic model adaptation method, discrete-density, continuous-density, or semi-continuous-density phoneme HMMs may be used as the acoustic models.

[0007] In the above acoustic model adaptation method, phoneme bigram probability values obtained from desired text data may be used as the initial values of the transition probabilities.

[0008] Alternatively, in the above acoustic model adaptation method, context-dependent phoneme models may be used as the acoustic models, with context-dependent phoneme bigram probability values obtained from desired text data used as the initial values of the transition probabilities.

[0009] In the above acoustic model adaptation method, the moving vector field smoothing method may be used when all or some of the parameters of the probabilistic model are retrained with the input speech.

[0010]

[Operation] In the acoustic model adaptation method according to the present invention, a probabilistic model is created in which the acoustic models are connected by transition probabilities, and the parameters of that probabilistic model are retrained with input speech whose utterance content is unknown. The acoustic models are thereby adapted to the feature space of the input speech, so performance such as the speech recognition rate improves.

[0011]

[Embodiment] An embodiment of the acoustic model adaptation method according to the present invention is described next with reference to the drawings.

[0012] FIG. 2 is a conceptual diagram showing the probabilistic model used in an unsupervised speaker adaptation method that is one embodiment of the present invention.

[0013] As shown in FIG. 2, mixture continuous-density phoneme HMMs 1, trained on the input speech of a standard speaker, are first connected ergodically by transition probabilities a_ij to create one large probabilistic model. Here, 49 phoneme HMMs 1, including silence, are used. To keep the figure simple, however, only four phoneme HMMs 1 are shown. This probabilistic model is hereinafter called the "all-phoneme ergodic HMM".

[0014] That is, in the all-phoneme ergodic HMM, all 49 phoneme HMMs 1 are connected to one another by transition probabilities a_ij, and each phoneme HMM 1 is also self-connected by a transition probability a_ij. A phoneme HMM 1 is one kind of acoustic model that represents speech features for use in speech recognition.

[0015] The transition probabilities a_ij between and within the phoneme HMMs 1 of the all-phoneme ergodic HMM correspond to phoneme bigram probabilities. Phoneme bigram probability values obtained from some text data are therefore used as the initial values of the transition probabilities a_ij.

[0016] In FIG. 2 the transition probability a_ij is labeled at only one place, but the same applies between the other phoneme HMMs 1 and within each phoneme HMM 1. Here a_ij denotes the transition probability from the i-th phoneme HMM 1 to the j-th phoneme HMM 1, so i = j denotes a self-transition within the same phoneme HMM 1.
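The bigram initialization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the toy phoneme inventory, and the additive `smoothing` term are hypothetical and do not come from the patent, which says only that the values are obtained from "some text data".

```python
def phoneme_bigram_matrix(sequences, phonemes, smoothing=1e-3):
    """Estimate a[i][j] = P(next phoneme is j | current phoneme is i).

    sequences: phoneme-label sequences transcribed from text data.
    smoothing: additive constant (an assumption, not from the patent)
               that keeps every transition possible.
    """
    index = {p: k for k, p in enumerate(phonemes)}
    n = len(phonemes)
    counts = [[smoothing] * n for _ in range(n)]
    for seq in sequences:
        # Count each adjacent (previous, next) phoneme pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[index[prev]][index[nxt]] += 1.0
    # Normalize each row so that it is a probability distribution over j.
    return [[c / sum(row) for c in row] for row in counts]


phonemes = ["sil", "a", "k"]
a = phoneme_bigram_matrix([["sil", "a", "k", "a", "sil"]], phonemes)
```

Each row of `a` sums to one, so row i can serve directly as the initial outgoing transition distribution of the i-th phoneme HMM.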

[0017] The all-phoneme ergodic HMM is therefore a probabilistic model that fuses a language model with an acoustic model, and it can represent any spoken-language utterance. Because speech uttered by a speaker is "linguistic speech" that carries both acoustic and linguistic information, each parameter of the all-phoneme ergodic HMM can be learned by maximum likelihood estimation from the speech of an input speaker whose utterance content is unknown.

[0018] Because this learning uses linguistic information about the utterance content probabilistically, speaker adaptation performance can be improved over conventional methods, which use no linguistic information at all.

[0019] In this unsupervised speaker adaptation method, when a large amount of training data is available, the acoustic model and the language model can be adapted simultaneously by retraining all parameters of the all-phoneme ergodic HMM with input speech whose utterance content is unknown. The parameters in question include the transition probabilities a_ij between phoneme HMMs, and, within each phoneme HMM 1, the transition probabilities a_ij, the mean vectors and covariance matrices of the output probability distributions, and the mixture weights.

[0020] A two-stage procedure is therefore also conceivable: first learn the acoustic-model parameters of the phoneme HMMs 1 (transition probabilities and so on), then learn the language-model parameters, namely the transition probabilities between the phoneme HMMs 1. This, however, requires a fairly large amount of training data. The embodiment below instead assumes adaptation from a smaller amount of training data. It fixes the transition probabilities a between the phoneme HMMs 1 and retrains only the mean vectors μ of the output probability distributions within the phoneme HMMs 1, which are considered to be the phoneme HMM parameters with the greatest adaptation effect.

[0021] FIG. 1 is a flowchart of the learning algorithm for this case. As shown in FIG. 1, in step S1 phoneme HMMs are first created for all phonemes from the speech of a standard speaker, and in step S2 phoneme bigram probability values are calculated from text data.

[0022] Next, in step S3, these phoneme HMMs are connected ergodically using the phoneme bigram probability values to create the all-phoneme ergodic HMM shown in FIG. 2.
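As a rough illustration of step S3, the sketch below assembles one composite transition matrix from per-phoneme transition matrices and a bigram matrix. The layout is an assumption made for the example, not something the patent specifies: every phoneme HMM is taken to have the same number S of emitting states, its transition matrix has shape (S, S+1) with the last column holding the probability of leaving the phoneme, and a new phoneme is always entered at its first state.

```python
import numpy as np


def build_ergodic_transitions(phone_trans, bigram):
    """Connect phoneme HMMs ergodically into one large transition matrix.

    phone_trans: list of P arrays of shape (S, S+1); column S is the exit
                 probability of each state (assumed layout).
    bigram:      (P, P) array, bigram[i, j] = a_ij.
    Returns a (P*S, P*S) row-stochastic matrix.
    """
    P = len(phone_trans)
    S = phone_trans[0].shape[0]
    A = np.zeros((P * S, P * S))
    for i, T in enumerate(phone_trans):
        rows = slice(i * S, (i + 1) * S)
        # Transitions that stay inside phoneme i.
        A[rows, rows] = T[:, :S]
        for j in range(P):
            # Leaving phoneme i enters the first state of phoneme j with
            # probability a_ij; j == i is the self-connection.
            A[rows, j * S] += T[:, S] * bigram[i, j]
    return A


T = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3]])
bigram = np.array([[0.2, 0.8],
                   [0.5, 0.5]])
A = build_ergodic_transitions([T, T], bigram)
```

Because each input row sums to one, each row of `A` does too, so the result can be used as the transition matrix of a single large HMM.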

[0023] Next, in step S4, the mean vectors μ of the output probability distributions are learned by Baum-Welch maximum likelihood estimation, using input speech whose utterance content is unknown.
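A minimal sketch of one such re-estimation pass follows. It is simplified relative to the embodiment and should be read with these assumptions in mind: each state has a single diagonal-covariance Gaussian rather than a mixture, the transition matrix `A`, initial distribution `pi`, and variances stay fixed, and all names are illustrative.

```python
import numpy as np


def log_gauss(X, means, variances):
    """Log density of each frame under each state's diagonal Gaussian.
    X: (T, D); means, variances: (N, D). Returns (T, N)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2 * np.pi * variances)[None].sum(-1)
                   + (diff ** 2 / variances[None]).sum(-1))


def reestimate_means(X, A, pi, means, variances):
    """One Baum-Welch pass that updates only the mean vectors.
    Returns (new_means, log-likelihood of X under the input means)."""
    T, N = X.shape[0], A.shape[0]
    logb = log_gauss(X, means, variances)
    shift = logb.max(axis=1, keepdims=True)
    B = np.exp(logb - shift)               # emissions rescaled per frame
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    c = np.zeros(T)                        # forward scaling factors
    alpha[0] = pi * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):                  # scaled forward recursion
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):         # scaled backward recursion
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                   # state occupancy per frame
    gamma /= gamma.sum(axis=1, keepdims=True)
    occ = gamma.sum(axis=0)
    new_means = means.copy()
    seen = occ > 1e-8                      # unvisited states keep old means
    new_means[seen] = (gamma.T @ X)[seen] / occ[seen, None]
    loglik = np.log(c).sum() + shift.sum()
    return new_means, loglik
```

States that receive essentially no occupancy keep their previous means, which corresponds to the mean vectors that are left untrained when adaptation data are scarce.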

[0024] Next, in step S5, the mean vectors μ of the output probability distributions are corrected by the moving vector field smoothing method. This method is disclosed in detail in IEICE Technical Report SP92-16, so it is described only briefly here.

[0025] First, the difference vector between each mean vector of the output probability distributions of the all-phoneme ergodic HMM after retraining with the input speech by maximum likelihood estimation and the corresponding mean vector before adaptation is regarded as a movement vector from the standard-speaker space to the input-speaker space, and the set of these vectors is taken as the movement vector field. In unsupervised learning, a mean vector may have been retrained on data from the wrong phoneme, so the field is expected to contain estimation errors. Such errors also arise when the training sample is small. The directions of the movement vectors obtained in this way are therefore likely to vary discontinuously. Moreover, when the training sample is small, some mean vectors of the output probability distributions are not retrained at all.

[0026] A "continuity constraint" is therefore imposed on the movement vector field to smooth the movement vectors, and the mean vectors of the output probability distributions are corrected accordingly. The movement vector of an untrained mean vector is obtained by interpolating or extrapolating from the other movement vectors. The strength of the smoothing is controlled by a fuzziness value: the larger the value, the stronger the smoothing. When the fuzziness is infinite, all phoneme models are therefore translated in parallel.
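The smoothing and interpolation described above might be sketched as follows. This is not the formulation of the cited report SP92-16: the weighting kernel `exp(-distance / fuzziness)`, the use of every trained vector as a neighbor, and all names are assumptions chosen so that the sketch reproduces the two stated properties (larger fuzziness means stronger smoothing, and infinite fuzziness gives a parallel translation).

```python
import numpy as np


def smooth_movement_vectors(old_means, new_means, trained, fuzziness):
    """Smooth the movement vector field and apply it to every mean.

    old_means, new_means: (N, D) means before and after re-estimation.
    trained:   boolean (N,), True where a mean was actually re-estimated.
    fuzziness: positive smoothing strength (np.inf is allowed).
    """
    vectors = new_means - old_means            # movement vector field
    idx = np.flatnonzero(trained)              # only trained vectors vote
    adapted = old_means.copy()
    for n in range(len(old_means)):
        # Distance, in the standard-speaker space, to each trained mean.
        d = np.linalg.norm(old_means[idx] - old_means[n], axis=1)
        # Assumed kernel: nearer movement vectors get larger weights.
        w = np.exp(-(d - d.min()) / fuzziness)
        smoothed = (w[:, None] * vectors[idx]).sum(axis=0) / w.sum()
        adapted[n] = old_means[n] + smoothed   # also fills untrained means
    return adapted


old = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
new = old + np.array([[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [0.0, 0.0]])
trained = np.array([True, True, True, False])
parallel = smooth_movement_vectors(old, new, trained, fuzziness=np.inf)
```

With `fuzziness=np.inf` every weight equals one, so all four means, including the untrained one, are shifted by the same average vector. A very small fuzziness leaves each trained mean almost exactly where re-estimation put it.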

[0027] Then, in step S6, it is determined whether the output likelihood of the all-phoneme ergodic HMM for the input speech data has converged. If it has not, the process returns to step S4. That is, steps S4 and S5 are repeated until the output likelihood for the input speech data converges.
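The loop over steps S4 to S6 can be outlined as below. The two callbacks, the tolerance `tol`, and the iteration cap `max_iter` are hypothetical placeholders; the patent states only that the steps repeat until the output likelihood converges.

```python
def adapt_until_converged(model, reestimate, smooth, tol=1e-4, max_iter=50):
    """Repeat re-estimation (S4) and smoothing (S5) until the
    log-likelihood stops changing (S6).

    reestimate(model) -> (candidate_model, log_likelihood)
    smooth(model, candidate_model) -> corrected_model
    """
    previous = float("-inf")
    iterations = 0
    for iterations in range(1, max_iter + 1):
        candidate, log_likelihood = reestimate(model)   # step S4
        model = smooth(model, candidate)                # step S5
        if abs(log_likelihood - previous) < tol:        # step S6
            break
        previous = log_likelihood
    return model, iterations


# Toy stand-ins: the "model" is one number that moves halfway toward 10
# on each pass, and smoothing simply accepts the candidate.
toy_model, n_iter = adapt_until_converged(
    0.0,
    reestimate=lambda m: (m + 0.5 * (10.0 - m), -(10.0 - m) ** 2),
    smooth=lambda old, new: new,
)
```

The absolute difference is used in the convergence test because smoothing can occasionally lower the likelihood slightly between passes; that choice is a design assumption of this sketch.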

[0028] If the output likelihood has converged, the process proceeds to step S7, where the connections between the phoneme HMMs 1 in the all-phoneme ergodic HMM retrained in steps S4 to S6 are removed and the model is decomposed into individual phoneme HMMs.

[0029] By the above method, the phoneme HMMs of the standard speaker are adapted to the feature space of input speech whose utterance content is unknown, using that input speech.

[0030] Experimental results are shown below for adapting the model of one standard speaker to one other input speaker by this unsupervised speaker adaptation method.

[0031] The phoneme HMMs of the standard speaker were mixture continuous-density HMMs with four states, three loops, and three mixture components. There were 49 phoneme HMMs. They were trained on 5,240 important words and 216 balanced words spoken by the standard speaker. The phoneme bigram probability values were obtained from text data. Isolated-word utterances of the input speaker were used for speaker adaptation, and evaluation was by a phoneme recognition experiment on 2,560 words of the input speaker that were not used for adaptation.

[0032] The phoneme recognition rate, which was 70.2% with the standard-speaker model before adaptation, rose with this speaker adaptation method to 80.4% when 25 words were used for learning, 83.3% with 100 words, and 87.6% with 200 words, demonstrating that the speaker adaptation method of the present invention is effective.
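For comparison across the three conditions, the reported rates can be turned into relative phoneme error reductions. The figures computed below are derived here from the four reported percentages and do not themselves appear in the patent; the function name is illustrative.

```python
def relative_error_reduction(baseline_rate, adapted_rate):
    """Share of the baseline recognition errors removed by adaptation (%)."""
    return (adapted_rate - baseline_rate) / (100.0 - baseline_rate) * 100.0


baseline = 70.2                                # before adaptation (%)
adapted = {25: 80.4, 100: 83.3, 200: 87.6}     # adaptation words -> rate (%)
reductions = {n: round(relative_error_reduction(baseline, r), 1)
              for n, r in adapted.items()}
# reductions == {25: 34.2, 100: 44.0, 200: 58.4}
```

Put this way, 25 adaptation words remove roughly a third of the baseline errors and 200 words remove more than half.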

[0033] One embodiment of the present invention has been described above, but the invention is not limited to that embodiment and can be carried out in other forms.

[0034] For example, although the above embodiment uses mixture continuous-density HMMs as the acoustic models, single continuous-density HMMs or discrete-density HMMs may be used instead. Context-dependent phoneme models may also be used as the acoustic models, with context-dependent phoneme bigram probability values obtained from some text data used as the initial values of the transition probabilities. Context-dependent phoneme models are disclosed in detail in IEICE Technical Report SP91-19, "Robust phoneme recognition by smoothing based on a phoneme-environment tree structure of single-Gaussian HMMs", and in S91-88, "Automatic generation of hidden Markov networks by successive state splitting with respect to phoneme context and time", which are incorporated here by reference.

[0035] In addition, although the above embodiment uses models trained on one standard speaker as the phoneme HMMs before adaptation, speaker-independent models trained on the speech data of several speakers may be used instead. The method can also be applied not only to speaker adaptation but also to speaking-style adaptation, speaker-environment adaptation, and the like.

[0036]

[Effects of the Invention] As described above, the acoustic model adaptation method according to the present invention makes it possible to adapt existing acoustic models to the feature space of input speech whose utterance content is unknown, using that input speech. Recognition performance therefore improves; for example, a speech recognition rate comparable to that of supervised speaker adaptation can be obtained. Recognition performance improves in the same way when the method is applied to speaking-style adaptation, speaking-environment adaptation, and the like.

[Brief description of drawings]

FIG. 1 is a flowchart showing the algorithm of one embodiment of the acoustic model adaptation method according to the present invention.

FIG. 2 is a conceptual diagram showing the probabilistic model used in the acoustic model adaptation method shown in FIG. 1.

[Explanation of symbols]

1: phoneme HMM
a_ij: phoneme bigram probability value

Claims

[Claims]

1. An acoustic model adaptation method in which a plurality of acoustic models, which represent speech features for use in speech recognition and have been trained on the speech of one or more standard speakers, are adapted to the feature space of input speech whose utterance content is unknown, characterized in that a probabilistic model is created in which all of the acoustic models are connected to one another by desired transition probabilities and each acoustic model is also self-connected by a desired transition probability, and all or some of the parameters of the probabilistic model are retrained with the input speech.

2. The acoustic model adaptation method according to claim 1, wherein a discrete distribution type, a continuous distribution type or a semi-continuous distribution type hidden Markov model of phonemes is used as the acoustic model.

3. The acoustic model adaptation method according to claim 1, wherein a phoneme bigram probability value obtained from desired text data is used as an initial value of the transition probability.

4. The acoustic model adaptation method as described, wherein a context-dependent phoneme model is used as the acoustic model, and a context-dependent phoneme bigram probability value obtained from desired text data is used as an initial value of the transition probability.

5. The acoustic model adaptation method as described, wherein the moving vector field smoothing method is used when all or some of the parameters of the probabilistic model are retrained with the input speech.