JP3247746B2 - Creating a noise-resistant phoneme model - Google Patents

Creating a noise-resistant phoneme model

Info

Publication number
JP3247746B2
JP3247746B2 (application JP00568893A)
Authority
JP
Japan
Prior art keywords
noise
phoneme
markov model
hidden markov
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP00568893A
Other languages
Japanese (ja)
Other versions
JPH06214592A (en)
Inventor
Kiyohiro Shikano
Yasuhiro Minami
Tatsuo Matsuoka
Frank Martin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP00568893A priority Critical patent/JP3247746B2/en
Publication of JPH06214592A publication Critical patent/JPH06214592A/en
Application granted granted Critical
Publication of JP3247746B2 publication Critical patent/JP3247746B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for creating phoneme models used in speech recognition, and more particularly to the creation of noise-resistant phoneme models effective for speech recognition in noisy environments.

[0002]

2. Description of the Related Art

A conventional speech recognition system using phoneme HMMs will be described with reference to the drawings. FIG. 6 shows an example of a functional block diagram implementing a conventional speech recognition system. In the figure, 1 is a speech input unit, 2 is a recognition unit, and 3 is a recognition result output unit. Reference numeral 4 denotes a phoneme model storage unit, in which the phoneme HMMs are stored.

[0003] First, the operation of FIG. 6 will be described. The recognition unit 2 divides the speech input to the speech input unit 1 into frames and, for each frame, computes probabilities using the phoneme HMMs stored in advance in the phoneme model storage unit 4; the phoneme with the highest probability is sent to the recognition result output unit 3 as the recognition result. The operation of the recognition unit 2 in more detail is as follows. When speech is input over a certain period, the input is divided into unit-time frames, and feature parameters are extracted for each frame. Then, for the feature parameters extracted from the first frame, the phoneme HMM prepared for each phoneme is used to compute, for each phoneme, the probability that the frame is, for example, "a", the probability that it is "i", the probability that it is "u", and so on.

[0004] Next, the feature parameters of the following frame are input, and the probability that this frame follows the previous state is computed for each phoneme. This is repeated for all frames of the input speech; if, for example, the phoneme HMM for "a" yields the highest probability, the input speech is recognized as "a".
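The frame-by-frame probability computation described above is the standard HMM forward algorithm. A minimal sketch with hypothetical toy parameters (discrete emissions for brevity; the patent's HMMs use Gaussian-mixture outputs):

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM.

    obs : sequence of observation-symbol indices (one per frame)
    pi  : (S,) initial state probabilities
    A   : (S, S) transition probabilities, A[i, j] = P(next=j | current=i)
    B   : (S, V) emission probabilities, B[s, o] = P(symbol o | state s)
    """
    alpha = pi * B[:, obs[0]]          # probabilities after the first frame
    for o in obs[1:]:                  # fold in each subsequent frame
        alpha = (alpha @ A) * B[:, o]
    return np.log(alpha.sum())

# Toy 2-state models: the input is recognized as the phoneme whose
# HMM gives the highest likelihood over all frames.
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B_a = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical model for "a"
B_i = np.array([[0.1, 0.9], [0.8, 0.2]])   # hypothetical model for "i"
obs = [0, 0, 1]
scores = {p: forward_log_likelihood(obs, pi, A, B)
          for p, B in [("a", B_a), ("i", B_i)]}
best = max(scores, key=scores.get)          # recognized phoneme
```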

[0005]

[Problems to be Solved by the Invention]

Conventionally, however, phoneme HMMs have been created from speech information obtained in a noise-free state, and in real environments the performance of speech recognition deteriorates markedly under the influence of noise. In an alternative scheme that records speech under various kinds of noise in advance and creates phoneme HMMs from that speech, the variety of noise is so enormous that the system becomes bloated in order to obtain high recognition performance.

[0006] An object of the present invention is to solve the above problems by creating phoneme models that can be used in a simple system and that yield high recognition performance even under noisy conditions, thereby producing phoneme models matched to the utterance environment and improving speech recognition performance.

[0007]

[Means for Solving the Problems]

To achieve the above object, the noise-resistant phoneme hidden Markov model creation method of the present invention takes a phoneme hidden Markov model of speech, created in advance in a noise-free environment, and a hidden Markov model of noise; combines the states of the speech phoneme hidden Markov model with the states of the noise hidden Markov model; and composes them in a product space by adding, in the spectral domain, the probabilities of each corresponding pair of states, thereby creating a phoneme hidden Markov model with noise superimposed. Alternatively, the output probability distributions of the speech phoneme hidden Markov model for a noise-free environment and of the noise hidden Markov model, both represented in the cepstrum domain, are each converted to distributions in the linear spectral domain by a cosine transform followed by an exponential transform; the distributions of the speech phoneme hidden Markov model and of the noise hidden Markov model in the linear spectral domain are convolved to construct a phoneme hidden Markov model adapted to the noise; and the distribution obtained by the convolution is transformed back to the cepstrum domain by a logarithmic transform and an inverse cosine transform. Or, in the above composition, the output probabilities of the speech phoneme hidden Markov model and of the noise hidden Markov model are convolved to construct a phoneme hidden Markov model adapted to the noise, and the distribution obtained by the convolution is transformed back to the cepstrum domain by a logarithmic transform and an inverse cosine transform.

[0008]

[Operation]

This makes it possible to create phoneme HMMs that take into account the noise at the utterance location, improving speech recognition performance under noise.

[0009]

[Embodiments]

FIG. 1 shows the procedure for creating a noise-resistant phoneme HMM according to the present invention. The creation procedure is described below with reference to the figure. Cepstrum coefficients are widely used as the acoustic parameters of HMMs for speech recognition. These cepstrum coefficients are related to the logarithmic spectrum (logarithmic power spectrum) by a cosine transform. Ambient noise, on the other hand, is additive noise and can be added to speech in the spectral domain.
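A tiny numerical illustration of this point (hypothetical spectra, not from the patent): noise adds to speech in the linear spectral domain, but that additivity is lost in the log-spectral, and hence cepstral, domain, which is why the procedure converts the models to the linear spectral domain before combining them:

```python
import numpy as np

# Two hypothetical linear power spectra: clean speech and noise.
speech = np.array([4.0, 2.0, 1.0, 0.5])
noise = np.array([1.0, 1.0, 1.0, 1.0])

# Addition holds in the linear spectral domain...
noisy = speech + noise

# ...but not in the log-spectral domain: log of the sum is not
# the sum of the logs, so cepstral parameters cannot simply be added.
log_of_sum = np.log(noisy)
sum_of_logs = np.log(speech) + np.log(noise)
additive_in_log = np.allclose(log_of_sum, sum_of_logs)   # False
```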

[0010] First, in the cepstrum domain, a phoneme HMM for a noise-free environment and a noise HMM for the noisy environment are created, and each is cosine-transformed to compute its logarithmic spectrum. Next, an exponential transform is applied to each to compute the corresponding linear spectra; the output probabilities of phoneme and noise are combined by convolution; and the resulting noise-included phoneme HMM is log-transformed and then inverse-cosine-transformed to create the final phoneme HMM.

[0011] These transforms are performed for speech and noise respectively, and the parameters of the two are added in the linear spectral domain. The linear spectrum is then log-transformed back to the logarithmic spectral domain and further returned to the cepstrum domain by a cosine transform, creating a phoneme HMM that takes ambient noise into account.

[0012] Here, the phoneme HMM and the noise HMM are assumed to be HMMs whose output distributions are mixtures of Gaussians (so-called Gaussian mixtures). The above transforms therefore convert not only the means but also the covariances. A concrete calculation method is described next. First, consider the cepstrum coefficients of order 0 to p as the parameters of these HMMs, written as the vectors

C = (C0, C1, C2, ..., Cp-1, Cp)
D = (D0, D1, D2, ..., Dp-1, Dp),

where C are the phoneme parameters and D the noise parameters. The transform from these cepstrum coefficients to the logarithmic spectrum is the well-known cosine transform; it is a linear transform, represented here by the (p+1) × m transform matrix (COS). Writing the logarithmic-spectrum vectors as LC and LD, the transformed means are

LC = C (COS)
LD = D (COS).

If the covariances of the cepstrum coefficients are ΣC and ΣD, the covariances ΣLC and ΣLD of the logarithmic spectrum are

ΣLC = (COS)^t ΣC (COS)
ΣLD = (COS)^t ΣD (COS).

By transforming the mean and covariance in this way, the mean and covariance of the Gaussian distributions of speech and noise in the logarithmic spectral domain are obtained.

[0013] Next, the exponential transform that converts the logarithmic spectrum to the linear spectrum is described. This transform does not preserve the Gaussian shape of the distribution, but the result is approximated by a Gaussian. Computing the means SC, SD and covariances ΣSC, ΣSD of the exponentially transformed distributions gives

SCi = exp(LCi + ΣLC,ii / 2)
SDi = exp(LDi + ΣLD,ii / 2)
ΣSC,ij = SCi × SCj × { exp(ΣLC,ij) − 1 }
ΣSD,ij = SDi × SDj × { exp(ΣLD,ij) − 1 }.

In this way, speech and additive noise are summed in the linear spectral domain; since taking the sum corresponds to a convolution of the two distributions, the mean M and covariance ΣM of the convolution of the two distributions are easily obtained as

Mi = SCi + SDi
ΣM = ΣSC + ΣSD.

The mean and covariance of the distribution obtained in this way are then transformed back to the cepstrum domain, reversing the steps so far.

[0014] First, the logarithmic transform, the inverse of the exponential transform, is performed. Writing the log-transformed mean as LM and the covariance as ΣLM, since this is the inverse of the exponential transform,

LMi = log(Mi) − (1/2) log(ΣM,ii / Mi² + 1)
ΣLM,ij = log(ΣM,ij / (Mi Mj) + 1).

Further, the inverse cosine transform (like the cosine transform, a linear transform), represented by the m × (p+1) matrix (COS'), converts the logarithmic spectrum back to the cepstrum domain, giving the mean S and covariance ΣS as

S = LM (COS')
ΣS = (COS')^t ΣLM (COS').

Thus the Gaussian output distributions of the HMM obtained in the cepstrum domain are carried to the linear spectral domain, added to the noise model, and then inverse-transformed, creating a phoneme HMM with the noise added.

[0015] The above description took up two distributions and explained how to compose them. Next, the method of composing two HMMs is described. A phoneme HMM is usually represented by a left-to-right model of about three states, as shown in FIG. 2, while an ergodic HMM as shown in FIG. 3 is suited to modeling noise. Taking the product of these two models, consider composition in the product space; this yields the product-space HMM shown in FIG. 4. Each of its states is a combination of a state of the phoneme HMM and a state of the noise HMM. Accordingly, the output probabilities of each corresponding pair of states are converted to the linear spectral domain as described above, then convolved, and then inverse-transformed.

[0016] When the output probability is a single Gaussian, the above transforms are performed on the two distributions. When the output distribution is a mixture of Gaussians, the above transforms are performed for every pair of component distributions. The description so far has dealt only with cepstrum coefficients, but the same transforms can be applied to the Δ-cepstrum and Δ-power often used in phoneme HMMs, since these are expressed as linear sums of cepstrum coefficients.

[0017] FIG. 5 shows an embodiment in which the present invention is applied to a speech recognition apparatus. In the figure, 5 is a phoneme model storage unit, 6 is a noise input unit, 7 is a noise-resistant phoneme model creation unit that executes the noise-resistant phoneme model creation method of the present invention, and 8 is a noise-resistant phoneme model storage unit. A phoneme HMM created in advance from speech information obtained in a noise-free state is prepared in the phoneme model storage unit 5; the noise-resistant phoneme model creation unit 7 creates a noise HMM from the noise input through the noise input unit 6, and stores the noise-resistant phoneme model created according to the method of the present invention in the noise-resistant phoneme model storage unit 8.

[0018] The recognition unit 2 performs speech recognition on the speech input from the speech input unit 1 using the phoneme models stored in the noise-resistant phoneme model storage unit 8, and outputs the result from the recognition result output unit 3.

[0019]

[Effect of the Invention]

According to the present invention, a noise model of the utterance environment can be used, so a robust phoneme HMM suited to the utterance environment can be created, and a high phoneme recognition rate can be expected.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing the procedure for creating a noise-resistant phoneme model according to the present invention.

FIG. 2 shows an example of a three-state, three-loop phoneme HMM.

FIG. 3 shows an example of a noise HMM using a two-state ergodic HMM.

FIG. 4 shows the phoneme HMM obtained by composing the phoneme HMM of FIG. 2 and the noise HMM of FIG. 3 in the product space.

FIG. 5 is a diagram illustrating an embodiment in which the present invention is applied to a speech recognition apparatus.

FIG. 6 is a diagram illustrating the configuration of a conventional speech recognition apparatus.

Continuation of the front page:
(72) Inventor: Frank Martin, 4-6-29 Komaba, Meguro-ku, Tokyo
(56) References: JP-A-2-83596 (JP, A); Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP) 1992, Vol. 1, pp. I-489 to I-492, "ADAPTATION OF THE HMM DISTRIBUTIONS: APPLICATION TO A VQ CODEBOOK AND TO A NOISY ENVIRONMENT", Frangoulis E. D., Gaganelis D. A.
(58) Fields searched (Int. Cl.7, DB name): G10L 15/14, G10L 15/06, G10L 15/20

Claims (3)

(57) [Claims]

1. A method of creating a phoneme hidden Markov model, the model consisting of, for each phoneme, the probabilities of obtaining the states of the feature parameters of speech, wherein, for a phoneme hidden Markov model of speech created in advance in a noise-free environment and a hidden Markov model of noise, the states of the speech phoneme hidden Markov model and the states of the noise hidden Markov model are combined, and the models are composed in a product space by adding, in the spectral domain, the probabilities of each corresponding pair of states, thereby creating a phoneme hidden Markov model on which noise is superimposed.

2. A method of creating a phoneme hidden Markov model, the model consisting of, for each phoneme, the probabilities of obtaining the states of the feature parameters of speech, wherein the output probability distributions of a phoneme hidden Markov model of speech in a noise-free environment and of a hidden Markov model of noise, both represented in the cepstrum domain, are each converted to distributions in the linear spectral domain by applying a cosine transform and an exponential transform; the distributions of the speech phoneme hidden Markov model and of the noise hidden Markov model in the linear spectral domain are convolved to construct a phoneme hidden Markov model adapted to the noise; and the distribution obtained by the convolution is transformed to the cepstrum domain by applying a logarithmic transform and an inverse cosine transform.

3. The noise-resistant phoneme model creation method according to claim 2, wherein, in said composition, the output probabilities of the speech phoneme hidden Markov model and of the noise hidden Markov model are convolved to construct a phoneme hidden Markov model adapted to the noise, and the distribution obtained by the convolution is transformed to the cepstrum domain by applying a logarithmic transform and an inverse cosine transform.
JP00568893A 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model Expired - Lifetime JP3247746B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP00568893A JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP00568893A JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Publications (2)

Publication Number Publication Date
JPH06214592A JPH06214592A (en) 1994-08-05
JP3247746B2 true JP3247746B2 (en) 2002-01-21

Family

ID=11618046

Family Applications (1)

Application Number Title Priority Date Filing Date
JP00568893A Expired - Lifetime JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Country Status (1)

Country Link
JP (1) JP3247746B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101047104B1 (en) * 2009-03-26 2011-07-07 고려대학교 산학협력단 Acoustic model adaptation method and apparatus using maximum likelihood linear spectral transform, Speech recognition method using noise speech model and apparatus

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100442825B1 (en) * 1997-07-11 2005-02-03 삼성전자주식회사 Method for compensating environment for voice recognition, particularly regarding to improving performance of voice recognition system by compensating polluted voice spectrum closely to real voice spectrum
KR100434527B1 (en) * 1997-08-01 2005-09-28 삼성전자주식회사 Speech Model Compensation Method Using Vector Taylor Series
JP2002323900A (en) * 2001-04-24 2002-11-08 Sony Corp Robot device, program and recording medium
US7209881B2 (en) 2001-12-20 2007-04-24 Matsushita Electric Industrial Co., Ltd. Preparing acoustic models by sufficient statistics and noise-superimposed speech data
JP4061094B2 (en) 2002-03-15 2008-03-12 インターナショナル・ビジネス・マシーンズ・コーポレーション Speech recognition apparatus, speech recognition method and program thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP) 1992, Vol. 1, pp. I-489 to I-492, "ADAPTATION OF THE HMM DISTRIBUTIONS: APPLICATION TO A VQ CODEBOOK AND TO A NOISY ENVIRONMENT", Frangoulis E. D., Gaganelis D. A.

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101047104B1 (en) * 2009-03-26 2011-07-07 고려대학교 산학협력단 Acoustic model adaptation method and apparatus using maximum likelihood linear spectral transform, Speech recognition method using noise speech model and apparatus

Also Published As

Publication number Publication date
JPH06214592A (en) 1994-08-05

Similar Documents

Publication Publication Date Title
US10535336B1 (en) Voice conversion using deep neural network with intermediate voice training
JP2826215B2 (en) Synthetic speech generation method and text speech synthesizer
US7792672B2 (en) Method and system for the quick conversion of a voice signal
US10186252B1 (en) Text to speech synthesis using deep neural network with constant unit length spectrogram
JP2691109B2 (en) Speech coder with speaker-dependent prototype generated from non-user reference data
JP2733955B2 (en) Adaptive speech recognition device
Sisman et al. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder.
US6615174B1 (en) Voice conversion system and methodology
US6026359A (en) Scheme for model adaptation in pattern recognition based on Taylor expansion
US5327521A (en) Speech transformation system
US8301445B2 (en) Speech recognition based on a multilingual acoustic model
JP3836815B2 (en) Speech recognition apparatus, speech recognition method, computer-executable program and storage medium for causing computer to execute speech recognition method
US8812312B2 (en) System, method and program for speech processing
US5165008A (en) Speech synthesis using perceptual linear prediction parameters
EP2109096B1 (en) Speech synthesis with dynamic constraints
WO2006053256A2 (en) Speech conversion system and method
US20060195317A1 (en) Method and apparatus for recognizing speech in a noisy environment
Dharanipragada et al. Robust feature extraction for continuous speech recognition using the MVDR spectrum estimation method
JP4061094B2 (en) Speech recognition apparatus, speech recognition method and program thereof
US5721808A (en) Method for the composition of noise-resistant hidden markov models for speech recognition and speech recognizer using the same
JP3247746B2 (en) Creating a noise-resistant phoneme model
JP3973492B2 (en) Speech synthesis method and apparatus thereof, program, and recording medium recording the program
JP2898568B2 (en) Voice conversion speech synthesizer
JPH10149191A (en) Method and device for adapting model and its storage medium
JP3999731B2 (en) Method and apparatus for isolating signal sources

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071102

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081102

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091102

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101102

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111102

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121102

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131102

Year of fee payment: 12

EXPY Cancellation because of completion of term