JPH09230886A - Noise-resistant hidden Markov model creating method for speech recognition and speech recognition device using the method

Noise-resistant hidden Markov model creating method for speech recognition and speech recognition device using the method

Info

Publication number
JPH09230886A
JPH09230886A JP8047551A JP4755196A JPH09230886A JP H09230886 A JPH09230886 A JP H09230886A JP 8047551 A JP8047551 A JP 8047551A JP 4755196 A JP4755196 A JP 4755196A JP H09230886 A JPH09230886 A JP H09230886A
Authority
JP
Japan
Prior art keywords
hmm
noise
speech
voice
resistant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP8047551A
Other languages
Japanese (ja)
Inventor
Yasuhiro Minami
泰浩 南
Tomoko Matsui
知子 松井
Sadaoki Furui
貞煕 古井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP8047551A
Publication of JPH09230886A

Abstract

PROBLEM TO BE SOLVED: To obtain a noise-resistant speech HMM (hidden Markov model) that is also effective against multiplicative distortion in the linear spectral domain, such as that caused by passing through a telephone line. SOLUTION: Noise in the utterance environment is recorded (S1), and an HMM of the noise is generated. The output probability distributions of the noise HMM and of a speech HMM created from speech unaffected by noise or multiplicative distortion are transformed to the linear spectral domain (S31), and the speech HMM distribution in the linear spectral domain is multiplied by an unknown multiplicative distortion W. The result of this multiplication and the noise HMM distribution in the linear spectral domain are convolved (S322), and the result is inversely transformed (S33) to the original speech HMM domain, producing an incomplete noise-resistant speech HMM (S3) that contains the multiplicative distortion as an unknown. The likelihood of this incomplete HMM for the input speech is computed, the multiplicative distortion that maximizes the likelihood is estimated (S4), and the estimated value is substituted into the incomplete HMM to obtain the noise-resistant speech HMM (S5).

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] This invention is used in speech recognition methods that employ hidden Markov models (hereinafter, HMMs). It relates to a method of creating a noise-resistant HMM for speech recognition, suited to recognizing speech with additive background noise or speech that has undergone multiplicative distortion, for example by passing through a telephone line, and to a speech recognition device to which the creation method is applied.

[0002]

[Prior Art] A conventional speech recognition method using HMMs is described with reference to FIG. 1. For speech input through speech input means 1, speech recognition means 2 computes the similarity to each speech HMM stored in speech HMM storage unit 3, and a recognition result based on those values is output through recognition result output means 4.

[0003] Conventionally, speech HMMs are generally created from speech recorded under noise-free conditions. Because a speech HMM obtained this way is unaffected by noise, it is not an appropriate model for computing the similarity to input speech affected by noise in a real, noisy environment, or to input speech distorted by passing through a telephone line, and recognition performance degrades markedly.

[0004] Alternatively, speech affected by noise is recorded in a noisy environment and speech HMMs are created from that speech. However, because the variety of noise is enormous, obtaining high recognition performance this way makes the overall speech recognition device bloated. Moreover, creating speech HMMs from training speech recorded in the real environment requires a long training recording, for example about 24 hours, and creating the speech HMMs takes, for example, about two months. Recording training speech for each real environment and creating speech HMMs from it therefore could not be done easily or in a short time.

[0005] For these reasons, a speech recognition method that creates a noise-resistant speech HMM matched to the real environment quickly and relatively easily, and thereby obtains a high recognition rate even in noisy environments, was proposed in Reference 1 (F. Martin, K. Shikano & Y. Minami, "Recognition of Noisy Speech by Composition of Hidden Markov Models", ISSN 1018-4074 (Volume 2), ESCA, pp. 1031-1034). Because this invention improves on that conventional method, the method of creating the noise-resistant speech HMM used in it is briefly described below.

[0006] In this conventional method, noise is recorded in the real environment, a noise HMM is created from the recorded noise, and this noise HMM is combined in a product space with a speech HMM unaffected by noise or multiplicative distortion. The combination is performed as shown in FIG. 2. Cepstral coefficients are widely used as the acoustic parameters of HMMs for speech recognition. The cepstral coefficients and the log spectrum (log power spectrum) are related by the cosine transform.

[0007] Both the noise HMM and the speech HMM are assumed to have been created in the cepstral domain. The output probability distributions of the noise HMM and the speech HMM are each cosine-transformed to obtain the corresponding distributions on the log spectrum (S311). These log-spectral distributions are then exponentially transformed to obtain the corresponding distributions on the linear spectrum (S312). The linear-spectrum distributions of the noise HMM and the speech HMM are combined by a convolution operation (S32); the combined HMM is log-transformed (S331) and then inverse-cosine-transformed to create the noise-resistant speech HMM in the cepstral domain (S332).

[0008] Here, the output probability distributions of the speech HMM and the noise HMM are assumed to be represented by mixtures of normal distributions (Gaussian mixtures). Because a normal distribution is described by its mean and covariance, each transformation in FIG. 2 transforms the mean and the covariance. The transformations for the case where the HMM output probability distributions are normal are described further below.

[0009] First, consider the cepstral coefficients of order 0 through p as the parameters of these HMM output probability distributions, and write

$$C = (C_0\; C_1\; C_2\; \cdots\; C_{p-1}\; C_p) \tag{1}$$

$$D = (D_0\; D_1\; D_2\; \cdots\; D_{p-1}\; D_p) \tag{2}$$

where $C$ is the mean of a speech HMM distribution and $D$ is the mean of a noise HMM distribution. The transformation from cepstral coefficients to the log spectrum is known as the cosine transform. It is a linear transformation and is represented by a $(p+1) \times m$ transformation matrix (COS), where $m$ is the order of the log spectrum. Denoting the log-spectral mean vectors of the speech HMM and noise HMM distributions by $LC$ and $LD$ respectively,

$$LC = C\,(\mathrm{COS}) \tag{3}$$

$$LD = D\,(\mathrm{COS}) \tag{4}$$

(FIG. 2, S311). If the covariances of the speech HMM and noise HMM distributions in the cepstral domain are $\Sigma_C$ and $\Sigma_D$, the covariances $\Sigma_{LC}$ and $\Sigma_{LD}$ of these distributions in the log-spectral domain are

$$\Sigma_{LC} = (\mathrm{COS})\,\Sigma_C\,(\mathrm{COS})^t \tag{5}$$

$$\Sigma_{LD} = (\mathrm{COS})\,\Sigma_D\,(\mathrm{COS})^t \tag{6}$$

where $t$ denotes the transpose. In this way the log-spectral-domain mean and covariance of the normal distributions of the speech HMM and the noise HMM are obtained (FIG. 2, S311).
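The cosine-transform step of equations (3) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`cos_matrix`, `transform_mean`, `transform_cov`) and the particular cosine basis (unit weight for order 0, weight 2 for higher orders, bin centres at half-integer positions) are not taken from the patent, which does not spell out the entries of (COS). With the row-vector convention of equation (3), the sketch reads equation (5) as the dimensionally consistent product COS^t Σ COS.

```python
import math

def cos_matrix(p, m):
    """Hypothetical (p+1) x m matrix mapping cepstra c_0..c_p to m log-spectral bins.
    The basis and scaling are an assumed convention, not the patent's."""
    return [[(1.0 if k == 0 else 2.0) * math.cos(math.pi * k * (j + 0.5) / m)
             for j in range(m)]
            for k in range(p + 1)]

def transform_mean(c, cos):
    """Equation (3): LC = C (COS), a row vector times the transform matrix."""
    rows, cols = len(cos), len(cos[0])
    return [sum(c[k] * cos[k][j] for k in range(rows)) for j in range(cols)]

def transform_cov(sigma, cos):
    """Equation (5) in row-vector form: Sigma_LC = COS^t Sigma_C COS (m x m)."""
    rows, cols = len(cos), len(cos[0])
    return [[sum(cos[a][i] * sigma[a][b] * cos[b][j]
                 for a in range(rows) for b in range(rows))
             for j in range(cols)]
            for i in range(cols)]

cos = cos_matrix(p=2, m=4)
log_mean = transform_mean([1.0, 0.0, 0.0], cos)   # only C_0 is non-zero
log_cov = transform_cov([[0.5, 0, 0], [0, 0, 0], [0, 0, 0]], cos)
```

Because only the 0th-order coefficient is non-zero in this usage example, every log-spectral bin receives the same mean and the same covariance, which reflects the fact that the 0th cepstral coefficient sets the overall level.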

[0010] Next, the exponential transformation that converts the log spectrum to the linear spectrum is described. The result of this transformation is not a normal distribution, but it is approximated by one. Applying the exponential transformation to the log-spectral-domain means $LC$, $LD$ and covariances $\Sigma_{LC}$, $\Sigma_{LD}$ gives the means $SC$, $SD$ and covariances $\Sigma_{SC}$, $\Sigma_{SD}$ of the respective distributions:

$$SC_i = \exp\!\left(LC_i + \Sigma_{LC,ii}/2\right) \tag{7}$$

$$SD_i = \exp\!\left(LD_i + \Sigma_{LD,ii}/2\right) \tag{8}$$

$$\Sigma_{SC,ij} = SC_i \times SC_j \times \{\exp(\Sigma_{LC,ij}) - 1\} \tag{9}$$

$$\Sigma_{SD,ij} = SD_i \times SD_j \times \{\exp(\Sigma_{LD,ij}) - 1\} \tag{10}$$

$$i, j = 0, 1, 2, \ldots, p$$

(S312).
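The exponential transformation of equations (7) to (10) is the usual log-normal moment formula, and a minimal Python sketch may make the indexing concrete. The function name `log_to_linear` is hypothetical, and the sketch assumes the mean formula uses the diagonal covariance term of the same component.

```python
import math

def log_to_linear(mean_log, cov_log):
    """Map a Gaussian in the log-spectral domain to the mean and covariance
    of its log-normal counterpart in the linear-spectral domain."""
    n = len(mean_log)
    # Equations (7)/(8): mean uses the variance of the same component.
    mean_lin = [math.exp(mean_log[i] + cov_log[i][i] / 2.0) for i in range(n)]
    # Equations (9)/(10): covariance between components i and j.
    cov_lin = [[mean_lin[i] * mean_lin[j] * (math.exp(cov_log[i][j]) - 1.0)
                for j in range(n)]
               for i in range(n)]
    return mean_lin, cov_lin

# One component with log-domain mean 0 and variance 2*ln(2).
mean_lin, cov_lin = log_to_linear([0.0], [[2.0 * math.log(2.0)]])
```

In the usage line, the linear-domain mean is exp(ln 2) = 2 and the variance is 2 × 2 × (4 − 1) = 12, which shows how a modest log-domain variance inflates both moments after the transformation.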

[0011] Ambient noise is additive, and speech and noise can be added in the linear spectral domain. The mean $M$ and covariance $\Sigma_M$ of the noise-resistant speech HMM distribution, which is the sum of the speech HMM and noise HMM distributions in the linear spectral domain, are therefore obtained by the following equations (FIG. 2, S32):

$$M_i = SC_i + SD_i \tag{11}$$

$$\Sigma_{M,ij} = \Sigma_{SC,ij} + \Sigma_{SD,ij} \tag{12}$$

The mean $M_i$ and covariance $\Sigma_M$ obtained this way are transformed back to the cepstral domain by reversing the preceding steps. First, the logarithmic transformation, which is the inverse of the exponential transformation, is applied. Denoting the log-transformed mean by $LM$ and covariance by $\Sigma_{LM}$, and using the fact that this is the inverse of the exponential transformation,

$$LM_i = \log(M_i) - \tfrac{1}{2}\log\!\left(\Sigma_{M,ii}/M_i^2 + 1\right) \tag{13}$$

$$\Sigma_{LM,ij} = \log\!\left(\Sigma_{M,ij}/(M_i M_j) + 1\right) \tag{14}$$

are computed (FIG. 2, S331). Then the inverse cosine transform (COS′), an $m \times (p+1)$ matrix, converts the log spectrum to the cepstral domain, giving the mean $S$ and covariance $\Sigma_S$ of the output probability distribution of the noise-resistant speech HMM by the following equations (FIG. 2, S332).

[0012]

$$S = LM\,(\mathrm{COS}') \tag{15}$$

$$\Sigma_S = (\mathrm{COS}')\,\Sigma_{LM}\,(\mathrm{COS}')^t \tag{16}$$

When each distribution is a single normal distribution, the above transformation is applied to the two distributions. When the distributions are mixtures of normal distributions, the transformation is applied to every combination of component distributions. For example, if the speech HMM is a mixture of three normal distributions and the noise HMM is a mixture of three normal distributions, the noise-resistant speech HMM is a mixture of 3 × 3 = 9 normal distributions.
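The linear-domain addition of equations (11) and (12) and the logarithmic back-transformation of equations (13) and (14) can be sketched as follows. The function names are hypothetical, the inverse cosine step of equations (15) and (16) is left out, and the mean formula is assumed to use the diagonal term of the covariance.

```python
import math

def add_in_linear_domain(mean_s, cov_s, mean_n, cov_n):
    """Equations (11) and (12): add speech and noise moments element-wise."""
    n = len(mean_s)
    mean = [mean_s[i] + mean_n[i] for i in range(n)]
    cov = [[cov_s[i][j] + cov_n[i][j] for j in range(n)] for i in range(n)]
    return mean, cov

def linear_to_log(mean_lin, cov_lin):
    """Equations (13) and (14): inverse of the log-normal moment mapping."""
    n = len(mean_lin)
    mean_log = [math.log(mean_lin[i])
                - 0.5 * math.log(cov_lin[i][i] / mean_lin[i] ** 2 + 1.0)
                for i in range(n)]
    cov_log = [[math.log(cov_lin[i][j] / (mean_lin[i] * mean_lin[j]) + 1.0)
                for j in range(n)]
               for i in range(n)]
    return mean_log, cov_log

# Speech and noise components with illustrative (made-up) moments.
mean_m, cov_m = add_in_linear_domain([1.0], [[5.0]], [1.0], [[7.0]])
mean_lm, cov_lm = linear_to_log(mean_m, cov_m)
```

For a Gaussian mixture, the same two calls would be repeated for every pair of speech and noise components, which is where the 3 × 3 = 9 components in the example above come from.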

[0013] The description above took one distribution at a time from the speech and noise HMMs and described how to combine them. A speech HMM can usually be represented by a model of about three states A, B, and C with transitions from right to left, as shown at the upper left of FIG. 3. For the noise model, an ergodic HMM that moves between two states 1 and 2, as shown at the upper right of FIG. 3, is suitable. The noise-resistant speech HMM is then a product model, as shown at the bottom of FIG. 3, with six states 1A, 1B, 1C, 2A, 2B, and 2C, each a combination of a speech HMM state and a noise HMM state. For example, state 1A is composed from state A of the speech HMM and state 1 of the noise HMM, and its output distribution is $P_A * P_1$, where $P_A$ is the distribution in state A of the speech HMM, $P_1$ is the distribution in state 1 of the noise HMM, and $*$ denotes the transformation described with reference to FIG. 2. The same operation is performed for states 1B, 1C, and 2A to 2C using the distributions of the corresponding speech HMM and noise HMM states. The transition probabilities between states of the noise-resistant speech HMM are products of the speech HMM and noise HMM transition probabilities, as shown at the bottom of FIG. 3. For example, the transition probability from state 1A to state 1B is the product $a_{AB} \times a_{11}$ of the transition probability $a_{AB}$ from state A to state B of the speech HMM and the transition probability $a_{11}$ from state 1 to state 1 of the noise HMM.
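The state and transition structure of the product model can be illustrated with a short Python sketch. The function name `product_hmm` and the transition probabilities below are made up for illustration; only the state labels and the product rule for transitions follow the description above.

```python
def product_hmm(speech_states, a_speech, noise_states, a_noise):
    """Build the product-model states and transition probabilities.
    A transition (n, s) -> (n2, s2) has probability a_noise[n][n2] * a_speech[s][s2]."""
    states = [n + s for n in noise_states for s in speech_states]
    trans = {}
    for i, n_from in enumerate(noise_states):
        for k, s_from in enumerate(speech_states):
            for j, n_to in enumerate(noise_states):
                for l, s_to in enumerate(speech_states):
                    prob = a_noise[i][j] * a_speech[k][l]
                    if prob > 0.0:  # keep only transitions that can occur
                        trans[(n_from + s_from, n_to + s_to)] = prob
    return states, trans

# Illustrative values: a 3-state speech HMM (A -> B -> C) and an
# ergodic 2-state noise HMM. These numbers are not from the patent.
a_speech = [[0.6, 0.4, 0.0],
            [0.0, 0.7, 0.3],
            [0.0, 0.0, 1.0]]
a_noise = [[0.9, 0.1],
           [0.2, 0.8]]
states, trans = product_hmm(["A", "B", "C"], a_speech, ["1", "2"], a_noise)
```

With these values, `trans[("1A", "1B")]` is the product of the speech transition A to B and the noise self-transition 1 to 1, and the outgoing probabilities of each product state still sum to one because both factor matrices are row-stochastic.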

[0014] This conventional method does yield a speech HMM that is somewhat robust to additive noise. Moreover, an existing set of speech HMMs can be used, while the noise HMM needs only a short noise recording in the real environment, for example 5 to 6 seconds and at most about 20 seconds, and can be created from that recording in about 1 second. The total time needed to record the environmental noise, create the noise HMM, combine it with the speech HMMs as described above into the noise-resistant speech HMM, and recognize input speech with that HMM is short, about one minute (and shorter still with fast computing means).

[0015]

[Problems to Be Solved by the Invention] However, this conventional method has no effect at all on the multiplicative distortion in the linear spectral domain that arises in a speech signal when it passes through, for example, a telephone line, and it was not sufficient against additive noise either. An object of this invention is to provide a method of creating a noise-resistant hidden Markov model for speech recognition that achieves a high recognition rate not only for additive noise but also for speech subjected to multiplicative distortion in the linear spectral domain, and that can be created quickly and relatively easily.

[0016] Another object of this invention is to provide a speech recognition device that achieves a high recognition rate not only for additive noise but also for speech subjected to multiplicative distortion in the linear spectral domain, and in which the noise-resistant speech HMM can be created relatively easily and quickly.

[0017]

[Means for Solving the Problems] According to the method of this invention, in a first step an incomplete noise-resistant speech HMM that contains the multiplicative distortion in the linear spectral domain, or the S/N (signal-to-noise ratio), as an unknown (variable) is created from a speech HMM unaffected by noise or multiplicative distortion (hereinafter collectively referred to simply as noise) and an HMM created from noise (the noise HMM). In a second step, the multiplicative distortion or S/N that maximizes the likelihood of the incomplete noise-resistant speech HMM for the input speech is estimated. In a third step, the estimated value is substituted into the incomplete noise-resistant speech HMM to complete the noise-resistant speech HMM.

[0018] The first step obtains the incomplete noise-resistant speech HMM by combining the speech HMM and the noise HMM in a product space. The product-space combination transforms the output probability distributions of the speech HMM and the noise HMM to the linear spectral domain in a fourth step, convolves these linear-spectral-domain distributions with each other in a fifth step, and inversely transforms the convolution result to the original speech HMM domain in a sixth step.
[0019] When the speech HMM and the noise HMM are expressed in the cepstral domain, the fourth step cosine-transforms and then exponentially transforms their output probability distributions, and the sixth step log-transforms and then inverse-cosine-transforms the convolution result. When the speech HMM and the noise HMM are expressed in the log-spectral domain, the fourth step exponentially transforms their output probability distributions, and the sixth step log-transforms the convolution result.

[0020] When the speech HMM and the noise HMM are expressed in the linear spectral domain, the product-space combination is performed by convolving their output probability distributions. The second step applies various multiplicative distortions or S/N values to the incomplete noise-resistant speech HMM, computes the likelihood between each resulting HMM and the input speech, and takes the multiplicative distortion or S/N that gives the maximum likelihood as the estimate.
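The candidate search in the second step can be sketched for a deliberately simplified case. The sketch below assumes a single one-dimensional Gaussian scored directly in the linear spectral domain, with the distortion W entering as mean W·SC + SD and variance W²·Σ_SC + Σ_SD; the real method scores input speech against full HMMs, and the function names and numbers here are hypothetical.

```python
import math

def log_likelihood(xs, mean, var):
    """Log-likelihood of observations xs under a 1-D Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
               for x in xs)

def estimate_distortion(xs, sc, var_sc, sd, var_sd, candidates):
    """Try each candidate distortion W and keep the one with the highest likelihood."""
    best_w, best_ll = None, -math.inf
    for w in candidates:
        mean = w * sc + sd                # distorted speech mean plus noise mean
        var = w * w * var_sc + var_sd     # distorted speech variance plus noise variance
        ll = log_likelihood(xs, mean, var)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w

# Made-up observations centred near 1.0 * 2.0 + 0.5.
observations = [2.4, 2.5, 2.6]
w_hat = estimate_distortion(observations, sc=2.0, var_sc=0.01,
                            sd=0.5, var_sd=0.01,
                            candidates=[0.25, 0.5, 1.0, 2.0])
```

A grid search like this is simple but its resolution is limited by the candidate list; a finer grid or an iterative estimator trades computation for precision.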

[0021] The second step can also estimate the value by iterative computation using maximum likelihood estimation or the steepest descent method. The noise HMM used in the first step is created by recording environmental noise. The convolution of the speech HMM distribution and the noise HMM distribution in the linear spectral domain is performed after multiplying the linear spectrum of the speech HMM distribution by the unknown multiplicative distortion.

[0022] The convolution of the speech HMM distribution and the noise HMM distribution in the linear spectral domain can also be performed after multiplying the linear spectrum of the speech HMM distribution by $10^{-(S/N)/2}$, or after multiplying the linear spectrum of the noise HMM distribution by $10^{(S/N)/2}$ (S/N being the unknown). The product-space combination can also be performed after adding an unknown multiplicative distortion component to each cepstral coefficient of the output probability distribution of the speech HMM expressed in the cepstral domain.

[0023] The product-space combination can also be performed after adding an unknown S/N component to the 0th-order cepstral coefficient of the output probability distribution of either the speech HMM or the noise HMM expressed in the cepstral domain. The product-space combination can also be performed after adding an unknown multiplicative distortion component to the log spectrum of the output probability distribution of the speech HMM expressed in the log-spectral domain.
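The reason an additive component in the log-spectral domain acts as a multiplicative distortion in the linear spectral domain can be checked with a short derivation. It assumes the log-normal mapping of equations (7) and (9), and writes the added component as $\log W_i$; treating it as a constant shift of the mean (so that the log-domain covariance is unchanged) is an assumption of this sketch.

```latex
% Mean: shifting the log-spectral mean by \log W_i scales the linear mean by W_i.
\exp\!\Bigl( (LC_i + \log W_i) + \tfrac{1}{2}\Sigma_{LC,ii} \Bigr)
  = W_i \exp\!\Bigl( LC_i + \tfrac{1}{2}\Sigma_{LC,ii} \Bigr)
  = W_i \, SC_i

% Covariance: the log-domain covariance is unchanged by a constant shift,
% so only the two scaled means enter the linear-domain covariance.
(W_i \, SC_i)(W_j \, SC_j)\bigl\{ \exp(\Sigma_{LC,ij}) - 1 \bigr\}
  = W_i W_j \, \Sigma_{SC,ij}
```

Because the cosine transform is linear, the same shift can equivalently be applied to the cepstral coefficients before the transform, which is why the distortion component may be added in either the cepstral or the log-spectral domain.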

[0024] The product-space combination can also be performed after adding an unknown S/N component to the log spectrum of the output probability distribution of either the speech HMM or the noise HMM expressed in the log-spectral domain. The speech recognition device of this invention recognizes input speech using a noise-resistant speech HMM created from a speech HMM and a noise HMM by the method of this invention. A set of speech HMMs unaffected by noise or multiplicative distortion is stored in a speech HMM storage unit. Ambient noise is input by noise input means, and noise HMM creation means creates a noise HMM from the input noise and stores it in a noise HMM storage unit. Incomplete noise-resistant speech HMM combining means combines, in a product space, the stored noise HMM, the speech HMMs from the speech HMM storage unit, and the S/N or multiplicative distortion from an S/N or multiplicative distortion storage unit, treating the S/N or multiplicative distortion as an unknown, to create an incomplete noise-resistant speech HMM. S/N or multiplicative distortion estimation means estimates the unknown of this incomplete noise-resistant speech HMM so that the likelihood of the incomplete noise-resistant speech HMM for input speech from speech input means is maximized. Noise-resistant speech HMM completion means substitutes the estimated S/N or multiplicative distortion value into the incomplete noise-resistant speech HMM to complete the noise-resistant speech HMM, which is stored in a noise-resistant speech HMM storage unit. Speech recognition means computes the similarity between the input speech and this noise-resistant speech HMM, and a recognition result is output based on the computed result.

[0025]

[Embodiments of the Invention] FIG. 4 shows an example functional configuration of a speech recognition device according to this invention, and FIG. 5A shows the processing procedure in an example of the method of this invention. Speech HMM storage unit 3 stores in advance a set of speech HMMs created from speech recorded under noise-free conditions. Environmental noise is recorded through noise input means 5 (S1), and noise HMM creation means 6 creates a noise HMM from the recorded environmental noise (S2). The noise HMM may be created by the same technique as a speech HMM, as described for example in S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition", The Bell System Technical Journal, Vol. 62, No. 4 (April 1983). The created noise HMM is stored in noise HMM storage unit 7. The noise HMM stored in noise HMM storage unit 7 is expressed in the same feature parameters as the speech HMMs stored in speech HMM storage unit 3; that is, if the speech HMMs are expressed in the cepstral domain, for example, the noise HMM is also expressed in the cepstral domain.

[0026] Incomplete noise-resistant speech HMM combining means 8 creates, from the noise HMM in noise HMM storage unit 7 and the speech HMMs in speech HMM storage unit 3, an incomplete noise-resistant speech HMM that contains, as an unknown, the linear-spectral-domain multiplicative distortion or S/N held in S/N or multiplicative distortion storage unit 9 (S3). The incomplete noise-resistant speech HMM is created by combining the noise HMM and the speech HMM in a product space. In doing so, the unknown multiplicative distortion is multiplied into or added to the speech HMM, or the unknown S/N is multiplied into or added to the speech HMM or the noise HMM.

[0027] The product-space combination of the noise HMM and the speech HMM proceeds, for example, much as in the conventional technique. As shown in FIG. 5B, the output probability distributions of the noise HMM and the speech HMM are transformed to the linear spectral domain (S31), and these linear-spectral-domain distributions are convolved. Before the convolution, either the linear spectrum of the speech HMM output probability distribution is multiplied by the multiplicative distortion (an unknown), or the linear spectrum of the output probability distribution of one of the speech HMM and the noise HMM is multiplied by a component corresponding to the S/N (an unknown) (S321). The multiplied output probability distribution is then convolved with the unmultiplied one (S322), and the convolution result is inversely transformed to the original domain to obtain the incomplete noise-resistant speech HMM (S33).

As in the description of the conventional method in FIG. 2, both the speech HMM and the noise HMM are assumed to have been created in the cepstral domain. FIG. 6 shows a more specific example of the method of this invention, with parts corresponding to FIG. 2 given the same symbols. The means $C$ and $D$ of the output probability distributions of the speech HMM and the noise HMM are given by equations (1) and (2). They are cosine-transformed into the log spectra $LC$ and $LD$, the covariances of the distributions are likewise transformed into $\Sigma_{LC}$ and $\Sigma_{LD}$ (S311), and these are then transformed into the linear-spectral-domain means $SC_i$, $SD_i$ and covariances $\Sigma_{SC,ij}$, $\Sigma_{SD,ij}$ (S312).

In the general communications field, it is known that when speech $S$ uttered in the presence of environmental noise $N$ is picked up by a microphone and passed through a transmission channel, the output $X$ can be written $X = WS + N$, where $W$ is the distortion introduced by the microphone and the transmission channel. Taking this into account, in this embodiment the unknown multiplicative distortion $W$ in multiplicative distortion storage unit 9 (FIG. 4) is multiplied into the output probability distribution of the speech HMM (S321), and the multiplied distribution is convolved with the noise HMM (S322). That is, the following equations are computed in place of equations (11) and (12).

【0028】
$$M_i = W_i\,SC_i + SD_i \quad (17)$$
$$\Sigma^{M}_{ij} = W_i W_j\,\Sigma^{SC}_{ij} + \Sigma^{SD}_{ij} \quad (18)$$
As in the case of FIG. 2, the results $M_i$ and $\Sigma^{M}_{ij}$ of this convolution are logarithmically transformed by equations (13) and (14) (S331) and then inverse-cosine-transformed by equations (15) and (16), which yields, in the original cepstrum domain, the incomplete noise-resistant speech HMM containing the multiplicative distortion as an unknown (S332).
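As an illustration only, the computation of equations (17) and (18) and the surrounding domain transformations might be sketched as follows. The sketch starts from log-spectral means and variances (the cosine transform to and from the cepstrum domain is omitted), keeps only the diagonal terms i = j of the covariances, and uses the standard log-normal moment relations as a stand-in for equations (11) to (16), which are not reproduced in this passage. All function names and numerical values are hypothetical.

```python
import math

def log_to_linear(mu_log, var_log):
    """Log-spectral (mean, variance) -> linear-spectral, per band (log-normal assumption)."""
    mu_lin = [math.exp(m + v / 2.0) for m, v in zip(mu_log, var_log)]
    var_lin = [m * m * (math.exp(v) - 1.0) for m, v in zip(mu_lin, var_log)]
    return mu_lin, var_lin

def linear_to_log(mu_lin, var_lin):
    """Inverse of log_to_linear."""
    var_log = [math.log(v / (m * m) + 1.0) for m, v in zip(mu_lin, var_lin)]
    mu_log = [math.log(m) - v / 2.0 for m, v in zip(mu_lin, var_log)]
    return mu_log, var_log

def combine(speech_log, noise_log, w):
    """Build one output distribution of the incomplete model for a distortion vector w."""
    sc, var_sc = log_to_linear(*speech_log)
    sd, var_sd = log_to_linear(*noise_log)
    m = [wi * s + d for wi, s, d in zip(w, sc, sd)]                        # eq. (17)
    var_m = [wi * wi * vs + vd for wi, vs, vd in zip(w, var_sc, var_sd)]  # eq. (18), i = j only
    return linear_to_log(m, var_m)

# Hypothetical two-band example: (means, variances) in the log-spectral domain.
speech = ([1.0, 0.5], [0.2, 0.3])
noise = ([-0.5, -1.0], [0.1, 0.1])
mu, var = combine(speech, noise, w=[0.8, 1.5])
```

Because `w` stays an explicit argument, the same function can be re-evaluated for each candidate value of the distortion during the estimation step that follows.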

【0029】 Next, the multiplicative distortion $W$ $(= W_1, \ldots, W_m)$, which is the unknown in the incomplete noise-resistant speech HMM obtained in this way, is estimated (FIG. 5A, S4). To do so, speech is input from the speech input means 1 (FIG. 4), and the S/N or multiplicative distortion estimation means 10 estimates the multiplicative distortion W that maximizes the likelihood $P(X \mid M(W))$ of the set M(W) of incomplete noise-resistant speech HMMs for the input speech sequence X. This estimate can be obtained by iterative computation using the steepest descent method or the maximum likelihood estimation method.

【0030】 That is, to maximize the likelihood $P(X \mid M(W))$ by the steepest descent method, the following iterative computation is executed.
1. Initialize W.
2. Estimate the next $W_t$ using $W_t = W_{t-1} + \varepsilon\,\partial P(X \mid M(W)) / \partial W$.
3. Update $W_{t-1}$ with $W_t$.

【0031】 4. Repeat steps 2 and 3 until convergence.
A suitably small value is used for ε. To maximize the likelihood $P(X \mid M(W))$ by the maximum likelihood estimation method, proceed as follows. In general, the following relation holds between the trellis likelihood and the Viterbi likelihood:
$$P(X \mid M(W)) = \sum_S P(X, S \mid M(W))$$
where the sum runs over all S, and S denotes a state transition. Denoting the updated value of W by W′, Q(W, W′) is defined as follows.

【0032】
$$Q(W, W') = \sum_S P(X, S \mid M(W)) \log P(X, S \mid M(W'))$$
where the sum runs over all S. If $Q(W, W') > Q(W, W)$, then $P(X \mid M(W')) > P(X \mid M(W))$ holds. A method of estimating W that uses this principle together with maximum likelihood estimation is shown below.

【0033】
1. Initialize W.
2. Estimate, by maximum likelihood estimation, the $W_t$ that maximizes $Q(W_{t-1}, W_t)$.
3. Update $W_{t-1}$ with $W_t$.
4. Repeat steps 2 and 3 until convergence.
The value of the multiplicative distortion W estimated in this way is substituted into each incomplete noise-resistant speech HMM by the HMM completion means 11 (FIG. 4) to complete the noise-resistant speech HMMs, which are stored in the noise-resistant speech HMM storage unit 12 (FIG. 5A, S5). To recognize unknown speech, the unknown speech is input from the speech input means 1, the speech recognition means 2 computes its similarity to each noise-resistant speech HMM stored in the noise-resistant speech HMM storage unit 12, and the recognition result is output on the basis of that computation. The input speech X used to estimate the multiplicative distortion need not be training speech; it may be the unknown speech that is to be recognized. In the latter case, the multiplicative distortion is estimated first and the recognition processing described above is then performed on that unknown speech.
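The two iterative procedures described above (the steepest descent update and the iteration on Q) can be sketched on a toy model. This is a minimal sketch under stated assumptions, not the patent's implementation: the model is a two-state mixture without state transitions, observations are scalars in the linear spectral domain, the state means have the form $W\,SC + SD$ as in equation (17), the variance is fixed, and the gradient step uses the log-likelihood (which has the same maximizer as the likelihood) with a numerical derivative. All names and values are hypothetical.

```python
import math

S = [1.0, 3.0]   # hypothetical clean-speech level of each state (SC)
D = 0.5          # hypothetical noise level (SD)
VAR = 0.25       # fixed observation variance

def joint(x, w, k):
    """P(x, state k | M(w)) with a uniform state prior."""
    mean = w * S[k] + D                      # mean of the form W*SC + SD
    g = math.exp(-(x - mean) ** 2 / (2 * VAR)) / math.sqrt(2 * math.pi * VAR)
    return g / len(S)

def log_likelihood(xs, w):
    """log P(X | M(w)): sum over states inside, over observations outside."""
    return sum(math.log(sum(joint(x, w, k) for k in range(len(S)))) for x in xs)

def steepest_ascent(xs, w=1.0, eps=0.01, iters=500, h=1e-5):
    """W_t = W_{t-1} + eps * d(log-likelihood)/dW, with a central-difference gradient."""
    for _ in range(iters):
        grad = (log_likelihood(xs, w + h) - log_likelihood(xs, w - h)) / (2 * h)
        w = w + eps * grad
    return w

def em(xs, w=1.0, iters=50):
    """Repeatedly pick the W' that maximizes Q(W, W'); closed form for this toy model."""
    for _ in range(iters):
        num = den = 0.0
        for x in xs:
            post = [joint(x, w, k) for k in range(len(S))]
            z = sum(post)
            for k, p in enumerate(post):
                gamma = p / z                # posterior of state k under the current W
                num += gamma * S[k] * (x - D)
                den += gamma * S[k] ** 2
        w = num / den
    return w

xs = [2.4, 2.5, 2.6, 6.4, 6.5, 6.6]          # toy observations scattered around W = 2
w_sd = steepest_ascent(xs)
w_em = em(xs)
```

In this toy setting both routines settle near W = 2. The Q-based update needs no step size, whereas the gradient update depends on choosing ε small enough for the iteration to remain stable.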

【0034】 In the specific example above, the speech HMM and the noise HMM were taken to be in the cepstrum domain, but the invention can also be applied to HMMs created in the log-spectral (log power spectrum) domain. In that case, as shown in FIG. 7, the exponential transformation of step S312 in FIG. 6 is applied to the distributions of the noise HMM and the speech HMM in the log-spectral domain to obtain the distributions in the linear spectral domain (S31); the linear spectrum of the speech HMM distribution is multiplied by the multiplicative distortion W (S321); the multiplied speech HMM distribution is convolved with the noise HMM distribution (S322); and the result is logarithmically transformed to obtain the incomplete noise-resistant speech HMM (S33).

【0035】 Similarly, when the speech HMM and the noise HMM are each created in the linear spectral domain, the incomplete noise-resistant speech HMM is obtained directly by convolving their distributions. Furthermore, although the multiplicative distortion W was introduced above as the unknown in the incomplete noise-resistant speech HMM, the S/N may be introduced instead. In that case, the S/N is introduced into the mean $M_i$ and the covariance $\Sigma^{M}_{ij}$ of the distribution of the incomplete noise-resistant speech HMM, obtained by convolving the speech HMM distribution with the noise HMM distribution, as in the following equations.

【0036】
$$M_i = SC_i + k\,SD_i \quad (19)$$
$$\Sigma^{M}_{ij} = \Sigma^{SC}_{ij} + k^2\,\Sigma^{SD}_{ij} \quad (20)$$
$$k = 10^{-((S/N)/2)} \quad (21)$$
In this case, k rather than W is stored in the S/N or multiplicative distortion storage unit 9 in FIG. 4, and the multiplication in step S321 of FIGS. 5B, 6 and 7 multiplies the noise HMM distribution (or, alternatively, the speech HMM distribution) by k. That is, equations (19), (20) and (21) are computed. In addition, the S/N or multiplicative distortion estimation means 10 in FIG. 4 selects the S/N so as to maximize the likelihood $P(X \mid M(S/N))$. In other words, for the input speech sequence X, it estimates the S/N that maximizes the likelihood of the set M(S/N) of incomplete noise-resistant speech HMMs, which is a function of the S/N. The input speech is preferably training speech uttered in the environment in which the noise HMM was created, or the speech to be recognized. This S/N, too, can be found by iterative computation using the maximum likelihood estimation method or the steepest descent method so as to maximize $P(X \mid M(S/N))$. Another example of S/N estimation is shown in FIG. 8. N values of the S/N, $(S/N)_1$ to $(S/N)_N$, are prepared and substituted into equations (19), (20) and (21) to create N incomplete noise-resistant speech HMMs; the likelihood $P(X \mid M(S/N))$ of each of these N HMMs for the input speech is computed; and the S/N giving the largest of the N likelihoods is selected.

【0037】 As $(S/N)_1$ to $(S/N)_N$, for example, the ratio (S+N)/N of the sum of the signal S and the noise N to the noise N is measured in the noise recording environment used to create the noise HMM, and three values are used: this value and the two values 3 dB above and below it. The likelihood $P(X \mid M(S/N))$ is computed for each of the three, and the S/N giving the maximum is chosen. It is also possible to formulate the problem as maximizing the Viterbi-algorithm likelihood $P(X, S \mid M(S/N))$ instead of $P(X \mid M(S/N))$, where S denotes the state transitions of the HMM. The multiplicative distortion W may also be estimated in the manner shown in FIG. 8.
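The candidate-based selection of FIG. 8 amounts to a small grid search, which might be sketched as follows. This is a hedged illustration: the model is a single scalar Gaussian in the linear spectral domain, the conversion from S/N to k copies equation (21) exactly as printed (so the unit of S/N is whatever that equation assumes), and the model parameters, candidate values and function names are all hypothetical.

```python
import math

SC, VAR_SC = 4.0, 0.5   # hypothetical linear-domain speech mean and variance
SD, VAR_SD = 2.0, 0.2   # hypothetical linear-domain noise mean and variance

def k_from_snr(snr):
    """Scale factor k of eq. (21), taken as printed."""
    return 10.0 ** (-(snr / 2.0))

def log_likelihood(xs, snr):
    """Log-likelihood of the observations under the model built for one S/N value."""
    k = k_from_snr(snr)
    mean = SC + k * SD                 # eq. (19)
    var = VAR_SC + k * k * VAR_SD      # eq. (20)
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in xs)

def select_snr(xs, candidates):
    """Return the candidate S/N whose model gives the largest likelihood."""
    return max(candidates, key=lambda snr: log_likelihood(xs, snr))

observations = [5.0, 5.1, 5.2, 5.15, 5.05]          # toy input
best = select_snr(observations, [0.2, 0.5, 0.8])    # a base value and one step either side
```

Only as many models as there are candidates need to be built, which keeps the cost fixed at the price of a coarser estimate than the iterative methods.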

【0038】 In the above, the multiplicative distortion W or the S/N was introduced into the incomplete noise-resistant speech HMM in the linear spectral domain. Alternatively, W may be added to equation (1) in the cepstrum domain, that is, $C + W = (C_0 + W_0, C_1 + W_1, \ldots, C_p + W_p)$ may be obtained, and the speech HMM to which W has been added may be synthesized with the noise HMM in the product space. Likewise, $C + k = (C_0 + k, C_1, \ldots, C_p)$ may be obtained, where $k = \alpha \log(10^{(S/N)/2})$ and α is a constant determined by the cosine transformation, and the speech HMM to which k has been added may be synthesized with the noise HMM in the product space; or k may be added to the noise HMM, which is then synthesized in the product space with the speech HMM to which k has not been added. W or the S/N may also be introduced in the log-spectral domain. That is, $LC + W = C(\mathrm{COS}) + W$ may be obtained and synthesized with LD in the product space; or $LC + k = (LC_0 + k, LC_1 + k, \ldots, LC_P + k)$, with $k = \log(10^{(S/N)/2})$, may be obtained and synthesized with LD in the product space; or LC + k may be obtained and synthesized with LD in the product space.
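The reason an addition in the cepstrum or log-spectral domain can stand in for a multiplication in the linear spectral domain can be checked numerically. The sketch below considers mean vectors only and assumes an unnormalized DCT-II as the cosine transform; that choice, under which the constant α equals the number of bands, is an assumption, since the patent's own transform equations are not reproduced in this passage. All values are hypothetical.

```python
import math

def dct(x):
    """Unnormalized DCT-II, used here as the cosine transform (an assumption)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * q * (i + 0.5) / n) for i in range(n))
            for q in range(n)]

speech_lin = [2.0, 5.0, 1.5, 0.8]   # hypothetical linear spectrum of the speech mean
w = [0.5, 2.0, 1.0, 4.0]            # hypothetical per-band multiplicative distortion

# Multiply in the linear domain, then transform to the cepstrum domain ...
cep_of_product = dct([math.log(wi * s) for wi, s in zip(w, speech_lin)])

# ... or add the cepstrum of W to the cepstrum of the speech (C + W).
c = dct([math.log(s) for s in speech_lin])
c_w = dct([math.log(wi) for wi in w])
cep_by_addition = [a + b for a, b in zip(c, c_w)]

# A gain that is identical in every band changes only the zeroth coefficient,
# which is why C + k touches C_0 alone.
gain = 3.0
c_gain = dct([math.log(gain * s) for s in speech_lin])
alpha = len(speech_lin)             # value of alpha under this DCT convention
```

With these definitions `cep_of_product` and `cep_by_addition` agree up to rounding, and `c_gain` differs from `c` only in its first element, by `alpha * math.log(gain)`.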

【0039】 Furthermore, in the above the noise HMM was created in the same domain as the speech HMM, but, for example, the speech HMM may be obtained in the cepstrum domain and the noise HMM in the log-spectral domain. In that case, the amount of computation is reduced by the computation that would otherwise be needed to convert the noise HMM from the cepstrum domain to the log-spectral domain.

【0040】

[Effects of the Invention] As described above, according to the present invention, a noise HMM is created from the noise at the place of utterance; from this noise HMM and a speech HMM created from speech recorded beforehand in a noise-free condition, an incomplete noise-resistant speech HMM into which the S/N or the multiplicative distortion is introduced as an unknown is created; and the S/N or multiplicative distortion that maximizes the likelihood of this incomplete noise-resistant speech HMM for the input speech is determined to create the noise-resistant speech HMM. In this way, a speech HMM that is robust to S/N variation, microphone distortion, line distortion and variation in the speaker's utterances can be created. The speech recognition device of the invention, which uses the speech HMM obtained by the method of the invention, can therefore recognize speech with added noise, speech subjected to line distortion and the like at a higher recognition rate than before.

【0041】 By including the multiplicative distortion in the speech HMM as an unknown and synthesizing it with the noise HMM in the product space, or by including the S/N as an unknown in one of the speech HMM and the noise HMM and synthesizing it in the product space with the HMM that does not include the unknown, an incomplete noise-resistant speech HMM that models speech containing noise can be constructed. The noise-resistant speech HMM is then obtained by determining the multiplicative distortion or S/N that maximizes the likelihood of this HMM for the input speech.

【0042】 The incomplete noise-resistant speech HMM is obtained by transforming both the speech HMM and the noise HMM into the linear spectral domain, applying the multiplicative distortion to the output probability distribution of the speech HMM in that domain, convolving it with the output probability distribution of the noise HMM, and inversely transforming the result into the domain of the original speech HMM. Similarly, it is obtained by transforming both the speech HMM and the noise HMM into the linear spectral domain, multiplying the linear spectrum of the output probability distribution of one of them by the S/N component, convolving it with the output probability distribution of the HMM not multiplied by the S/N component, and inversely transforming the result into the domain of the original speech HMM.

【0043】 The distribution of the incomplete noise-resistant speech HMM in the cepstrum domain can be computed by transforming the output probability distributions of the speech HMM and the noise HMM expressed in the cepstrum domain into the linear spectral domain by cosine transformation and exponential transformation, performing the convolution, and applying the inverse transformations, namely logarithmic transformation and inverse cosine transformation, to the result. In particular, when the speech HMM and the noise HMM are expressed in the log-spectral domain, their output probability distributions are exponentially transformed, the convolution described above is performed, and the result is logarithmically transformed to obtain the incomplete noise-resistant speech HMM in the log-spectral domain.

【0044】 A phoneme recognition experiment was carried out to examine the effect of the method and device of the invention. The evaluation data consisted of 51 sentences from a telephone directory assistance task uttered by one speaker. The experiment used speech to which 12 dB and 6 dB noise had been added and which was further distorted by multiplicative noise. The phoneme recognition rates obtained are shown in the table below. All models are in the cepstrum domain. The "speech HMM" experiment used the unprocessed speech HMM, that is, the speech HMM in the speech HMM storage unit 3 in FIG. 4. The "HMM synthesis only" experiment used the HMM obtained by the noise-resistant HMM creation method proposed in Document 1, which is described in the background section. "HMM synthesis + S/N estimation" is an experiment using the HMM created by the method of the invention with the S/N introduced as the unknown, and "HMM synthesis + multiplicative distortion estimation" is an experiment using the HMM created by the method of the invention with the multiplicative distortion introduced as the unknown. In creating the HMMs by the method of the invention, the S/N and the multiplicative distortion were estimated by the steepest descent method. These results show that conventional HMM synthesis alone gives a better recognition rate than the unprocessed speech HMM, and that the method of the invention is more robust still to additive noise and also yields a noise-resistant speech HMM that is markedly more robust to multiplicative distortion than the conventional method, so that recognition results improve.
As described above, according to the invention, a noise model of the utterance environment is used and synthesized with a conventional speech HMM to create an incomplete noise-resistant speech HMM in which the multiplicative distortion or the S/N is an unknown, and the unknown that maximizes the likelihood of that HMM for the input speech is estimated. A robust speech HMM suited to the utterance environment can therefore be created, and the speech recognition device of the invention, which uses this HMM, can achieve a high recognition rate. Moreover, with the device of the invention, the time from creation of the noise HMM to recognition of the input speech is, for example, about one minute (less with a faster processor), so recognition can be carried out quickly and with relatively simple processing.

【図面の簡単な説明】[Brief description of drawings]

FIG. 1 is a block diagram showing a simplified functional configuration of a speech recognition device.

FIG. 2 is a flowchart showing the conventional procedure for creating a noise-resistant speech HMM.

FIG. 3 is a diagram showing the state transitions of a speech HMM, a noise HMM, and the noise-resistant speech HMM obtained by synthesizing them.

FIG. 4 is a block diagram showing the functional configuration of an embodiment of the speech recognition device according to the invention.

FIG. 5A is a flowchart showing the processing procedure of an embodiment of the method of creating a noise-resistant speech HMM according to the invention, and FIG. 5B is a flowchart showing a specific example of the processing of step S3 in FIG. 5A.

FIG. 6 is a flowchart showing a more specific example of the processing of FIG. 5B.

FIG. 7 is a flowchart showing yet another specific example of the processing of FIG. 5B.

FIG. 8 is a diagram illustrating an example of a technique for estimating the S/N of the incomplete noise-resistant speech HMM.

Claims (21)

【特許請求の範囲】[Claims]
1. A method of creating a noise-resistant hidden Markov model for speech recognition, comprising: a first step of creating, from a speech HMM unaffected by noise or multiplicative distortion (hereinafter referred to collectively simply as noise) and an HMM created from the noise (noise HMM), an incomplete noise-resistant speech HMM containing the multiplicative distortion or the S/N (signal/noise) in the linear spectral domain as an unknown; a second step of estimating the multiplicative distortion or S/N that maximizes the likelihood of the incomplete noise-resistant speech HMM for input speech; and a third step of substituting the estimated value into the incomplete noise-resistant speech HMM to complete the noise-resistant speech HMM.
2. The hidden Markov model creation method of claim 1, wherein the first step comprises a fourth step of applying the multiplicative distortion to the speech HMM, and a fifth step of synthesizing, in the product space, the speech HMM to which the multiplicative distortion has been applied and the noise HMM.
3. The hidden Markov model creation method of claim 2, wherein the fourth step is the addition of the multiplicative distortion to the mean of the output probability distribution of the speech HMM expressed in the cepstrum domain.
4. The hidden Markov model creation method of claim 2, wherein the fourth step is the addition of the multiplicative distortion to the mean of the output probability distribution of the speech HMM expressed in the log-spectral domain.
5. The hidden Markov model creation method of claim 1, wherein the first step comprises a fourth step of adding a component containing the S/N to the speech HMM and the noise HMM, and a fifth step of synthesizing, in the product space, the HMM to which the component containing the S/N has been added and the HMM to which it has not been added.
6. The hidden Markov model creation method of claim 5, wherein the fourth step is the addition of the component containing the S/N to the zeroth-order term of the mean of the output probability distribution of the HMM expressed in the cepstrum domain.
7. The hidden Markov model creation method of claim 5, wherein the fourth step is the addition of the component containing the S/N to the mean of the output probability distribution of the HMM expressed in the log-spectral domain.
8. The hidden Markov model creation method of claim 1, wherein the first step comprises: a sixth step of transforming the output probability distributions of the speech HMM and the noise HMM into the linear spectral domain; a seventh step of multiplying the output probability distribution of the speech HMM in the linear spectral domain by the multiplicative distortion; an eighth step of convolving the output probability distribution of the speech HMM multiplied by the multiplicative distortion with the output probability distribution of the noise HMM in the linear spectral domain; and a ninth step of inversely transforming the result of the convolution into the domain of the original speech HMM.
9. The hidden Markov model creation method of claim 1, wherein the first step comprises: a sixth step of transforming the output probability distributions of the speech HMM and the noise HMM into the linear spectral domain; a seventh step of multiplying the output probability distribution of one of the speech HMM and the noise HMM in the linear spectral domain by the component containing the S/N; an eighth step of convolving the output probability distribution of the multiplied HMM with the output probability distribution of the HMM not multiplied; and a ninth step of inversely transforming the result of the convolution into the domain of the original speech HMM.
10. The hidden Markov model creation method of claim 8 or 9, wherein the sixth step is a step of cosine-transforming, and then exponentially transforming, the output probability distributions of the speech HMM and the noise HMM each expressed in the cepstrum domain, and the ninth step is a step of logarithmically transforming, and then inverse-cosine-transforming, the result of the convolution.
11. The hidden Markov model creation method of claim 8 or 9, wherein the sixth step is a step of exponentially transforming the output probability distributions of the speech HMM and the noise HMM each expressed in the log-spectral domain, and the ninth step is a step of logarithmically transforming the result of the convolution.
12. The hidden Markov model creation method of claim 1, wherein the first step comprises a fourth step of multiplying the output probability distribution of the speech HMM expressed in the linear spectral domain by the multiplicative distortion, and a fifth step of convolving the output probability distribution multiplied by the multiplicative distortion with the output probability distribution of the noise HMM expressed in the linear spectral domain.
13. The hidden Markov model creation method of claim 1, wherein the first step comprises a fourth step of multiplying the output probability distribution of one of the speech HMM and the noise HMM, each expressed in the linear spectral domain, by the component containing the S/N, and a fifth step of convolving the multiplied output probability distribution with the output probability distribution that has not been multiplied.
14. The hidden Markov model creation method of claim 5, 9 or 13, wherein the component containing the S/N is $10^{((S/N)/2)}$.
15. The hidden Markov model creation method of any one of claims 1 to 9, 12 and 13, wherein in the second step the estimation is performed by iterative computation using the maximum likelihood estimation method.
16. The hidden Markov model creation method of any one of claims 1 to 9, 12 and 13, wherein in the second step the estimation is performed by iterative computation using the steepest descent method.
17. The hidden Markov model creation method according to any one of claims 1 to 4, 8, and 12, wherein the second step gives multiplicative distortions of various values to each of the incomplete noise-resistant speech HMMs, obtains the likelihood of each of these incomplete noise-resistant speech HMMs with respect to the input speech, and takes as the estimated value the multiplicative distortion value given to the HMM with the maximum likelihood.
18. The hidden Markov model creation method according to any one of claims 1, 5 to 7, 9, and 13, wherein the second step gives S/N values of various magnitudes to each of the incomplete noise-resistant speech HMMs, obtains the likelihood of each of these incomplete noise-resistant speech HMMs with respect to the input speech, and takes as the estimated value the S/N value given to the HMM with the maximum likelihood.
19. The hidden Markov model creation method according to any one of claims 1 to 9, 12, and 13, further comprising a step of creating the noise HMM used in the first step by recording noise in the environment to which the speech to be recognized is subjected.
20. The hidden Markov model creation method according to any one of claims 1 to 3, 5, 6, 8, and 9, wherein the speech HMM is expressed in the cepstrum domain and the noise HMM is expressed in the logarithmic spectrum domain.
21. A speech recognition device using noise-resistant speech HMMs created from a noise HMM and speech HMMs not affected by noise or multiplicative distortion, the device comprising: a speech HMM storage unit in which the set of speech HMMs is stored; a noise HMM storage unit in which the noise HMM is stored; a noise-resistant speech HMM storage unit in which the set of noise-resistant speech HMMs is stored; a multiplicative distortion or S/N storage unit in which a multiplicative distortion or S/N is stored as an unknown; noise input means for recording noise in the same environment as the speech to be recognized; noise HMM creating means for creating the noise HMM based on the noise recorded by the noise input means and storing it in the noise HMM storage unit; incomplete noise-resistant speech HMM creating means for creating, from the noise HMM in the noise HMM storage unit, the speech HMMs in the speech HMM storage unit, and the unknown multiplicative distortion or S/N in the multiplicative distortion or S/N storage unit, incomplete noise-resistant speech HMMs that contain the multiplicative distortion or S/N as an unknown; speech input means for inputting the speech to be recognized; multiplicative distortion or S/N estimating means for estimating the multiplicative distortion or S/N that maximizes the likelihood of the incomplete noise-resistant speech HMMs with respect to the speech input from the speech input means; noise-resistant speech HMM completing means for substituting the multiplicative distortion or S/N estimated by the estimating means into the incomplete noise-resistant speech HMMs to create the set of noise-resistant speech HMMs and storing it in the noise-resistant speech HMM storage unit; and speech recognition means for calculating the similarity between the speech to be recognized input from the speech input means and each of the noise-resistant speech HMMs in the noise-resistant speech HMM storage unit, and outputting a recognition result based on the calculation result.
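The composition and estimation steps recited in claims 12 and 17 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a single Gaussian in one log-spectral band in place of a full HMM, and it assumes a log-normal moment-matching approximation for moving between the log and linear spectral domains, which the claims do not specify. All function names are hypothetical.

```python
import math

def log_to_linear(mu, var):
    """Map a Gaussian (mean, variance) in the log-spectral domain to the
    mean and variance of the matching log-normal in the linear domain.
    (Assumed approximation; not stated in the claims.)"""
    lin_mu = math.exp(mu + var / 2.0)
    lin_var = lin_mu ** 2 * (math.exp(var) - 1.0)
    return lin_mu, lin_var

def linear_to_log(lin_mu, lin_var):
    """Inverse of log_to_linear (moment matching back to the log domain)."""
    var = math.log(lin_var / lin_mu ** 2 + 1.0)
    mu = math.log(lin_mu) - var / 2.0
    return mu, var

def compose(speech, noise, w):
    """Build the 'incomplete' noisy-speech distribution with distortion w.

    Fourth step: scale the linear-domain speech distribution by w
    (mean scales by w, variance by w squared).
    Fifth step: convolve with the noise distribution; for independent
    variables this adds the means and adds the variances.
    """
    s_mu, s_var = log_to_linear(*speech)
    n_mu, n_var = log_to_linear(*noise)
    return linear_to_log(w * s_mu + n_mu, w * w * s_var + n_var)

def log_likelihood(frames, dist):
    """Gaussian log-likelihood of log-spectral observations under dist."""
    mu, var = dist
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        for x in frames
    )

def estimate_distortion(frames, speech, noise, candidates):
    """Claim 17 style search: try each candidate distortion value and keep
    the one whose composed model gives the input the highest likelihood."""
    return max(
        candidates,
        key=lambda w: log_likelihood(frames, compose(speech, noise, w)),
    )
```

Calling `compose(speech, noise, w_hat)` with the value returned by `estimate_distortion` corresponds to substituting the estimate into the incomplete model. The same search applies to the S/N case of claim 18 if the candidates are S/N-dependent scale factors instead of distortion values.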
JP8047551A 1995-03-06 1996-03-05 Noise-resistant hidden markov model creating method for speech recognition and speech recognition device using the method Pending JPH09230886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP8047551A JPH09230886A (en) 1995-03-06 1996-03-05 Noise-resistant hidden markov model creating method for speech recognition and speech recognition device using the method

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
JP4530495 1995-03-06
JP7-45304 1995-03-06
JP33333595 1995-12-21
JP7-333335 1995-12-21
JP8047551A JPH09230886A (en) 1995-03-06 1996-03-05 Noise-resistant hidden markov model creating method for speech recognition and speech recognition device using the method

Publications (1)

Publication Number Publication Date
JPH09230886A (en) 1997-09-05

Family

ID=27292178

Family Applications (1)

Application Number Title Priority Date Filing Date
JP8047551A Pending JPH09230886A (en) 1995-03-06 1996-03-05 Noise-resistant hidden markov model creating method for speech recognition and speech recognition device using the method

Country Status (1)

Country Link
JP (1) JPH09230886A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100434532B1 (en) * 1998-02-24 2004-07-16 삼성전자주식회사 Online model variable compensating method for voice recognition and voice recognizing method in accordance with compensating method, especially in relation to compensating for model variables used for voice recognition according to environment change with inputted voice signals only without another adaptive data
WO2011010647A1 (en) * 2009-07-21 2011-01-27 独立行政法人産業技術総合研究所 Method and system for estimating mixture ratio in mixed-sound signal, and phoneme identifying method
JP5334142B2 (en) * 2009-07-21 2013-11-06 独立行政法人産業技術総合研究所 Method and system for estimating mixing ratio in mixed sound signal and method for phoneme recognition

Similar Documents

Publication Publication Date Title
CN111756942B (en) Communication device and method for performing echo cancellation and computer readable medium
EP0831461B1 (en) Scheme for model adaptation in pattern recognition based on taylor expansion
US5924065A (en) Environmently compensated speech processing
Narayanan et al. Improving robustness of deep neural network acoustic models via speech separation and joint adaptive training
JP4750271B2 (en) Noise compensated speech recognition system and method
US7792672B2 (en) Method and system for the quick conversion of a voice signal
US6671666B1 (en) Recognition system
US7571095B2 (en) Method and apparatus for recognizing speech in a noisy environment
US5721808A (en) Method for the composition of noise-resistant hidden markov models for speech recognition and speech recognizer using the same
JPH0850499A (en) Signal identification method
JP2007523374A (en) Method and system for generating training data for an automatic speech recognizer
EP1189205A2 (en) HMM-based noisy speech recognition
MX2007015446A (en) Multi-sensory speech enhancement using a speech-state model.
JP2004347761A (en) Voice recognition device, voice recognition method, computer executable program and storage medium for performing the voice recognition method to computer
GB2560174A (en) A feature extraction system, an automatic speech recognition system, a feature extraction method, an automatic speech recognition method and a method of train
JP2002311989A (en) Speech recognition method with corrected channel distortion and corrected background noise
US20240129410A1 (en) Learning method for integrated noise echo cancellation system using cross-tower nietwork
US20240105199A1 (en) Learning method based on multi-channel cross-tower network for jointly suppressing acoustic echo and background noise
JP2002268698A (en) Voice recognition device, device and method for standard pattern generation, and program
Raut et al. Model adaptation for long convolutional distortion by maximum likelihood based state filtering approach
JP2002091478A (en) Voice recognition system
JPH10149191A (en) Method and device for adapting model and its storage medium
Han et al. Reverberation and noise robust feature compensation based on IMM
Sehr et al. Towards robust distant-talking automatic speech recognition in reverberant environments
JPH09230886A (en) Noise-resistant hidden markov model creating method for speech recognition and speech recognition device using the method