JP3247746B2 - Creating a noise-resistant phoneme model - Google Patents

Creating a noise-resistant phoneme model

Info

Publication number
JP3247746B2
JP3247746B2 (application JP00568893A)
Authority
JP
Japan
Prior art keywords
noise
phoneme
markov model
hidden markov
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP00568893A
Other languages
Japanese (ja)
Other versions
JPH06214592A (en)
Inventor
Kiyohiro Shikano
Yasuhiro Minami
Tatsuo Matsuoka
Frank Martin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP00568893A priority Critical patent/JP3247746B2/en
Publication of JPH06214592A publication Critical patent/JPH06214592A/en
Application granted granted Critical
Publication of JP3247746B2 publication Critical patent/JP3247746B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for creating phoneme models used in speech recognition, and more particularly to the creation of noise-resistant phoneme models effective for speech recognition in noisy environments.

[0002]

2. Description of the Related Art

A conventional speech recognition system using phoneme HMMs will be described with reference to the drawings. FIG. 6 shows an example of a functional block diagram implementing a conventional speech recognition system. In the figure, 1 is a speech input unit, 2 is a recognition unit, and 3 is a recognition result output unit. Reference numeral 4 denotes a phoneme model storage unit, in which the phoneme HMMs are stored.

[0003] First, the operation of FIG. 6 will be described. The recognition unit 2 divides the speech input to the speech input unit 1 into frames and, for each frame, computes probabilities using the phoneme HMMs stored in advance in the phoneme model storage unit 4; the phoneme with the highest probability is sent to the recognition result output unit 3 as the recognition result. The operation of the recognition unit 2 in more detail is as follows. When speech is input over a certain period, the input is divided into unit-time frames, and feature parameters are extracted for each frame. Then, for the feature parameters extracted from the first frame, the phoneme HMM prepared for each phoneme is used to compute, for each phoneme, the probability that the frame is, for example, "a", the probability that it is "i", the probability that it is "u", and so on.

[0004] Next, the feature parameters of the following frame are input, and the probability that this frame follows the previous state is computed for each phoneme. This is repeated for all frames of the input speech; if, for example, the phoneme HMM for "a" yields the highest probability, the input speech is recognized as "a".
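The frame-by-frame probability computation described above is the standard HMM forward algorithm. A minimal sketch with hypothetical toy parameters (discrete emissions for brevity; the patent's HMMs use Gaussian-mixture outputs):

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM.

    obs : sequence of observation-symbol indices (one per frame)
    pi  : (S,) initial state probabilities
    A   : (S, S) transition probabilities, A[i, j] = P(next=j | current=i)
    B   : (S, V) emission probabilities, B[s, o] = P(symbol o | state s)
    """
    alpha = pi * B[:, obs[0]]          # probabilities after the first frame
    for o in obs[1:]:                  # fold in each subsequent frame
        alpha = (alpha @ A) * B[:, o]
    return np.log(alpha.sum())

# Toy 2-state models: the input is recognized as the phoneme whose
# HMM gives the highest likelihood over all frames.
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B_a = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical model for "a"
B_i = np.array([[0.1, 0.9], [0.8, 0.2]])   # hypothetical model for "i"
obs = [0, 0, 1]
scores = {p: forward_log_likelihood(obs, pi, A, B)
          for p, B in [("a", B_a), ("i", B_i)]}
best = max(scores, key=scores.get)          # recognized phoneme
```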

[0005]

[Problems to be Solved by the Invention]

Conventionally, however, phoneme HMMs have been created from speech information obtained in a noise-free state, and in real environments the performance of speech recognition deteriorates markedly under the influence of noise. In an alternative scheme that records speech under various kinds of noise in advance and creates phoneme HMMs from that speech, the variety of noise is so enormous that the system becomes bloated in order to obtain high recognition performance.

[0006] An object of the present invention is to solve the above problems by creating phoneme models that can be used in a simple system and that yield high recognition performance even under noisy conditions, thereby producing phoneme models matched to the utterance environment and improving speech recognition performance.

[0007]

[Means for Solving the Problems]

To achieve the above object, the noise-resistant phoneme hidden Markov model creation method of the present invention takes a phoneme hidden Markov model of speech, created in advance in a noise-free environment, and a hidden Markov model of noise; combines the states of the speech phoneme hidden Markov model with the states of the noise hidden Markov model; and composes them in a product space by adding, in the spectral domain, the probabilities of each corresponding pair of states, thereby creating a phoneme hidden Markov model with noise superimposed. Alternatively, the output probability distributions of the speech phoneme hidden Markov model for a noise-free environment and of the noise hidden Markov model, both represented in the cepstrum domain, are each converted to distributions in the linear spectral domain by a cosine transform followed by an exponential transform; the distributions of the speech phoneme hidden Markov model and of the noise hidden Markov model in the linear spectral domain are convolved to construct a phoneme hidden Markov model adapted to the noise; and the distribution obtained by the convolution is transformed back to the cepstrum domain by a logarithmic transform and an inverse cosine transform. Or, in the above composition, the output probabilities of the speech phoneme hidden Markov model and of the noise hidden Markov model are convolved to construct a phoneme hidden Markov model adapted to the noise, and the distribution obtained by the convolution is transformed back to the cepstrum domain by a logarithmic transform and an inverse cosine transform.

[0008]

[Operation]

This makes it possible to create phoneme HMMs that take into account the noise at the utterance location, improving speech recognition performance under noise.

[0009]

[Embodiments]

FIG. 1 shows the procedure for creating a noise-resistant phoneme HMM according to the present invention. The creation procedure is described below with reference to the figure. Cepstrum coefficients are widely used as the acoustic parameters of HMMs for speech recognition. These cepstrum coefficients are related to the logarithmic spectrum (logarithmic power spectrum) by a cosine transform. Ambient noise, on the other hand, is additive noise and can be added to speech in the spectral domain.
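A tiny numerical illustration of this point (hypothetical spectra, not from the patent): noise adds to speech in the linear spectral domain, but that additivity is lost in the log-spectral, and hence cepstral, domain, which is why the procedure converts the models to the linear spectral domain before combining them:

```python
import numpy as np

# Two hypothetical linear power spectra: clean speech and noise.
speech = np.array([4.0, 2.0, 1.0, 0.5])
noise = np.array([1.0, 1.0, 1.0, 1.0])

# Addition holds in the linear spectral domain...
noisy = speech + noise

# ...but not in the log-spectral domain: log of the sum is not
# the sum of the logs, so cepstral parameters cannot simply be added.
log_of_sum = np.log(noisy)
sum_of_logs = np.log(speech) + np.log(noise)
additive_in_log = np.allclose(log_of_sum, sum_of_logs)   # False
```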

[0010] First, in the cepstrum domain, a phoneme HMM for a noise-free environment and a noise HMM for the noisy environment are created, and each is cosine-transformed to compute its logarithmic spectrum. Next, an exponential transform is applied to each to compute the corresponding linear spectra; the output probabilities of phoneme and noise are combined by convolution; and the resulting noise-included phoneme HMM is log-transformed and then inverse-cosine-transformed to create the final phoneme HMM.

[0011] These transforms are performed for speech and noise respectively, and the parameters of the two are added in the linear spectral domain. The linear spectrum is then log-transformed back to the logarithmic spectral domain and further returned to the cepstrum domain by a cosine transform, creating a phoneme HMM that takes ambient noise into account.

[0012] Here, the phoneme HMM and the noise HMM are assumed to be HMMs whose output distributions are mixtures of Gaussians (so-called Gaussian mixtures). The above transforms therefore convert not only the means but also the covariances. A concrete calculation method is described next. First, consider the cepstrum coefficients of order 0 to p as the parameters of these HMMs, written as the vectors

C = (C0, C1, C2, ..., Cp-1, Cp)
D = (D0, D1, D2, ..., Dp-1, Dp),

where C are the phoneme parameters and D the noise parameters. The transform from these cepstrum coefficients to the logarithmic spectrum is the well-known cosine transform; it is a linear transform, represented here by the (p+1) × m transform matrix (COS). Writing the logarithmic-spectrum vectors as LC and LD, the transformed means are

LC = C (COS)
LD = D (COS).

If the covariances of the cepstrum coefficients are ΣC and ΣD, the covariances ΣLC and ΣLD of the logarithmic spectrum are

ΣLC = (COS)^t ΣC (COS)
ΣLD = (COS)^t ΣD (COS).

By transforming the mean and covariance in this way, the mean and covariance of the Gaussian distributions of speech and noise in the logarithmic spectral domain are obtained.

[0013] Next, the exponential transform that converts the logarithmic spectrum to the linear spectrum is described. This transform does not preserve the Gaussian shape of the distribution, but the result is approximated by a Gaussian. Computing the means SC, SD and covariances ΣSC, ΣSD of the exponentially transformed distributions gives

SCi = exp(LCi + ΣLC,ii / 2)
SDi = exp(LDi + ΣLD,ii / 2)
ΣSC,ij = SCi × SCj × { exp(ΣLC,ij) − 1 }
ΣSD,ij = SDi × SDj × { exp(ΣLD,ij) − 1 }.

In this way, speech and additive noise are summed in the linear spectral domain; since taking the sum corresponds to a convolution of the two distributions, the mean M and covariance ΣM of the convolution of the two distributions are easily obtained as

Mi = SCi + SDi
ΣM = ΣSC + ΣSD.

The mean and covariance of the distribution obtained in this way are then transformed back to the cepstrum domain, reversing the steps so far.

[0014] First, the logarithmic transform, the inverse of the exponential transform, is performed. Writing the log-transformed mean as LM and the covariance as ΣLM, since this is the inverse of the exponential transform,

LMi = log(Mi) − (1/2) log(ΣM,ii / Mi² + 1)
ΣLM,ij = log(ΣM,ij / (Mi Mj) + 1).

Further, the inverse cosine transform (like the cosine transform, a linear transform), represented by the m × (p+1) matrix (COS'), converts the logarithmic spectrum back to the cepstrum domain, giving the mean S and covariance ΣS as

S = LM (COS')
ΣS = (COS')^t ΣLM (COS').

Thus the Gaussian output distributions of the HMM obtained in the cepstrum domain are carried to the linear spectral domain, added to the noise model, and then inverse-transformed, creating a phoneme HMM with the noise added.

[0015] The above description took up two distributions and explained how to compose them. Next, the method of composing two HMMs is described. A phoneme HMM is usually represented by a left-to-right model of about three states, as shown in FIG. 2, while an ergodic HMM as shown in FIG. 3 is suited to modeling noise. Taking the product of these two models, consider composition in the product space; this yields the product-space HMM shown in FIG. 4. Each of its states is a combination of a state of the phoneme HMM and a state of the noise HMM. Accordingly, the output probabilities of each corresponding pair of states are converted to the linear spectral domain as described above, then convolved, and then inverse-transformed.

[0016] When the output probability is a single Gaussian, the above transforms are performed on the two distributions. When the output distribution is a mixture of Gaussians, the above transforms are performed for every pair of component distributions. The description so far has dealt only with cepstrum coefficients, but the same transforms can be applied to the Δ-cepstrum and Δ-power often used in phoneme HMMs, since these are expressed as linear sums of cepstrum coefficients.

[0017] FIG. 5 shows an embodiment in which the present invention is applied to a speech recognition apparatus. In the figure, 5 is a phoneme model storage unit, 6 is a noise input unit, 7 is a noise-resistant phoneme model creation unit that executes the noise-resistant phoneme model creation method of the present invention, and 8 is a noise-resistant phoneme model storage unit. A phoneme HMM created in advance from speech information obtained in a noise-free state is prepared in the phoneme model storage unit 5; the noise-resistant phoneme model creation unit 7 creates a noise HMM from the noise input through the noise input unit 6, and stores the noise-resistant phoneme model created according to the method of the present invention in the noise-resistant phoneme model storage unit 8.

[0018] The recognition unit 2 performs speech recognition on the speech input from the speech input unit 1 using the phoneme models stored in the noise-resistant phoneme model storage unit 8, and outputs the result from the recognition result output unit 3.

[0019]

[Effect of the Invention]

According to the present invention, a noise model of the utterance environment can be used, so a robust phoneme HMM suited to the utterance environment can be created, and a high phoneme recognition rate can be expected.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing the procedure for creating a noise-resistant phoneme model according to the present invention.

FIG. 2 shows an example of a three-state, three-loop phoneme HMM.

FIG. 3 shows an example of a noise HMM using a two-state ergodic HMM.

FIG. 4 shows the phoneme HMM obtained by composing the phoneme HMM of FIG. 2 and the noise HMM of FIG. 3 in the product space.

FIG. 5 is a diagram illustrating an embodiment in which the present invention is applied to a speech recognition apparatus.

FIG. 6 is a diagram illustrating the configuration of a conventional speech recognition apparatus.

Continuation of the front page:
(72) Inventor: Frank Martin, 4-6-29 Komaba, Meguro-ku, Tokyo
(56) References: JP-A-2-83596 (JP, A); Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP) 1992, Vol. 1, pp. I-489 to I-492, "ADAPTATION OF THE HMM DISTRIBUTIONS: APPLICATION TO A VQ CODEBOOK AND TO A NOISY ENVIRONMENT", Frangoulis E. D., Gaganelis D. A.
(58) Fields searched (Int. Cl.7, DB name): G10L 15/14, G10L 15/06, G10L 15/20

Claims (3)

(57) [Claims]

1. A method of creating a phoneme hidden Markov model, the model consisting of, for each phoneme, the probabilities of obtaining the states of the feature parameters of speech, wherein, for a phoneme hidden Markov model of speech created in advance in a noise-free environment and a hidden Markov model of noise, the states of the speech phoneme hidden Markov model and the states of the noise hidden Markov model are combined, and the models are composed in a product space by adding, in the spectral domain, the probabilities of each corresponding pair of states, thereby creating a phoneme hidden Markov model on which noise is superimposed.

2. A method of creating a phoneme hidden Markov model, the model consisting of, for each phoneme, the probabilities of obtaining the states of the feature parameters of speech, wherein the output probability distributions of a phoneme hidden Markov model of speech in a noise-free environment and of a hidden Markov model of noise, both represented in the cepstrum domain, are each converted to distributions in the linear spectral domain by applying a cosine transform and an exponential transform; the distributions of the speech phoneme hidden Markov model and of the noise hidden Markov model in the linear spectral domain are convolved to construct a phoneme hidden Markov model adapted to the noise; and the distribution obtained by the convolution is transformed to the cepstrum domain by applying a logarithmic transform and an inverse cosine transform.

3. The noise-resistant phoneme model creation method according to claim 2, wherein, in said composition, the output probabilities of the speech phoneme hidden Markov model and of the noise hidden Markov model are convolved to construct a phoneme hidden Markov model adapted to the noise, and the distribution obtained by the convolution is transformed to the cepstrum domain by applying a logarithmic transform and an inverse cosine transform.
JP00568893A 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model Expired - Lifetime JP3247746B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP00568893A JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP00568893A JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Publications (2)

Publication Number Publication Date
JPH06214592A JPH06214592A (en) 1994-08-05
JP3247746B2 true JP3247746B2 (en) 2002-01-21

Family

ID=11618046

Family Applications (1)

Application Number Title Priority Date Filing Date
JP00568893A Expired - Lifetime JP3247746B2 (en) 1993-01-18 1993-01-18 Creating a noise-resistant phoneme model

Country Status (1)

Country Link
JP (1) JP3247746B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101047104B1 (en) * 2009-03-26 2011-07-07 고려대학교 산학협력단 Acoustic model adaptation method and apparatus using maximum likelihood linear spectral transform, Speech recognition method using noise speech model and apparatus

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100442825B1 (en) * 1997-07-11 2005-02-03 삼성전자주식회사 Method for compensating environment for voice recognition, particularly regarding to improving performance of voice recognition system by compensating polluted voice spectrum closely to real voice spectrum
KR100434527B1 (en) * 1997-08-01 2005-09-28 삼성전자주식회사 Speech Model Compensation Method Using Vector Taylor Series
JP2002323900A (en) * 2001-04-24 2002-11-08 Sony Corp Robot device, program and recording medium
US7209881B2 (en) 2001-12-20 2007-04-24 Matsushita Electric Industrial Co., Ltd. Preparing acoustic models by sufficient statistics and noise-superimposed speech data
JP4061094B2 (en) 2002-03-15 2008-03-12 インターナショナル・ビジネス・マシーンズ・コーポレーション Speech recognition apparatus, speech recognition method and program thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP) 1992, Vol. 1, pp. I-489 to I-492, "ADAPTATION OF THE HMM DISTRIBUTIONS: APPLICATION TO A VQ CODEBOOK AND TO A NOISY ENVIRONMENT", Frangoulis E. D., Gaganelis D. A.

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101047104B1 (en) * 2009-03-26 2011-07-07 고려대학교 산학협력단 Acoustic model adaptation method and apparatus using maximum likelihood linear spectral transform, Speech recognition method using noise speech model and apparatus

Also Published As

Publication number Publication date
JPH06214592A (en) 1994-08-05

Similar Documents

Publication Publication Date Title
US10535336B1 (en) Voice conversion using deep neural network with intermediate voice training
JP2826215B2 (en) Synthetic speech generation method and text speech synthesizer
US7792672B2 (en) Method and system for the quick conversion of a voice signal
US10186252B1 (en) Text to speech synthesis using deep neural network with constant unit length spectrogram
JP2691109B2 (en) Speech coder with speaker-dependent prototype generated from non-user reference data
JP2733955B2 (en) Adaptive speech recognition device
Sisman et al. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder.
US6615174B1 (en) Voice conversion system and methodology
US6026359A (en) Scheme for model adaptation in pattern recognition based on Taylor expansion
US5327521A (en) Speech transformation system
US8301445B2 (en) Speech recognition based on a multilingual acoustic model
JP3836815B2 (en) Speech recognition apparatus, speech recognition method, computer-executable program and storage medium for causing computer to execute speech recognition method
US8812312B2 (en) System, method and program for speech processing
US5165008A (en) Speech synthesis using perceptual linear prediction parameters
EP2109096B1 (en) Speech synthesis with dynamic constraints
WO2006053256A2 (en) Speech conversion system and method
US20060195317A1 (en) Method and apparatus for recognizing speech in a noisy environment
Dharanipragada et al. Robust feature extraction for continuous speech recognition using the MVDR spectrum estimation method
JP4061094B2 (en) Speech recognition apparatus, speech recognition method and program thereof
US5721808A (en) Method for the composition of noise-resistant hidden markov models for speech recognition and speech recognizer using the same
JP3247746B2 (en) Creating a noise-resistant phoneme model
JP3973492B2 (en) Speech synthesis method and apparatus thereof, program, and recording medium recording the program
JP2898568B2 (en) Voice conversion speech synthesizer
JPH10149191A (en) Method and device for adapting model and its storage medium
JP3999731B2 (en) Method and apparatus for isolating signal sources

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071102

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081102

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091102

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101102

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111102

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121102

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131102

Year of fee payment: 12

EXPY Cancellation because of completion of term