JPH11212587A

JPH11212587A - Noise adapting method for speech recognition

Info

Publication number: JPH11212587A
Application number: JP10010128A
Authority: JP
Inventors: Hiroaki Kokubo (小窪浩明)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-01-22
Filing date: 1998-01-22
Publication date: 1999-08-06

Abstract

PROBLEM TO BE SOLVED: To reduce the processing load of noise adaptation by restricting which of the probability distributions constituting phoneme HMMs created in a noise-free environment have a noise HMM added to them. SOLUTION: A distribution selection unit 302 selects, from the n representative distributions stored in a representative distribution storage unit 104, the distributions to be combined with the noise HMM, and outputs them to a spectrum conversion unit 303. The selection is based on a distance, calculated in advance for each representative distribution, between that distribution and the noise-superimposed representative distribution obtained by adding the noise HMM to it in the frequency domain. Using these precomputed inter-distribution distances, the distribution selection unit 302 stores the codewords of the distributions whose distance exceeds a threshold, and the noise HMM is added in the frequency domain only to the representative distributions corresponding to those codewords. Because the processing load of noise adaptation by noise HMM superposition is roughly proportional to the number of distributions processed, limiting the number of representative distributions processed in the distribution selection unit 302 reduces that load.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a noise adaptation method for speech recognition.

[0002]

[Prior Art] For a speech recognition apparatus to be practical, a noise-robustness technique that correctly recognizes speech uttered in noise is indispensable. In the field of speech recognition, a noise adaptation method is known (Japanese Patent Laid-Open No. 6-214592) in which hidden Markov models representing feature parameter sequences of speech segmented into phoneme units and created in a noise-free environment (phoneme HMMs) are combined, by addition in the spectral domain, with a hidden Markov model representing the feature parameter sequence of noise (noise HMM).

[0003]

[Problem to Be Solved by the Invention] The conventional noise adaptation method described above must convert all acoustic parameters of the phoneme HMMs into spectral-domain parameters, so the adaptation takes a long time. If noise adaptation takes too long, the user must wait for it to finish before using the speech recognition function, which impairs usability.
Moreover, in an environment where the noise changes constantly, the noise conditions at the time of adaptation differ from those at the time recognition actually starts, so noise adaptation cannot deliver its full benefit.

[0004] An object of the present invention is to provide a noise adaptation method for speech recognition that greatly reduces the processing load of noise adaptation by noise HMM superposition, and thereby to provide a noise-robust speech recognition apparatus that operates stably under noise.

[0005]

[Means for Solving the Problem] To achieve the above object, the present invention provides a noise adaptation method for speech recognition in which hidden Markov models representing feature parameter sequences of speech segmented into phoneme units (phoneme HMMs) are created in a noise-free environment, and the phoneme HMMs created in the noise-free environment and a hidden Markov model representing a feature parameter sequence of noise (noise HMM) are added in the frequency domain of the respective feature parameter sequences to create phoneme HMMs matched to the utterance environment. In this method, the distributions to which the noise HMM is added are restricted to a subset of the probability distributions constituting the phoneme HMMs created in the noise-free environment.

[0006] (Embodiment) An embodiment of the present invention is described below.
FIG. 1 is a block diagram of a word speech recognition apparatus based on semi-continuous HMMs, used to explain one embodiment of the present invention. In FIG. 1, 101 is a speech input unit, 102 is a feature parameter extraction unit, 103 is a speech recognition unit, 104 is a representative distribution storage unit, 105 is an acoustic HMM storage unit, 106 is a dictionary storage unit, 107 is a recognition result output unit, and 108 is a noise model adaptation unit.

[0007] The speech input unit (101) usually consists of a microphone and an A/D converter; it captures the speech waveform and converts it into digital data. The feature parameter extraction unit (102) divides the digitized input speech waveform into short frames (typically 10 ms to 20 ms) and extracts feature parameters from each frame. LPC cepstrum coefficients are often used as feature parameters in speech recognition; see Nakata, "Speech" (Corona Publishing) for details of the LPC cepstrum.

The acoustic HMM storage unit (105) stores the phoneme HMMs. A phoneme HMM represents a sequence of speech feature parameters, segmented into phoneme units, as a hidden Markov model (HMM). HMMs are described in detail in Nakagawa, "Speech Recognition by Probabilistic Models" (IEICE). The probability density functions of the phoneme HMMs are approximated by mixtures of multidimensional Gaussian distributions. In the semi-continuous HMM scheme, these multidimensional Gaussian distributions are classified into a finite number of clusters, and the distributions belonging to each cluster are represented by a single distribution. This distribution is called a representative distribution, the cluster identifier (ID) is called a codeword, and the table that associates each codeword with the representative distribution of its cluster is called a codebook. The codebook is stored in the representative distribution storage unit (104), an example of which is shown in FIG. 2. When a representative distribution is modeled as a multidimensional Gaussian distribution, it can be expressed by a mean vector and a variance matrix. The representative distribution storage unit (104) therefore consists of a set of entries, each pairing the mean vector V and variance matrix Σ that describe a representative distribution with the ID number (codeword) that identifies it.

The dictionary storage unit (106) describes the recognition vocabulary as phoneme symbol strings. A word HMM is created for each recognition target word by concatenating phoneme HMMs according to the phoneme symbol string described in the dictionary storage unit (106). The speech recognition unit (103) calculates the likelihood between the time series of feature parameters of the input speech, extracted by the feature parameter extraction unit (102), and the word HMMs created from the recognition vocabulary, and takes the word with the highest likelihood as the recognition result. The recognition result output unit (107) is a device that outputs the recognition result obtained by the speech recognition unit (103), for example a display that shows the result as text.
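To make the codebook idea concrete, here is a minimal sketch assuming diagonal variance matrices and a hypothetical two-entry codebook. In a semi-continuous HMM each state keeps only mixture weights over the shared codewords, so its output probability is b_j(x) = Σ_k w_jk N(x; V_k, Σ_k). The names `codebook`, `log_gauss_diag`, and `state_output_prob` are illustrative and do not come from the patent.

```python
import numpy as np

# Hypothetical codebook: codeword -> (mean vector V, diagonal of variance matrix Sigma).
codebook = {
    0: (np.array([0.0, 0.0]), np.array([1.0, 1.0])),
    1: (np.array([3.0, -1.0]), np.array([0.5, 2.0])),
}

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * (np.sum(np.log(2.0 * np.pi * var)) + np.sum((x - mean) ** 2 / var))

def state_output_prob(x, weights, codebook):
    """Semi-continuous output probability: state-specific weights over shared codewords."""
    return sum(w * np.exp(log_gauss_diag(x, *codebook[k])) for k, w in weights.items())

# Each state stores only its mixture weights; the Gaussians themselves are shared.
state_weights = {0: 0.7, 1: 0.3}
b = state_output_prob(np.array([0.5, -0.2]), state_weights, codebook)
```

Because every state refers to the same codebook entries, rewriting one codebook entry changes it for all states at once, which is why the noise adaptation in this document operates on the n representative distributions rather than on each state separately.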

[0008] FIG. 3 illustrates one embodiment of the noise model adaptation unit (108). In FIG. 3, 301 is a noise HMM creation unit, 302 is a distribution selection unit, 303 is a spectrum conversion unit, 304 is a parameter addition unit, and 305 is a spectrum inverse conversion unit.

[0009] The noise HMM creation unit (301) takes as input the feature parameter sequence corresponding to the reference noise waveform and creates a noise HMM. The computation of the input feature parameter sequence has already been described with reference to FIG. 1. If the noise HMM is a one-state model whose output probability is a single Gaussian distribution, the noise HMM creation unit (301) simply computes the mean and variance of the feature parameters over the input sequence. The distribution selection unit (302) selects, from the representative distributions stored in the codebook storage unit (the representative distribution storage unit (104)), the distributions to be combined with the noise HMM; it is described in detail later. The spectrum conversion unit (303) converts the feature parameters of the representative distributions selected by the distribution selection unit (302) into frequency-domain parameters. As noted above, the LPC cepstrum is often used as the feature parameter in speech recognition. The LPC cepstrum is converted to the frequency domain in two steps: a cosine transform into the log spectral domain, followed by a conversion into the linear spectral domain (referred to here as the linear transform). Let μ[r] be the r-th element of the mean vector of a Gaussian distribution of the feature parameters, and σ[r;s] the element in row r, column s of its variance matrix. The cosine transform is then given by the following equations.

[0010]

(Equation 1)

[0011]

(Equation 2)

[0012] Here,

[0013]

(Equation 3)

[0014] is the cosine transform matrix. The subscripts cp and lg attached to μ and σ denote parameters in the cepstral domain and in the log spectral domain, respectively. Similarly, the linear transform is performed by the following equations.

[0015]

(Equation 4)

[0016]

(Equation 5)

[0017] Here, the subscript ln denotes a parameter in the linear spectral domain. The parameter addition unit (304) adds the feature parameters of the representative distribution and of the noise HMM in the spectral domain, as follows.

[0018]

(Equation 6)

[0019]

(Equation 7)

[0020] Here, the superscripts sn, s, and n attached to μ and σ denote, respectively, the combined feature parameters, those of the representative distribution, and those of the noise HMM. The spectrum inverse conversion unit (305) converts the feature parameters of the representative distributions combined with the noise HMM from the frequency domain back to the original cepstral domain. This conversion consists of a logarithmic transform and an inverse cosine transform, which are the inverses of the linear transform and of the cosine transform, respectively. The feature parameters returned to the cepstral domain by the spectrum inverse conversion unit (305) are stored in the representative distribution storage unit (104) and are used by the speech recognition unit (103) in place of the previous representative distributions.
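The images for Equations 1 to 7 are not reproduced in this text. As a hedged reference only, the standard log-normal relations used in parallel model combination, which the steps described above appear to follow, can be written in the document's notation as below. Taking C as the matrix that maps the cepstrum to the log spectrum, writing σ for the full variance matrix, and omitting any gain factor on the speech distribution are assumptions; the patent's own equations may differ in these details.

```latex
\begin{aligned}
&\mu_{lg} = C\,\mu_{cp}, \qquad \sigma_{lg} = C\,\sigma_{cp}\,C^{T} \\
&\mu_{ln}[r] = \exp\!\left(\mu_{lg}[r] + \tfrac{1}{2}\,\sigma_{lg}[r;r]\right) \\
&\sigma_{ln}[r;s] = \mu_{ln}[r]\,\mu_{ln}[s]\left(\exp\!\left(\sigma_{lg}[r;s]\right) - 1\right) \\
&\mu^{sn}_{ln} = \mu^{s}_{ln} + \mu^{n}_{ln}, \qquad \sigma^{sn}_{ln} = \sigma^{s}_{ln} + \sigma^{n}_{ln} \\
&\sigma^{sn}_{lg}[r;s] = \log\!\left(\frac{\sigma^{sn}_{ln}[r;s]}{\mu^{sn}_{ln}[r]\,\mu^{sn}_{ln}[s]} + 1\right), \qquad
 \mu^{sn}_{lg}[r] = \log \mu^{sn}_{ln}[r] - \tfrac{1}{2}\,\sigma^{sn}_{lg}[r;r] \\
&\mu^{sn}_{cp} = C^{-1}\mu^{sn}_{lg}, \qquad \sigma^{sn}_{cp} = C^{-1}\,\sigma^{sn}_{lg}\,\left(C^{-1}\right)^{T}
\end{aligned}
```

The first line is the cosine transform, the next two the conversion to the linear spectrum, the fourth the addition in the parameter addition unit (304), and the last two the logarithmic and inverse cosine transforms of the spectrum inverse conversion unit (305).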

[0021] Next, the distribution selection unit (302) is described in detail. The distribution selection unit (302) selects, from the n representative distributions stored in the representative distribution storage unit (104), the distributions to be combined with the noise HMM, and outputs them to the spectrum conversion unit (303). As explained with reference to FIG. 2, each representative distribution in the representative distribution storage unit (104) carries an ID number (codeword) that identifies it. The distribution selection unit (302) therefore only needs to store the ID numbers of the representative distributions to be selected.
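The conversion, addition, and inverse conversion steps, restricted to the stored codewords, can be sketched in numpy as follows. This is a minimal sketch under several assumptions: a one-state single-Gaussian noise HMM, the log-normal approximation commonly used in parallel model combination, a square orthonormal DCT-II matrix standing in for the cosine transform (as many cepstral coefficients as spectral channels), and no gain term. The names `dct_matrix`, `train_noise_model`, `pmc_combine`, and `adapt_selected` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, assumed convention: cepstrum = dct @ log_spectrum."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * k * (j + 0.5) / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def train_noise_model(noise_frames):
    """One-state, single-Gaussian noise HMM: frame mean and diagonal covariance."""
    return noise_frames.mean(axis=0), np.diag(noise_frames.var(axis=0))

def cep_to_lin(mu_cp, cov_cp, dct):
    # Cosine transform to the log spectrum; dct is orthonormal, so its inverse is dct.T.
    mu_lg = dct.T @ mu_cp
    cov_lg = dct.T @ cov_cp @ dct
    # Log-normal moments give the linear-spectrum mean and covariance.
    mu_ln = np.exp(mu_lg + np.diag(cov_lg) / 2.0)
    cov_ln = np.outer(mu_ln, mu_ln) * (np.exp(cov_lg) - 1.0)
    return mu_ln, cov_ln

def lin_to_cep(mu_ln, cov_ln, dct):
    # Logarithmic transform: invert the log-normal moment equations.
    cov_lg = np.log(cov_ln / np.outer(mu_ln, mu_ln) + 1.0)
    mu_lg = np.log(mu_ln) - np.diag(cov_lg) / 2.0
    # Inverse cosine transform back to the cepstrum.
    return dct @ mu_lg, dct @ cov_lg @ dct.T

def pmc_combine(mu_s, cov_s, mu_n, cov_n, dct):
    """Add a clean distribution and the noise distribution in the linear spectral domain."""
    ms, ss = cep_to_lin(mu_s, cov_s, dct)
    mn, sn = cep_to_lin(mu_n, cov_n, dct)
    return lin_to_cep(ms + mn, ss + sn, dct)

def adapt_selected(codebook, selected_ids, noise_model, dct):
    """Copy of the codebook in which only the selected codewords are noise-adapted."""
    mu_n, cov_n = noise_model
    adapted = dict(codebook)
    for cw in selected_ids:
        mu_s, cov_s = codebook[cw]
        adapted[cw] = pmc_combine(mu_s, cov_s, mu_n, cov_n, dct)
    return adapted

n = 4
dct = dct_matrix(n)
rng = np.random.default_rng(0)
# Hypothetical clean codebook: codeword -> (cepstral mean, cepstral covariance).
codebook = {i: (rng.normal(size=n), np.diag(rng.uniform(0.1, 0.5, size=n))) for i in range(3)}
noise = train_noise_model(rng.normal(loc=[0.5, 0.0, 0.0, 0.0], scale=0.2, size=(200, n)))
adapted = adapt_selected(codebook, [2], noise, dct)  # only codeword 2 was selected
```

The loop in `adapt_selected` touches only the stored IDs, so its cost grows with the number of selected codewords; unselected entries are carried over unchanged.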

[0022] The method by which the distribution selection unit (302) selects representative distributions is described below. For each representative distribution, the distance between that distribution and the noise-superimposed representative distribution, obtained by adding the noise HMM to it in the frequency domain, is calculated in advance, and the selection is made according to the magnitude of this distance. FIG. 4 illustrates the calculation of the inter-distribution distance, taking as an example the representative distribution of codeword i among the n representative distributions. The noise-superimposed representative distribution i (403) is created by adding the noise HMM (402) to the representative distribution i (401) in the spectral domain. The noise HMM used here is created from the environmental noise in which the speech recognition system is expected to be used most often.
The amount by which the representative distribution of codeword i is deformed by noise superposition is then obtained as the distance between the representative distribution (401) and the noise-superimposed representative distribution (403). The Kullback divergence is often used as a distance measure between distributions. The Kullback divergence distance for uncorrelated multidimensional Gaussian distributions is calculated by the following equations.

[0023]

(Equation 8)

[0024]

(Equation 9)
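The images for Equations 8 and 9 are not reproduced in this text. One common form of the Kullback divergence between uncorrelated (diagonal-covariance) Gaussians is the symmetric version, the sum of the two directed divergences, shown below as an assumption in the document's notation: superscript s for the representative distribution i (401), sn for the noise-superimposed representative distribution i (403), and σ[r;r] for the r-th diagonal variance. The patent's exact definition may differ, for example by using a one-directional divergence or a different scale factor.

```latex
D_i = \frac{1}{2}\sum_{r}\left[
  \frac{\sigma^{s}_{i}[r;r]}{\sigma^{sn}_{i}[r;r]}
  + \frac{\sigma^{sn}_{i}[r;r]}{\sigma^{s}_{i}[r;r]} - 2
  + \left(\mu^{s}_{i}[r] - \mu^{sn}_{i}[r]\right)^{2}
    \left(\frac{1}{\sigma^{s}_{i}[r;r]} + \frac{1}{\sigma^{sn}_{i}[r;r]}\right)
\right]
```

Each bracketed term is zero when the two distributions agree in that dimension and grows with the shift in mean or the change in variance, so D_i measures how strongly noise deforms distribution i.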

[0025] As stated above, the inter-distribution distance Di corresponds to the amount by which representative distribution i is deformed by noise superposition. A representative distribution with a large Di can be regarded as a model that is easily affected by noise, whereas one with a small Di can be regarded as robust to noise. Noise adaptation by noise HMM superposition is therefore skipped for distributions that are robust to noise. Based on the precomputed inter-distribution distance of each representative distribution, the distribution selection unit (302) stores the codewords of the distributions whose distance exceeds a certain threshold, and the noise HMM is added in the frequency domain only to the representative distributions corresponding to the stored codewords. The processing load of noise adaptation by noise HMM superposition is roughly proportional to the number of distributions processed, so limiting the number of representative distributions processed in the distribution selection unit (302) greatly reduces the processing load of noise adaptation. For example, limiting the number of distributions processed to 1/2 reduces the noise adaptation load to roughly 1/2. Moreover, because the selection criterion is a distance measure corresponding to the amount of model deformation caused by noise superposition, the reduction in processing does not significantly degrade the effect of noise adaptation.
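The threshold rule above can be sketched as follows, assuming diagonal covariances and the symmetric Kullback divergence as the distance (the patent's exact Equations 8 and 9 are not reproduced here). The clean and noise-superimposed distributions are hypothetical toy values; in the method described, the noisy ones would be obtained in advance by adding the assumed noise HMM in the frequency domain. The function names are illustrative.

```python
import numpy as np

def sym_kl_diag(mu1, var1, mu2, var2):
    """Symmetric Kullback divergence between diagonal Gaussians, summed over dimensions."""
    d = mu1 - mu2
    return 0.5 * np.sum(var1 / var2 + var2 / var1 - 2.0 + d * d * (1.0 / var1 + 1.0 / var2), axis=-1)

def select_codewords(clean_mu, clean_var, noisy_mu, noisy_var, threshold):
    """Codewords whose distance D_i exceeds the threshold, plus all distances."""
    dist = sym_kl_diag(clean_mu, clean_var, noisy_mu, noisy_var)
    return [i for i, d in enumerate(dist) if d > threshold], dist

# Toy data: 4 representative distributions in 2 dimensions (hypothetical values).
clean_mu = np.zeros((4, 2))
clean_var = np.ones((4, 2))
noisy_mu = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [0.0, 0.0]])
noisy_var = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [4.0, 1.0]])

selected, dist = select_codewords(clean_mu, clean_var, noisy_mu, noisy_var, threshold=0.5)
# dist is [0.0, 0.01, 4.0, 1.125]; selected is [2, 3], so half of the codewords get adapted.
```

Codeword 2 (large mean shift) and codeword 3 (large variance change) pass the threshold, while the nearly unchanged codewords 0 and 1 are skipped, matching the "about 1/2 the processing" example in the text for this toy case.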

[0026] In the first embodiment, a method was described in which the distribution selection unit (302) limits the number of representative distributions to which the noise HMM is added in the frequency domain. Depending on the situation of use, however, there may be time to spare for noise adaptation. As a second embodiment, a method is therefore described in which the number of representative distributions processed is limited, as in the first embodiment, when noise adaptation must be fast, and the noise HMM is superimposed on all representative distributions when there is time to spare. In the second embodiment, the distribution selection unit (302) holds a priority table such as that shown in FIG. 5. The priority table consists of the codewords corresponding to the representative distribution storage unit (104) and their priorities, where each priority is determined from the value of the inter-distribution distance. The noise model adaptation unit (108) selects representative distributions in descending order of priority and successively superimposes the noise HMM on them. Even if speech recognition starts before noise adaptation is complete, the high-priority representative distributions (those most easily deformed by noise) have already been combined with the noise HMM, so the performance of noise adaptation does not degrade significantly.
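The priority-ordered processing can be sketched as below. This is a minimal sketch, assuming that priority is simply the rank by descending inter-distribution distance; `should_stop` is a hypothetical stand-in for whatever signal (a deadline or the start of recognition) interrupts adaptation, and `adapt_one` stands for the per-codeword noise HMM superposition.

```python
def build_priority_table(distances):
    """Codewords ordered by descending inter-distribution distance (highest priority first)."""
    return sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)

def adapt_in_priority_order(priority_table, adapt_one, should_stop):
    """Adapt codewords one at a time in priority order until should_stop() returns True."""
    done = []
    for cw in priority_table:
        if should_stop():
            break
        adapt_one(cw)
        done.append(cw)
    return done

# Hypothetical distances D_i for four codewords.
distances = [0.0, 0.01, 4.0, 1.125]
table = build_priority_table(distances)  # [2, 3, 1, 0]

# Recognition "starts" after two codewords: the two most noise-sensitive ones are already adapted.
adapted = []
partial = adapt_in_priority_order(table, adapted.append, lambda: len(adapted) >= 2)
```

If `should_stop` never fires, every codeword is adapted, which corresponds to the case in which there is time to spare.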

[0027] Along the same lines, the update frequency of noise adaptation may be varied for each representative distribution according to its priority. That is, high-priority distributions are updated more frequently and low-priority representative distributions less frequently, which allows effective noise adaptation within a limited processing time.

[0028] Although all of the above has been described using a word speech recognition apparatus as an example, the invention is equally applicable to continuous speech recognition.

[0029]

[Effects of the Invention] As described above, in the present invention the distributions to which the noise HMM is added are restricted among the probability distributions constituting the phoneme HMMs created in the noise-free environment, so the processing load of noise adaptation can be greatly reduced. Because the processing load of noise adaptation by noise HMM superposition is roughly proportional to the number of distributions processed, reducing the number of distributions processed reduces the processing load of noise adaptation.

[Brief description of the drawings]

FIG. 1 is a block diagram of the speech recognition apparatus.

FIG. 2 is a diagram illustrating the representative distribution storage unit.

FIG. 3 is a block diagram of the noise model adaptation unit.

FIG. 4 is a diagram for explaining the calculation of the inter-distribution distance.

FIG. 5 is a diagram illustrating the priority table.

[Explanation of Reference Numerals]

102: feature parameter extraction unit; 104: representative distribution storage unit; 301: noise HMM creation unit; 302: distribution selection unit; 303: spectrum conversion unit; 304: parameter addition unit.

Claims

[Claims]

1. A noise adaptation method for speech recognition in which hidden Markov models representing feature parameter sequences of speech segmented into phoneme units (phoneme HMMs) are created in a noise-free environment, and the phoneme HMMs created in the noise-free environment and a hidden Markov model representing a feature parameter sequence of noise (noise HMM) are added in the frequency domain of the respective feature parameter sequences to create phoneme HMMs matched to the utterance environment, wherein the distributions to which the noise HMM is added are restricted among the probability distributions constituting the phoneme HMMs created in the noise-free environment.

2. The noise adaptation method for speech recognition according to claim 1, wherein, for each probability distribution constituting the phoneme HMMs created in the noise-free environment, an inter-distribution distance between that probability distribution and a noise-superimposed probability distribution obtained by adding the noise HMM to it in the frequency domain is calculated in advance, and the distributions to which the noise HMM is added are restricted according to the magnitude of the inter-distribution distance.

3. The noise adaptation method for speech recognition according to claim 2, wherein the distributions to which the noise HMM is added are restricted to those whose precomputed inter-distribution distance exceeds a predetermined threshold.

4. The noise adaptation method for speech recognition according to claim 2, wherein the processing of adding the noise HMM is performed preferentially on distributions with larger precomputed inter-distribution distances.