JP2003223186A

JP2003223186A - Speech recognition method and speech recognition device

Info

Publication number: JP2003223186A
Application number: JP2002020303A
Authority: JP
Inventors: Hiroyuki Hoshino; 博之星野; Ryuta Terajima; 立太寺嶌
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2002-01-29
Filing date: 2002-01-29
Publication date: 2003-08-08
Anticipated expiration: 2022-01-29
Also published as: JP3693022B2

Abstract

PROBLEM TO BE SOLVED: To set a flooring coefficient β for each frequency.

SOLUTION: In a linear or nonlinear spectral subtraction method, for example, the flooring coefficient β is set for each frequency on the basis of the frequency spectrum envelope of the noise in the block just before a speech signal block. Near the maximum of the noise spectrum envelope, β is set to 0.01, for example, and near the minimum of the noise spectrum envelope, β is set to 0.1, for example. The noise suppression effect therefore operates according to the noise level at each frequency, and speech recognition performance can be remarkably improved.

COPYRIGHT: (C)2003, JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition method and a speech recognition device that operate effectively in noisy environments where noise is generated continuously.

[0002]

[Prior Art] A speech recognition device analyzes and interprets pronunciation, words, and sentences from input speech. Needless to say, it is desirable for such a device to remove the noise signal and extract only the speech signal. However, in an environment where the noise is continuous but not constant, it is not easy to predict the noise in advance. Examples of environments with such non-white noise include the cockpits and cargo compartments of moving vehicles, ships, and aircraft, as well as factories and warehouses with noise from work equipment and transport equipment.

[0003] For a speech recognition device operating under such continuous but non-constant noise, the spectral subtraction method is one technique for reducing the noise (S. F. Boll, IEEE Trans. ASSP-27, 2 (1979) 113). The linear spectral subtraction method converts the input signal into a frequency spectrum, classifies it into signal sections containing speech and background noise sections, and obtains the frequency spectrum of the speech signal by subtracting the frequency spectrum of the immediately preceding background noise section from the frequency spectrum of the section containing speech. Noise suppression can be made more effective by uniformly multiplying the power of the frequency spectrum of the immediately preceding background noise section by a factor of 1 to 3 before subtracting it from the frequency spectrum of the section containing speech.
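As an illustration of the linear spectral subtraction described above, the following is a minimal sketch, not the cited authors' implementation. It assumes both spectra are given as lists of power values per frequency bin; the function name `linear_spectral_subtraction` and the parameter `factor` (the uniform multiplier of 1 to 3) are hypothetical names chosen for this example.

```python
def linear_spectral_subtraction(speech_power, noise_power, factor=2.0):
    """Subtract a uniformly scaled noise power spectrum from a speech-frame spectrum.

    speech_power: power per frequency bin for a section containing speech.
    noise_power:  power per frequency bin for the preceding noise-only section.
    factor:       uniform multiplier applied to the noise power (assumed 1 to 3).
    """
    if len(speech_power) != len(noise_power):
        raise ValueError("spectra must have the same number of frequency bins")
    return [s - factor * n for s, n in zip(speech_power, noise_power)]


if __name__ == "__main__":
    speech = [10.0, 4.0, 1.0]
    noise = [2.0, 1.0, 1.0]
    # The last bin goes negative, which is why a lower limit is needed in practice.
    print(linear_spectral_subtraction(speech, noise, factor=2.0))
```

Note that nothing in this plain subtraction prevents a negative result in bins where the scaled noise exceeds the observed power.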

[0004] On the other hand, a method called nonlinear spectral subtraction is known, in which the subtraction parameter α is set for each frequency (P. Lockwood and J. Bondy, Speech Communication, 11 (1992) 215). In this method, the per-frequency subtraction parameter α(ω) is set to the maximum value, at each frequency ω, of the frequency spectrum that does not contain speech (or to a value proportional to it). For example, 40 frames are cut out along the time axis, each is converted to the frequency domain, and for each frequency the maximum of the 40 spectrum (power) values is taken. Japanese Laid-Open Patent Publication No. H9-160594 describes a method of obtaining the subtraction parameter α for each frequency band by least-squares approximation. In that document, the parameter is computed per frequency band in order to reduce the computational cost of the least-squares approximation. Japanese Laid-Open Patent Publication No. H10-177394 discloses a configuration that recognizes, by spectral analysis of the noise, which of several prestored patterns the noise matches, and reads out the subtraction parameter α accordingly.
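The per-frequency maximum over noise-only frames described above can be sketched as follows. This is an illustrative sketch only: the function names and the `scale` parameter (standing in for "a value proportional to it") are assumptions, and the frames are assumed to be equal-length lists of power values.

```python
def per_frequency_max(noise_frames):
    """Return, for each frequency bin, the maximum power over all noise-only frames."""
    if not noise_frames:
        raise ValueError("need at least one noise frame")
    # zip(*frames) groups the values of the same frequency bin across frames.
    return [max(bin_values) for bin_values in zip(*noise_frames)]


def subtraction_parameter(noise_frames, scale=1.0):
    """Per-frequency subtraction parameter, proportional to the per-bin maximum."""
    return [scale * m for m in per_frequency_max(noise_frames)]


if __name__ == "__main__":
    # Three frames of two bins each (the text's example uses 40 frames).
    frames = [[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]]
    print(subtraction_parameter(frames))
```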

[0005]

[Problem to Be Solved by the Invention] Because the subtraction parameter α takes a large value in order to suppress noise, if the power of the frequency spectrum of the immediately preceding background noise section is, for example, uniformly tripled and subtracted from the frequency spectrum of the section containing speech, the result can be negative. However, a negative value cannot be handled as the frequency spectrum of a section containing speech. To avoid this problem, a flooring coefficient β is used (for example, Japanese Laid-Open Patent Publication No. 2001-228892). The flooring coefficient β acts as a lift, so to speak (the original uses the word "geta"): the frequency spectrum of the section containing speech multiplied by β is used as a lower limit, so that the frequency spectrum output to the speech recognition means does not become negative. A fixed value of, for example, 0.01 to 0.1 is used as the flooring coefficient β.

[0006] In the nonlinear spectral subtraction method (NSS), however, the subtraction parameter α is not constant. If the flooring coefficient β is held constant, there is no single optimal value of β across frequencies whose background noise levels differ greatly, and noise could not be suppressed effectively.

[0007] The present invention has been made to solve the above problem. Its object is to provide a speech recognition method and a speech recognition device that suppress noise by calculating a flooring coefficient β(ω) for each frequency ω. A further object is to provide a method for calculating the flooring coefficient β(ω) for each frequency ω simply and with a small amount of computation.

[0008]

[Means for Solving the Problem] To solve the above problem, the means according to claim 1 is a speech recognition method that recognizes speech after reducing noise by a spectral subtraction method, characterized in that, when noise is eliminated for each frequency from the frequency spectrum of a time section containing speech on the basis of the noise frequency spectrum of a time section not containing speech, the product of the frequency spectrum of the time section containing speech and a flooring coefficient that is smaller than 1 and is a function of frequency is used as a lower limit. The means according to claim 2 is the speech recognition method according to claim 1, characterized in that, when noise is eliminated in the spectral subtraction method, the product of the noise frequency spectrum of the time section not containing speech and a subtraction parameter that is a function of frequency is subtracted, for each frequency, from the frequency spectrum of the time section containing speech.

[0009] The means according to claim 3 is a speech recognition device for use under noise, characterized by comprising: frequency analysis means for obtaining a frequency spectrum for an arbitrary section; flooring coefficient calculation means for setting a flooring coefficient from the noise frequency spectrum obtained by the frequency analysis means for a time section not containing speech; multiplication means for calculating, for each frequency, the value obtained by multiplying the frequency spectrum of a time section containing speech by the flooring coefficient at that frequency determined by the flooring coefficient calculation means; subtraction means for subtracting, from the frequency spectrum obtained by the frequency analysis means for the time section containing speech, the value obtained by multiplying the noise frequency spectrum by a subtraction parameter for each frequency; and comparison means for comparing the output of the multiplication means with the output of the subtraction means and outputting the larger of the two.

[0010] The means according to claim 4 is a speech recognition device for use under noise, characterized by comprising: frequency analysis means for obtaining a frequency spectrum for an arbitrary section; subtraction parameter calculation means for setting a subtraction parameter from the noise frequency spectrum obtained by the frequency analysis means for a time section not containing speech; flooring coefficient calculation means for setting a flooring coefficient from the noise frequency spectrum obtained by the frequency analysis means for the time section not containing speech; multiplication means for calculating, for each frequency, the value obtained by multiplying the frequency spectrum of a time section containing speech by the flooring coefficient at that frequency determined by the flooring coefficient calculation means; subtraction means for subtracting, from the frequency spectrum obtained by the frequency analysis means for the time section containing speech, the value obtained by multiplying the noise frequency spectrum, for each frequency, by the subtraction parameter at that frequency determined by the subtraction parameter calculation means; and comparison means for comparing the output of the multiplication means with the output of the subtraction means and outputting the larger of the two.

[0011] According to the means of claim 5, the subtraction parameter calculation means obtains a spectral envelope from the noise frequency spectrum obtained by the frequency analysis means and then sets the subtraction parameter according to the value of the spectral envelope at each frequency. According to the means of claim 6, the flooring coefficient calculation means obtains a spectral envelope from the noise frequency spectrum obtained by the frequency analysis means and then sets the flooring coefficient according to the value of the spectral envelope at each frequency.

[0012]

[Operation and Effects of the Invention] In the present invention, the flooring coefficient is set for each frequency according to the frequency spectrum of the signal in a time section not containing speech. The flooring coefficient is thus given a frequency dependence, so to speak, and an appropriate "lower limit" can be set for each frequency in the spectral subtraction method. For example, the flooring coefficient that sets the "lower limit" can be made small at frequencies where the spectrum level is high and large at frequencies where the spectrum level is low.

[0013] The only input needed to calculate the flooring coefficient is the signal in a time section not containing speech, so the speech signal in the observed data can in effect be extracted using a single set of noise data. If the flooring coefficient is calculated simply by obtaining the spectral envelope of the noise data, the calculation is very easy. A flooring coefficient obtained in this way is set for each frequency and can average out the stochastic time variation of the noise power at each frequency. That is, by using this flooring coefficient in the process of suppressing the noise spectrum in a signal section containing speech, an appropriate lower limit of the spectrum can be determined so that the output does not become negative. By calculating the flooring coefficient from the spectral envelope in this way, a speech recognition device can be obtained that is small in overall configuration and can calculate an appropriate flooring coefficient. The subtraction parameter can also be obtained from the spectral envelope of the same noise data.

[0014]

[Embodiments of the Invention] Specific embodiments of the present invention are described below. The present invention is not limited to the following embodiments.

[0015] FIG. 1 is a graph showing an example of the relationship, central to the present invention, between the spectral envelope of the noise frequency spectrum and the subtraction parameter α and flooring coefficient β. In this embodiment, the parameters are set against the noise frequency spectrum envelope so that the subtraction parameter α ranges from a minimum of 0.8 to a maximum of 2.6 and the flooring coefficient β ranges from a minimum of 0.005 to a maximum of 0.11. That is, where the value of the noise frequency spectrum envelope is high, the subtraction parameter α is made large and the flooring coefficient β is made small; where the value of the envelope is low, α is made small and β is made large. By determining α and β from the value of the noise spectrum envelope at each frequency in this way, the frequency-dependent subtraction parameter α and flooring coefficient β can be determined easily.
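The text gives only the value ranges and the direction of the relationship, not the exact curve of FIG. 1. The sketch below therefore assumes, purely for illustration, a linear interpolation over the envelope's own minimum-to-maximum range; the function name `map_envelope` and this interpolation rule are assumptions, not the mapping disclosed in the figure.

```python
def map_envelope(envelope, alpha_range=(0.8, 2.6), beta_range=(0.005, 0.11)):
    """Map each envelope value to (alpha, beta) by linear interpolation (assumed rule).

    A high envelope value gives a large alpha and a small beta;
    a low envelope value gives a small alpha and a large beta.
    """
    lo, hi = min(envelope), max(envelope)
    span = hi - lo
    alphas, betas = [], []
    for value in envelope:
        # t is 0 at the envelope minimum and 1 at the maximum (0.5 if flat).
        t = (value - lo) / span if span > 0 else 0.5
        alphas.append(alpha_range[0] + t * (alpha_range[1] - alpha_range[0]))
        betas.append(beta_range[1] - t * (beta_range[1] - beta_range[0]))
    return alphas, betas


if __name__ == "__main__":
    alphas, betas = map_envelope([0.0, 1.0, 2.0])
    print(alphas, betas)
```

The envelope passed in could be either the log-domain envelope or its exponential; which of the two FIG. 1 uses on its axis is not stated in the text.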

[0016] FIG. 2 shows a specific example of obtaining the spectral envelope of the noise frequency spectrum from the noise signal. The noise signal waveform, which is digital data, is transformed by a fast Fourier transformer (FFT, 1) to obtain the power at each frequency (the noise frequency spectrum). Taking the logarithm of this (log, 11 in FIG. 2) and applying the fast Fourier transform again (FFT, 12 in FIG. 2) yields the cepstrum of the noise signal. Extracting only the low-quefrency part (13 in FIG. 2) and applying the inverse fast Fourier transform (IFFT, 14 in FIG. 2) to the low-quefrency components yields the envelope of the logarithm of the noise frequency spectrum. The subtraction parameter α and the flooring coefficient β can then be calculated either from the envelope of the noise frequency spectrum, obtained by taking the exponential (exp, 20 in FIG. 2), or directly from the envelope of the logarithm of the noise frequency spectrum.
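The log, transform, low-quefrency window, inverse transform, and exponential chain of FIG. 2 can be sketched as below. This is a minimal sketch under stated assumptions: it uses a naive O(n²) discrete Fourier transform from the standard library in place of an FFT, it assumes the power spectrum is strictly positive and given at full length (both positive- and negative-frequency bins), and the names `dft`, `spectral_envelope`, and `cutoff` are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT block)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def spectral_envelope(power_spectrum, cutoff):
    """Smooth a power spectrum by keeping only low-quefrency cepstral components."""
    n = len(power_spectrum)
    log_spectrum = [math.log(p) for p in power_spectrum]          # log block
    cepstrum = dft(log_spectrum)                                  # second transform
    # Symmetric low-quefrency window: keep indices near 0 and their mirror images.
    windowed = [c if (q <= cutoff or q >= n - cutoff) else 0j
                for q, c in enumerate(cepstrum)]
    log_envelope = [z.real for z in dft(windowed, inverse=True)]  # inverse transform
    return [math.exp(v) for v in log_envelope]                    # exp block


if __name__ == "__main__":
    spectrum = [1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0, 2.0]
    print(spectral_envelope(spectrum, cutoff=1))
```

A smaller `cutoff` gives a smoother envelope; with `cutoff=0` only the mean of the log spectrum survives, so the envelope is flat at the geometric mean of the input.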

[0017] FIG. 3 is a block diagram outlining a speech recognition device 100 that has a calculation unit 10 (subtraction parameter calculation means and flooring coefficient calculation means) for the subtraction parameter α and flooring coefficient β described above. The input signal is converted into a frequency spectrum signal by a fast Fourier transformer (FFT, frequency analysis means) 1. The spectrum signal covers a range of, for example, 0 to 10 kHz. Next, a speech presence/absence determiner (speech section determination means) 2 determines from the frequency spectrum signal whether each part of the input signal sequence contains speech. The determination is made from features such as, for example, whether the power of the frequency spectrum in the range of 1000 to 4000 Hz is larger than the power of the frequency spectrum in the other ranges. When a section is determined to be a noise signal section containing no speech, its frequency spectrum (noise frequency spectrum N(ω)) is stored in a noise frequency spectrum storage unit (memory) 3. The noise frequency spectrum N(ω) is also sent to the calculation unit 10 (subtraction parameter calculation means and flooring coefficient calculation means).
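The example criterion for the speech presence/absence determiner can be sketched as a simple band-power comparison. The text names this only as one possible feature, so the following is an assumption-laden sketch: it compares total in-band power against total out-of-band power, and the function name `contains_speech` and the parameter `bin_width_hz` are hypothetical.

```python
def contains_speech(power_spectrum, bin_width_hz, band=(1000.0, 4000.0)):
    """Return True if power inside the band exceeds power outside it (assumed rule).

    power_spectrum: power per frequency bin, bin i centred at i * bin_width_hz.
    band:           (low, high) frequency limits in Hz, inclusive.
    """
    in_band = 0.0
    out_band = 0.0
    for i, power in enumerate(power_spectrum):
        freq = i * bin_width_hz
        if band[0] <= freq <= band[1]:
            in_band += power
        else:
            out_band += power
    return in_band > out_band


if __name__ == "__main__":
    # 11 bins of 1 kHz each, covering 0 to 10 kHz as in the example range.
    speech_like = [1.0, 5.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    flat_noise = [1.0] * 11
    print(contains_speech(speech_like, 1000.0), contains_speech(flat_noise, 1000.0))
```

A practical detector would likely combine several features and some smoothing over time; this sketch only mirrors the single feature mentioned in the text.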

[0018] The calculation unit 10 calculates the subtraction parameter α(ω) and the flooring coefficient β(ω) from the noise frequency spectrum N(ω) as follows. First, a logarithmic calculator 11 obtains the logarithm log N(ω) of the noise frequency spectrum N(ω). Next, a fast Fourier transformer (FFT) 12 obtains the cepstrum C. A low-quefrency window unit 13 then extracts the low-quefrency part C' of the cepstrum C. Next, an inverse fast Fourier transformer (IFFT) 14 obtains the envelope l(ω) of the logarithm log N(ω) of the noise frequency spectrum N(ω). Finally, a calculator 15 obtains the subtraction parameter α(ω) and the flooring coefficient β(ω) from the value of the envelope l(ω).

[0019] This computation continues until a signal section containing speech is input, and the noise frequency spectrum N(ω), the subtraction parameter α(ω), and the flooring coefficient β(ω) are updated as it proceeds. When a signal section containing speech is input, the output of the fast Fourier transformer (frequency analysis means) 1, that is, the spectrum S(ω) judged by the speech presence/absence determiner 2 to contain speech, is sent to a noise suppression processor (subtraction means, multiplication means, and comparison means) 4. Using the noise frequency spectrum N(ω) stored in the noise frequency spectrum storage unit (memory) 3 and the subtraction parameter α(ω) and flooring coefficient β(ω) output by the calculator 15, the processor calculates the output P(ω) by the following processing and comparison and outputs it to a speech recognition processing unit 5. Here Max{A, B} denotes whichever of A and B is not smaller.

P(ω) = Max{S(ω) − α(ω)N(ω), β(ω)S(ω)}
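The rule P(ω) = Max{S(ω) − α(ω)N(ω), β(ω)S(ω)} applied by the noise suppression processor can be sketched per frequency bin as follows. The function name `suppress_noise` and the list-based representation of S, N, α, and β are assumptions made for this illustration.

```python
def suppress_noise(speech_power, noise_power, alpha, beta):
    """Per-bin spectral subtraction with a frequency-dependent floor.

    For each bin: output = max(S - alpha * N, beta * S), so the floor beta * S
    takes over whenever the subtraction result would fall below it.
    """
    if not (len(speech_power) == len(noise_power) == len(alpha) == len(beta)):
        raise ValueError("all inputs must have one value per frequency bin")
    return [max(s - a * n, b * s)
            for s, n, a, b in zip(speech_power, noise_power, alpha, beta)]


if __name__ == "__main__":
    S = [10.0, 2.0, 1.0]      # spectrum of a section containing speech
    N = [2.0, 1.0, 1.0]       # stored noise spectrum
    alpha = [2.6, 0.8, 2.0]   # per-frequency subtraction parameters
    beta = [0.005, 0.11, 0.1] # per-frequency flooring coefficients
    print(suppress_noise(S, N, alpha, beta))
```

In the third bin of the usage example the subtraction result is negative, so the floor β(ω)S(ω) is output instead.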

[0020] In the present application, the frequency spectrum means the power at each frequency. When obtaining the cepstrum, the cepstrum c_n may also be obtained from the spectrum a_n as follows, where the sum over k runs from k = 1 to k = n − 1.

$$c_n = a_n - \frac{1}{n}\sum_{k=1}^{n-1} k\,c_k\,a_{n-k}$$
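The recursion for c_n, in which each new term subtracts the sum of k·c_k·a_(n−k) for k = 1 to n − 1 divided by n, can be sketched as follows. This is a direct transcription of the formula as written; the function name and the convention that the input list holds a_1, a_2, ... in order are assumptions for this example.

```python
def cepstrum_from_coefficients(a):
    """Compute c_1..c_N from a_1..a_N using c_n = a_n - (1/n) * sum(k * c_k * a_(n-k)).

    a[0] holds a_1, a[1] holds a_2, and so on; the result uses the same layout.
    """
    c = []
    for n in range(1, len(a) + 1):
        # Sum over k = 1 .. n-1; empty (zero) when n == 1.
        acc = sum(k * c[k - 1] * a[n - k - 1] for k in range(1, n))
        c.append(a[n - 1] - acc / n)
    return c


if __name__ == "__main__":
    print(cepstrum_from_coefficients([0.5, 0.0, 0.0]))
```

With a_1 = 0.5 and all later terms zero, the results follow the series x − x²/2 + x³/3 for x = 0.5, which is a convenient sanity check on the recursion.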

[Brief description of drawings]

FIG. 1 is a graph showing the relationship between the noise frequency spectrum and the noise frequency spectrum envelope that determines the subtraction parameter α and the flooring coefficient β in the present invention.

FIG. 2 is a block diagram for obtaining the noise frequency spectrum envelope.

FIG. 3 is a block diagram showing the configuration of a speech recognition device according to a specific embodiment of the present invention.

[Explanation of symbols]

100: speech recognition device
10: calculation unit
1, 12: fast Fourier transformer
2: speech presence/absence determiner
3: noise frequency spectrum storage unit
4: noise suppression processor
11: logarithmic calculator
13: low-quefrency window unit
14: inverse fast Fourier transformer
15: calculator

Claims

1. A speech recognition method that recognizes speech after reducing noise by a spectral subtraction method, wherein, when noise is eliminated for each frequency from the frequency spectrum of a time section containing speech on the basis of the noise frequency spectrum of a time section not containing speech, the product of the frequency spectrum of the time section containing speech and a flooring coefficient that is smaller than 1 and is a function of frequency is used as a lower limit.

2. The speech recognition method according to claim 1, wherein, when noise is eliminated in the spectral subtraction method, the product of the noise frequency spectrum of the time section not containing speech and a subtraction parameter that is a function of frequency is subtracted, for each frequency, from the frequency spectrum of the time section containing speech.

3. A speech recognition device for use under noise, comprising: frequency analysis means for obtaining a frequency spectrum for an arbitrary section; flooring coefficient calculation means for setting a flooring coefficient from the noise frequency spectrum obtained by the frequency analysis means for a time section not containing speech; multiplication means for calculating, for each frequency, a value obtained by multiplying the frequency spectrum of a time section containing speech by the flooring coefficient at that frequency determined by the flooring coefficient calculation means; subtraction means for subtracting, from the frequency spectrum obtained by the frequency analysis means for the time section containing speech, a value obtained by multiplying the noise frequency spectrum by a subtraction parameter for each frequency; and comparison means for comparing the output of the multiplication means with the output of the subtraction means and outputting the larger one.

4. A speech recognition device for use under noise, comprising: frequency analysis means for obtaining a frequency spectrum for an arbitrary section; subtraction parameter calculation means for setting a subtraction parameter from the noise frequency spectrum obtained by the frequency analysis means for a time section not containing speech; flooring coefficient calculation means for setting a flooring coefficient from the noise frequency spectrum obtained by the frequency analysis means for the time section not containing speech; multiplication means for calculating, for each frequency, a value obtained by multiplying the frequency spectrum of a time section containing speech by the flooring coefficient at that frequency determined by the flooring coefficient calculation means; subtraction means for subtracting, from the frequency spectrum obtained by the frequency analysis means for the time section containing speech, a value obtained by multiplying the noise frequency spectrum, for each frequency, by the subtraction parameter at that frequency determined by the subtraction parameter calculation means; and comparison means for comparing the output of the multiplication means with the output of the subtraction means and outputting the larger one.

5. The speech recognition device according to claim 4, wherein the subtraction parameter calculation means obtains a spectral envelope from the noise frequency spectrum obtained by the frequency analysis means and then sets the subtraction parameter according to the spectral envelope at each frequency.

6. The speech recognition device according to any one of claims 3 to 5, wherein the flooring coefficient calculation means obtains a spectral envelope from the noise frequency spectrum obtained by the frequency analysis means and then sets the flooring coefficient according to the spectral envelope at each frequency.