JP2003316381A

JP2003316381A - Method and program for restricting noise

Info

Publication number: JP2003316381A
Application number: JP2002121072A
Authority: JP
Inventors: Mitsuyoshi Tatemori; 三慶舘森
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-04-23
Filing date: 2002-04-23
Publication date: 2003-11-07

Abstract

PROBLEM TO BE SOLVED: To obtain a highly accurate noise suppression effect by reducing the spectral residual.
SOLUTION: In step S1, the noise spectra of the frames from the first frame to the T-th frame are obtained. In step S3, the average spectrum N and the standard deviation V of the noise are obtained. In step S4, with the current frame taken as the t-th frame (t > T), a noise suppression amount D(ω, t) is calculated for each frequency ω from the current input spectrum X(t) and from the average N and standard deviation V of the noise. In step S5, the speech spectrum is estimated as S(t) = X(t) − D(t). Because D(ω, t) is calculated not only from X(t) and the noise average N but also from the noise standard deviation V (the square root of the variance), the suppression amount matches the actual environmental noise more closely, and the spectral residual is sufficiently reduced.
COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a noise suppression method and a noise suppression program for suppressing, with high accuracy, the noise in speech uttered in a noisy environment.

[0002]

2. Description of the Related Art
In recent years, as the performance of speech recognition technology has improved, speech recognition engines have increasingly been put to practical use in real environments. Expectations for speech recognition are especially high where input devices are limited, as in car navigation systems and mobile devices.

[0003] In speech recognition processing, a recognition result is obtained by comparing the input speech captured by a microphone with the recognition vocabulary. A real environment contains various noise sources, so environmental noise is mixed into the speech signal captured by the microphone. As a result, robustness to noise has a large effect on recognition accuracy.

[0004] Spectral subtraction (hereinafter also referred to as SS) is widely used to suppress noise in a speech signal (spectrum) uttered in such a noisy environment.

[0005] The most basic SS noise suppression algorithm is described below. In essence, SS estimates the noise level in advance from the average level of the observed noise. It then suppresses the noise by subtracting this estimated level from the input signal.

[0006] First, before noise suppression, the average noise vector N_ave is calculated by equation (1). The input is a time series {N(t)} of noise spectra that contain no speech at all, where N(t) = (N(ω_1, t), N(ω_2, t), …, N(ω_d, t)). Here ω denotes frequency and t denotes time, i.e., the frame number.

[0007]
$$N_{\mathrm{ave}} = \frac{1}{T}\sum_{t} N(t) \tag{1}$$
where Σ_t denotes the sum over a suitable time interval and T is the length of that interval (the number of frames).

[0008] Next, let {X(t)} be the time series of the noise-contaminated speech spectrum, with X(t) = (X(ω_1, t), X(ω_2, t), …, X(ω_d, t)). Let S(t) = (S(ω_1, t), S(ω_2, t), …, S(ω_d, t)) be the estimate of the noise-free speech spectrum. Using a suppression coefficient α, equation (2) obtains S(ω, t) from the noisy input signal X(ω, t). S(ω, t) is the estimate of the speech power spectrum with the noise component suppressed.

[0009]
$$S(\omega,t)=\begin{cases}X(\omega,t)-\alpha N_{\mathrm{ave}}(\omega) & \text{if } X(\omega,t)-\alpha N_{\mathrm{ave}}(\omega)>0\\ 0 & \text{if } X(\omega,t)-\alpha N_{\mathrm{ave}}(\omega)\le 0\end{cases}\tag{2}$$
SS is described in detail in Document 1 (Jean-Claude Junqua and Jean-Paul Haton, "Robustness in Automatic Speech Recognition," Kluwer Academic Publishers).
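The following Python sketch roughly illustrates equations (1) and (2). It computes N_ave from speech-free frames and then applies basic SS to one frame. It assumes each frame is a list of d power-spectrum values. The function names are hypothetical and do not come from the source.

```python
from typing import List, Sequence


def average_noise(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Equation (1): per-bin mean N_ave(w) over T speech-free frames."""
    T = len(noise_frames)
    d = len(noise_frames[0])
    return [sum(frame[i] for frame in noise_frames) / T for i in range(d)]


def spectral_subtraction(x: Sequence[float], n_ave: Sequence[float],
                         alpha: float = 1.0) -> List[float]:
    """Equation (2) for one frame: subtract alpha * N_ave(w), floor at 0.

    x     : noisy power spectrum X(w, t) of the current frame
    n_ave : average noise power spectrum N_ave(w)
    alpha : suppression coefficient (a fixed constant in basic SS)
    """
    if len(x) != len(n_ave):
        raise ValueError("x and n_ave must have the same number of bins")
    return [max(xi - alpha * ni, 0.0) for xi, ni in zip(x, n_ave)]


print(spectral_subtraction([5.0, 1.0, 3.0], [1.0, 2.0, 1.0], alpha=1.5))  # [3.5, 0.0, 1.5]
```

Here α is a single constant. The sketch therefore removes the same amount αN_ave(ω) from every frame, however much the noise actually fluctuates.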

[0010]

PROBLEMS TO BE SOLVED BY THE INVENTION
The SS algorithm described above assumes that the environmental noise is constant. That is, it treats the suppression coefficient α in equation (2) as a fixed value. In general, however, the noise spectrum is not constant, so making α variable according to the environment gives a better noise suppression effect. Equation (3) shows why.

[0011] Written out in more detail, the SS estimate S(ω, t) of the speech spectrum becomes equation (3).

[0012]
$$\begin{aligned}S(\omega,t) &= X(\omega,t)-\alpha N_{\mathrm{ave}}(\omega)\\ &= S_0(\omega,t)+N_R(\omega,t)-\alpha N_{\mathrm{ave}}(\omega)\\ &= S_0(\omega,t)+\bigl(R(\omega,t)-aN_{\mathrm{ave}}(\omega)\bigr)+\bigl(N(\omega,t)-(1+b)N_{\mathrm{ave}}(\omega)\bigr)\end{aligned}\tag{3}$$
where
$$X(\omega,t)=S_0(\omega,t)+N_R(\omega,t)=S_0(\omega,t)+N(\omega,t)+R(\omega,t)$$
$$N_R(\omega,t)=X(\omega,t)-S_0(\omega,t)=R(\omega,t)+N(\omega,t)$$
$$R(\omega,t)=2\sqrt{S_0(\omega,t)\,N(\omega,t)}\,\cos\theta(\omega,t)$$
$$\alpha=1+a+b$$
and
X(ω, t): spectrum of the input signal
S_0(ω, t): true speech spectrum
N_R(ω, t): non-speech component of the input signal (the noise plus the speech-noise correlation)
N(ω, t): noise spectrum
R(ω, t): correlation between the speech signal and the noise signal
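The cross term R(ω, t) can be checked numerically. The sketch below assumes that speech and noise add as complex short-time spectra, so that θ(ω, t) is the phase difference between them. The source does not state this; it is an interpretation. Under that assumption, the observed power equals S_0 + N + R exactly, which is the first identity above. The name `power_decomposition` is hypothetical.

```python
import cmath
import math


def power_decomposition(s0: float, n: float, theta: float):
    """Return (X, S0 + N + R) for one time-frequency bin.

    s0, n : speech and noise power in the bin (S_0 and N in eq. (3))
    theta : phase difference between the speech and noise components
    """
    speech = math.sqrt(s0) * cmath.exp(1j * theta)  # speech amplitude with phase theta
    noise = complex(math.sqrt(n), 0.0)              # noise amplitude, phase 0
    x = abs(speech + noise) ** 2                    # observed power X(w, t)
    r = 2.0 * math.sqrt(s0 * n) * math.cos(theta)   # cross term R(w, t)
    return x, s0 + n + r


for theta in (0.0, 0.7, math.pi):
    x, rhs = power_decomposition(4.0, 1.0, theta)
    print(f"theta={theta:.2f}  X={x:.6f}  S0+N+R={rhs:.6f}")
```

Because cos θ takes both signs, R(ω, t) is sometimes positive and sometimes negative. A fixed α therefore cannot cancel it in every frame.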

[0013] When a constant is used as the suppression coefficient α, α is generally known to work best when set larger than 1 by (a + b), as equation (3) shows. This is described, for example, in Japanese Patent Laid-Open No. 2001-228892 (hereinafter Document 2).

[0014] The term N_R(ω, t) − αN_ave(ω) in equation (3) is the spectral residual, i.e., the part of the noise component left after subtraction. Spectral subtraction assumes that the environmental noise is constant, which allows fast processing with a simple configuration. In reality, however, environmental noise fluctuates. If α is a constant, the spectral residual N_R(ω, t) − αN_ave(ω) therefore varies with time, and sufficient noise suppression accuracy cannot be obtained.

[0015] To bring the spectral residual term closer to 0, Document 1 considers making the suppression coefficient α variable. Specifically, it determines α according to the signal-to-noise ratio (S/N) of the speech to the noise. This method, however, still does not provide sufficient noise suppression accuracy.

[0016] Document 2 also discloses a method of making α variable, but that method does not achieve sufficient noise suppression accuracy either.

[0017] An object of the present invention is to provide a noise suppression method and a noise suppression program that improve noise suppression accuracy. They do so by making the noise suppression coefficient variable according to the variance of the noise.

[0018]

MEANS FOR SOLVING THE PROBLEMS
A noise suppression method according to claim 1 of the present invention comprises three steps. First, an average noise vector and a noise standard deviation are obtained from the spectral time series of a noise-only input signal. Second, in a noise suppression amount determination step, a noise suppression amount is determined from a noise-contaminated input signal and from the average vector and standard deviation of the noise, so as to reduce the spectral residual in spectral subtraction. Third, the noise in the input signal is suppressed by subtracting the noise suppression amount from the noise-contaminated input signal.
A noise suppression method according to claim 2 of the present invention comprises five steps. First, the spectral time series of a noise-only input signal is divided into a predetermined number of clusters. Second, an average noise vector and a standard deviation are obtained for each cluster from the noise-only input signal. Third, a predetermined distance measure is used to select the cluster closest to the input spectrum of a noise-contaminated input signal, and the average vector and standard deviation of the selected cluster are obtained. Fourth, in a noise suppression amount determination step, a noise suppression amount is determined from the noise-contaminated input signal and from the average vector and standard deviation of the selected cluster, so as to reduce the spectral residual in spectral subtraction. Fifth, the noise in the input signal is suppressed by subtracting the noise suppression amount from the noise-contaminated input signal.

[0019] In claim 1 of the present invention, the average vector and standard deviation of the noise are first obtained from the spectral time series of the noise-only input signal, before any noise in the input signal is suppressed. The noise suppression amount in spectral subtraction is determined from the noise-contaminated input signal and from the average vector and standard deviation of the noise. Subtracting this amount from the noise-contaminated input signal suppresses the noise in the input signal. Because the noise suppression amount is determined using the standard deviation of the noise, the spectral residual in spectral subtraction is reduced.

[0020] In claim 2 of the present invention, the spectral time series of the noise-only input signal is divided into a predetermined number of clusters, and the average noise vector and standard deviation are obtained for each cluster. A predetermined distance measure is used to select the cluster closest to the input spectrum of the noise-contaminated input signal, and the average vector and standard deviation of the selected cluster are obtained. The noise suppression amount in spectral subtraction is determined from the noise-contaminated input signal and from the average vector and standard deviation of the selected cluster. Subtracting this amount from the input signal suppresses its noise. The noise is divided into clusters, and the average vector and standard deviation are taken from the cluster closest to the input spectrum. As a result, the noise suppression amount is better matched to the actual noise environment, and the spectral residual is reduced further.

[0021] The method of the present invention can also be implemented as a program that causes a computer to execute the corresponding processing.

[0022]

BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a flowchart showing a noise suppression method according to an embodiment of the present invention.

[0023] This embodiment applies to noise suppression by spectral subtraction, as used in speech recognition and similar applications. Equation (3) gives the spectral subtraction estimate S(ω, t) of the speech spectrum. It indicates that setting the suppression coefficient α appropriately may reduce the spectral residual.

[0024] To make the argument clear, consider the case where the correlation term R(ω, t) − aN_ave(ω) is negligible and b is 0. The estimate S(ω, t) of the speech spectrum is then given by equation (4).

[0025]
$$S(\omega,t)=S_0(\omega,t)+\bigl(N(\omega,t)-N_{\mathrm{ave}}(\omega)\bigr)\tag{4}$$
The fluctuation term N(ω, t) − N_ave(ω) in equation (4) is the deviation of the noise component in the current frame t from its average. It is therefore strongly correlated with the variance, one of the statistical properties of the noise. For noise with small variance, the absolute value of the fluctuation term is small on average. Conversely, for noise with large variance, it is large on average.

[0026] Applying this to equation (3) suggests the following choice of α. For noise with small variance, α should be close to 1. Conversely, when the variance is large, α should be increased, which increases the suppression amount. Increasing α for noise with large variance has two effects. When N(ω, t) is larger than N_ave(ω), it makes the fluctuation term smaller. In the opposite case, it makes the fluctuation term larger in the negative direction, which may do harm instead. However, as stated above, for SS with a constant α it is known to be better to set α larger than 1 and so increase the noise suppression amount. Increasing α can therefore still be expected to help, despite the drawback when the fluctuation term is negative.

[0027] For these reasons, this embodiment makes α variable according to the variance of the noise, so as to reduce the residual (fluctuation term). Specifically, the noise suppression amount is varied according to the standard deviation, which is the square root of the variance.

[0028] FIG. 1 shows the algorithm for the entire noise suppression process.

[0029] Let the time series of input spectra be {X(t)}, where t = 1, 2, 3, … is the frame number, and let ω denote frequency. It is assumed to be guaranteed that no speech is present from the first frame of the input signal through at least the T-th frame (T being a predetermined constant), so only noise is input. FIG. 2 is a spectrogram of an input spectral time series that satisfies this condition. The horizontal axis is time in frames and the vertical axis is frequency.

[0030] In FIG. 2, shading shows the signal level in each frequency band for each frame. Darker areas indicate higher signal levels and lighter areas indicate lower levels.

[0031] In FIG. 2, frames 0 to T are those in which the input signal is guaranteed to be noise only. The frames after frame T are the targets of noise suppression. The interval with relatively dark regions in the center of FIG. 2 shows the spectrum of the input speech when the word "Hachinohe" was actually uttered.

[0032] First, the frame number t is initialized to 1, and the noise spectrum of the first frame is acquired in step S1 of FIG. 1. Step S212 increments t, and step S2 determines whether t has reached T. In this way, the noise spectrum of each frame from the first to the T-th frame is acquired.

[0033] Next, in step S3, equations (5) and (6) give the average noise spectrum N = (N(ω_1), N(ω_2), …, N(ω_d)) and the standard deviation V = (V(ω_1), V(ω_2), …, V(ω_d)).

[0034]
$$N(\omega)=\frac{1}{T}\sum_{t=1}^{T}X(\omega,t)\tag{5}$$
$$V(\omega)=\left\{\frac{1}{T}\sum_{t=1}^{T}X(\omega,t)^2-N(\omega)^2\right\}^{1/2}\tag{6}$$
The first to T-th frames are used to compute the noise statistics and are not themselves subject to noise suppression. For the (T+1)-th and subsequent frames, the noise suppression processing outputs the estimated speech spectra {S(t)}, t = T+1, T+2, ….
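Below is a minimal sketch of step S3. It computes equations (5) and (6) literally, in population form (dividing by T), for frames stored as lists of floats. The name `noise_statistics` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def noise_statistics(frames: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Equations (5) and (6): per-bin mean N(w) and standard deviation V(w)
    over the first T (speech-free) frames X(w, 1..T)."""
    T = len(frames)
    d = len(frames[0])
    mean = [sum(f[i] for f in frames) / T for i in range(d)]
    std = []
    for i in range(d):
        mean_sq = sum(f[i] ** 2 for f in frames) / T
        # max(..., 0.0) guards against a tiny negative value caused by round-off
        std.append(math.sqrt(max(mean_sq - mean[i] ** 2, 0.0)))
    return mean, std


n, v = noise_statistics([[1.0, 4.0], [3.0, 4.0]])
print(n, v)  # [2.0, 4.0] [1.0, 0.0]
```

Equation (6) uses the one-pass "mean of squares minus square of the mean" form. It is compact, but it can lose precision when V(ω) is much smaller than N(ω). A two-pass computation such as Python's `statistics.pstdev` gives the same quantity with better numerical behaviour.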

[0035] Let the current frame be the t-th frame (t > T). The noise suppression amount D(ω, t) is calculated for each frequency ω (step S4). In this embodiment, D(ω, t) is calculated from the current input spectrum X(t) and from the average N and standard deviation V of the noise.

[0036] Next, in step S5, equation (7) gives the estimate of the speech spectrum S(t).

[0037]
$$S(t)=X(t)-D(t)\tag{7}$$
Next, t is incremented in step S6. The noise suppression processing of steps S3 to S5 is repeated on the input speech until step S7 confirms the end of the input.
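The flow of FIG. 1 can be sketched as a Python generator. The source leaves the concrete rule for D(ω, t) open at this point, so the sketch takes it as a hypothetical callback `amount(x, n, v)`. It also computes the noise statistics only once, after frame T, which simplifies the flowchart. `suppress_stream` and `basic_ss` are illustrative names.

```python
import math
from typing import Callable, Iterable, Iterator, List, Sequence

# Hypothetical hook: D(w, t) = amount(X(w, t), N(w), V(w)).
Amount = Callable[[float, float, float], float]


def suppress_stream(frames: Iterable[Sequence[float]], T: int,
                    amount: Amount) -> Iterator[List[float]]:
    """Sketch of FIG. 1. Frames 1..T (assumed speech-free, T >= 1) only feed
    the noise statistics; each later frame t > T yields S(t) = X(t) - D(t)."""
    noise: List[Sequence[float]] = []
    mean: List[float] = []
    std: List[float] = []
    for t, x in enumerate(frames, start=1):
        if t <= T:                                   # steps S1, S2, S212
            noise.append(x)
            if t == T:                               # step S3: equations (5), (6)
                d = len(x)
                mean = [sum(f[i] for f in noise) / T for i in range(d)]
                std = [math.sqrt(max(sum(f[i] ** 2 for f in noise) / T - mean[i] ** 2, 0.0))
                       for i in range(d)]
            continue
        d_t = [amount(xi, ni, vi) for xi, ni, vi in zip(x, mean, std)]  # step S4
        yield [xi - di for xi, di in zip(x, d_t)]                       # step S5, eq. (7)


def basic_ss(x: float, n: float, v: float) -> float:
    """Example rule: basic SS with alpha = 1 (eq. (2)); ignores V entirely."""
    return min(n, x)


frames = [[1.0, 4.0], [3.0, 4.0], [5.0, 10.0], [1.0, 2.0]]
print(list(suppress_stream(frames, T=2, amount=basic_ss)))  # [[3.0, 6.0], [0.0, 0.0]]
```

Capping D at X inside the callback keeps S(ω, t) non-negative, so the same loop can host equation (2) or a variance-aware rule.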

[0038] In the embodiment configured as above, the noise suppression amount D(ω, t) is calculated from three quantities. These are the current input spectrum X(t), the noise average N, and the noise standard deviation V (the square root of the variance). As a result, D(ω, t) corresponds more closely to the actual environmental noise, and the spectral residual can be sufficiently reduced.

[0039] Thus, this embodiment sets the noise suppression amount according to the variance of the noise, and as a result it can sufficiently improve the noise suppression effect.

[0040] FIG. 3 is a flowchart showing another embodiment of the present invention. In FIG. 3, steps identical to those in FIG. 1 are given the same reference symbols, and their description is omitted.

[0041] This embodiment improves the accuracy of noise suppression by clustering the noise spectra.

[0042] In this embodiment as well, the time series of input spectra is {X(t)}. It is assumed to be guaranteed that no speech is present from the first frame of the input signal through at least the T-th frame, so only noise is input.

[0043] In steps S1, S2, and S212 of FIG. 3, the noise spectrum of each frame from the first to the T-th frame is acquired. Next, in steps S11 and S12 of FIG. 3, the noise spectra from the first to the T-th frame are clustered. The number of noise clusters to be estimated is M_e, and a standard deviation is obtained for each cluster. The noise average N(ω) and standard deviation V(ω) of each cluster are calculated in the same way as in equations (5) and (6).

[0044] FIG. 4 is a flowchart explaining the specific clustering processing in steps S11 to S13 of FIG. 3.

[0045] It shows an example of a cluster calculation method for the case where up to M (≥ 2) noise clusters are allowed.

[0046] First, the input (noise) spectra of the first to T-th frames are stored in a suitable buffer (steps S21, S211, S212). Once the spectrum of the T-th frame has been stored, processing moves from step S21 to step S22, and the T noise spectra are clustered.

[0047] The clustering may use, for example, the k-means method described in detail in "Ninshiki Kogaku" (Recognition Engineering), edited by the Institute of Television Engineers of Japan and published by Corona Publishing. In this method, a predetermined number M of clusters C = {N_1, N_2, …, N_M} is created first, where N_m = (N_m(ω_1), N_m(ω_2), …, N_m(ω_d)) and m = 1, 2, …, M.
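For step S22, a plain Lloyd-style k-means over the T noise spectra might look like the following. The source only names the k-means method. The initialisation (M evenly spaced frames), the fixed iteration count, and the name `kmeans` are therefore assumptions. The distance is Euclidean, matching the example measure given for L in the next paragraph.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(frames: Sequence[Sequence[float]], M: int,
           iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means on T noise spectra (sketch, requires M <= T).

    Returns the centroids N_1..N_M and, for every frame, the index of the
    cluster it is assigned to."""
    T = len(frames)
    centroids = [list(frames[(m * T) // M]) for m in range(M)]  # assumed init
    labels = [0] * T
    for _ in range(iters):
        # Assignment step: nearest centroid under Euclidean distance.
        labels = [min(range(M), key=lambda m: math.dist(f, centroids[m]))
                  for f in frames]
        # Update step: each centroid becomes the mean of its members.
        for m in range(M):
            members = [f for f, lab in zip(frames, labels) if lab == m]
            if members:  # an empty cluster keeps its previous centroid
                centroids[m] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


cents, labels = kmeans([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], M=2)
print(cents, labels)  # [[0.0, 0.5], [10.0, 10.5]] [0, 0, 1, 1]
```

The number of frames in each cluster, `labels.count(m)`, provides the counts n_m used in the merging step described below.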

[0048] Next, a predetermined distance measure L (for example, the Euclidean distance) is used to check whether the provisionally chosen number of clusters is appropriate. If the distance between any two clusters is smaller than a predetermined threshold, those two clusters are judged to belong to a single cluster.

[0049] That is, with L as the predetermined distance measure (for example, the Euclidean distance), the pair with the smallest distance L is found among all combinations of two different centroids (average noise vectors) in cluster set C (step S24). Let this closest pair be N_k and N_l.

[0050] If the distance L(N_k, N_l) is less than or equal to a predetermined value, the centroid of the two is calculated in step S26 as
$$N_h=\frac{n_k N_k+n_l N_l}{n_k+n_l}$$
where n_k and n_l are the numbers of noise spectra belonging to centroids N_k and N_l, respectively. The two centroids are merged (step S26): N_k and N_l are deleted from C, and N_h is added (step S27).

[0051] At this point, the number of centroids (elements) of C decreases to M − 1. The vectors that belonged to N_k and N_l now belong to N_h, and their number is n_h = n_k + n_l.

[0052] The above processing is then repeated in the same way for the new set of clusters, with the checks in steps S23 and S25. It continues until only one centroid remains or the distance between the closest centroids exceeds the predetermined value.

[0053] Conversely, if the distance L(N_k, N_l) is larger than the predetermined value, C is the desired set of average noise vectors. In this case, processing moves from step S25 to step S28, where the standard deviation V_m of each cluster is calculated.
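Steps S23 to S27 can be sketched as a loop that repeatedly merges the closest pair of centroids. The sketch represents each cluster by its centroid and the list of frame indices assigned to it, so n_m is the length of that list. The threshold value and the names are hypothetical, and Euclidean distance stands in for L.

```python
import math
from typing import List, Sequence, Tuple


def merge_close_clusters(centroids: Sequence[Sequence[float]],
                         members: Sequence[Sequence[int]],
                         threshold: float) -> Tuple[List[List[float]], List[List[int]]]:
    """Merge the closest centroid pair while its distance is <= threshold."""
    cents = [list(c) for c in centroids]
    mems = [list(m) for m in members]
    while len(cents) > 1:                            # step S23: more than one centroid
        # Step S24: closest pair over all combinations k < l.
        dist, k, l = min((math.dist(cents[i], cents[j]), i, j)
                         for i in range(len(cents))
                         for j in range(i + 1, len(cents)))
        if dist > threshold:                         # step S25: clusters are final
            break
        n_k, n_l = len(mems[k]), len(mems[l])
        # Step S26: count-weighted centroid N_h = (n_k N_k + n_l N_l) / (n_k + n_l)
        n_h = [(n_k * a + n_l * b) / (n_k + n_l) for a, b in zip(cents[k], cents[l])]
        merged = mems[k] + mems[l]                   # n_h = n_k + n_l members
        for idx in (l, k):                           # delete N_l first, since l > k
            del cents[idx]
            del mems[idx]
        cents.append(n_h)                            # step S27: add N_h to C
        mems.append(merged)
    return cents, mems


cents, mems = merge_close_clusters([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]],
                                   [[0, 1], [2], [3, 4]], threshold=2.0)
print(cents, mems)
```

If every centroid started as the mean of its members, the weighted formula keeps N_h equal to the mean of the merged members.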

[0054] That is, for each cluster obtained, the standard deviation V_m = (V_m(ω_1), V_m(ω_2), …, V_m(ω_d)) is obtained from the noise vectors belonging to that cluster (step S28):
$$V_m(\omega)=\left\{\frac{1}{n_m}\sum_{\tau}\bigl(N(\omega,\tau)-N_m(\omega)\bigr)^2\right\}^{1/2}$$
where τ denotes the frame numbers of the noise spectra belonging to cluster m.
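The per-cluster standard deviation of step S28 is a direct transcription of the formula above. In this sketch, `frames` holds all T noise spectra and `member_frames` lists the frame indices τ of cluster m. Both names are hypothetical.

```python
import math
from typing import List, Sequence


def cluster_std(frames: Sequence[Sequence[float]], centroid: Sequence[float],
                member_frames: Sequence[int]) -> List[float]:
    """Step S28: V_m(w) = sqrt( sum_tau (N(w, tau) - N_m(w))^2 / n_m )."""
    n_m = len(member_frames)
    return [math.sqrt(sum((frames[tau][i] - centroid[i]) ** 2 for tau in member_frames) / n_m)
            for i in range(len(centroid))]


frames = [[1.0, 4.0], [3.0, 4.0], [100.0, 100.0]]
print(cluster_std(frames, [2.0, 4.0], [0, 1]))  # [1.0, 0.0]
```

Frames outside the cluster (frame 2 in the example) do not affect V_m. This is what allows each cluster to carry its own noise spread.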

[0055] The noise suppression amount for the input vector X(t) requires an average vector N and a standard deviation V. To choose them, the cluster
$$e=\operatorname*{arg\,min}_{m} L\bigl(X(t),N_m\bigr)$$
is selected from the clusters obtained above. Then N = N_e and V = V_e are set and used to calculate the suppression amount D(t).
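The cluster selection of paragraph [0055] can be sketched with Euclidean distance as the measure L. The source gives Euclidean distance only as an example, and `select_cluster` is a hypothetical name. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple


def select_cluster(x: Sequence[float], centroids: Sequence[List[float]],
                   stds: Sequence[List[float]]) -> Tuple[int, List[float], List[float]]:
    """e = argmin_m L(X(t), N_m); returns e, N = N_e and V = V_e."""
    e = min(range(len(centroids)), key=lambda m: math.dist(x, centroids[m]))
    return e, centroids[e], stds[e]


e, n, v = select_cluster([9.0, 8.0], [[0.0, 0.0], [10.0, 10.0]], [[1.0, 1.0], [3.0, 3.0]])
print(e, n, v)  # 1 [10.0, 10.0] [3.0, 3.0]
```

The selection compares the whole input spectrum X(t), including any speech it contains, with the noise centroids. It is a nearest-neighbour choice rather than a noise-only estimate.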

[0056] Thus, this embodiment clusters the noise spectra. It then chooses the cluster used for the standard deviation calculation according to which cluster the spectral pattern of the input signal most closely resembles. This improves the accuracy of the noise suppression amount and further improves the noise suppression effect.

[0057] Next, a third embodiment of the present invention will be described.

[0058] In the first and second embodiments, the noise suppression amount D(ω, t) is calculated from the noise standard deviation V (the square root of the variance), in addition to the current input spectrum X(t) and the noise average N. In those embodiments, any method of calculating the noise suppression amount may be used, provided that it can reduce the spectral residual.

[0059] This embodiment uses equation (8) below as a concrete formula for the noise suppression amount D(ω, t).

[0060]
$$D(\omega,t)=\begin{cases}\alpha(\omega,t)V(\omega)+\beta N(\omega) & \text{if } X(\omega,t)>\alpha(\omega,t)V(\omega)+\beta N(\omega)\\ X(\omega,t) & \text{if } X(\omega,t)\le\alpha(\omega,t)V(\omega)+\beta N(\omega)\end{cases}\tag{8}$$
In equation (8), D(ω, t) = X(ω, t) when the difference between the input signal X(ω, t) and βN(ω) (β times the noise average) is at most α(ω, t) times the standard deviation. In that case S(ω, t) = X(ω, t) − D(ω, t) = X(ω, t) − X(ω, t) = 0, which means that the speech component of that input is estimated to be 0.
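Equations (7) to (9) for one frame can be sketched as follows. For simplicity, α is a single constant here rather than α(ω, t), and β is a constant as in the source. Both values in the example are arbitrary, and the function names are hypothetical.

```python
from typing import List, Sequence


def suppression_amount(x: float, n: float, v: float, alpha: float, beta: float) -> float:
    """Equation (8) for one bin: D = alpha*V + beta*N, capped at X."""
    threshold = alpha * v + beta * n
    return threshold if x > threshold else x


def estimate_speech(x_frame: Sequence[float], n: Sequence[float], v: Sequence[float],
                    alpha: float, beta: float) -> List[float]:
    """S(w,t) = X(w,t) - D(w,t), eq. (7): eq. (9) above the threshold, 0 below it."""
    return [xi - suppression_amount(xi, ni, vi, alpha, beta)
            for xi, ni, vi in zip(x_frame, n, v)]


# Same noise mean in both bins, but the second bin has a larger spread V(w).
print(estimate_speech([10.0, 5.0], [2.0, 2.0], [1.0, 3.0], alpha=2.0, beta=1.0))  # [6.0, 0.0]
```

In the example, both bins have the same N(ω). The second bin's larger V(ω), however, raises its threshold αV + βN from 4 to 8, so its whole input is suppressed. This illustrates a suppression amount that grows with the noise spread.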

[0061] On the other hand, suppose the input signal is large enough that its difference from βN(ω) (β times the noise average) exceeds α(ω, t) times the standard deviation. In that case, equation (9) gives the estimated speech spectrum S(ω, t).

[0062]
$$S(\omega,t)=X(\omega,t)-\beta N(\omega)-\alpha(\omega,t)V(\omega)\tag{9}$$
That is, in this case the noise suppression amount grows in proportion to the standard deviation V(ω).

[0063] This noise suppression method therefore explicitly achieves the goal of increasing the noise suppression amount when the noise variance (the square of the standard deviation) is large and decreasing it when the variance is small.
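A short worked instance of equation (9) makes the dependence on the noise spread concrete. The numbers are illustrative only and do not come from the patent: X(ω, t) = 10, N(ω) = 4, β = 1 and α(ω, t) = 2, compared for a steady noise bin with V(ω) = 0.5 and a fluctuating one with V(ω) = 2. In both cases X exceeds αV + βN, so equation (9) applies.

$$
\begin{aligned}
V(\omega) = 0.5:&\quad D(\omega,t) = 2 \cdot 0.5 + 1 \cdot 4 = 5, \qquad S(\omega,t) = 10 - 5 = 5\\
V(\omega) = 2:&\quad D(\omega,t) = 2 \cdot 2 + 1 \cdot 4 = 8, \qquad S(\omega,t) = 10 - 8 = 2
\end{aligned}
$$

The same input has more subtracted in the bin whose noise fluctuates more, which is the behaviour described in paragraph [0063].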

[0064] Note that α(ω, t) of the form

$$
\alpha(\omega,t) = \frac{F(\omega,t)}{V(\omega)} \qquad (10)
$$

where F(ω, t) is a function or constant that does not depend on V(ω), is excluded. Substituting this α(ω, t) into equation (8) gives

$$
D(\omega,t) =
\begin{cases}
\beta N(\omega) + F(\omega,t), & X(\omega,t) > \beta N(\omega) + F(\omega,t)\\
X(\omega,t), & X(\omega,t) \le \beta N(\omega) + F(\omega,t)
\end{cases}
\qquad (11)
$$

so the noise suppression amount no longer depends on the standard deviation V(ω).
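The exclusion in paragraph [0064] can be checked numerically. The sketch below, using hypothetical helper names and illustrative values (X = 10, N = 4, β = 1, F = 3), evaluates equation (8) once with a fixed α and once with α = F/V from equation (10); only the fixed-α version changes when V changes.

```python
def suppression_eq8(x: float, n_mean: float, v_std: float,
                    alpha: float, beta: float) -> float:
    """Noise suppression amount D for one frequency bin, equation (8)."""
    threshold = alpha * v_std + beta * n_mean
    return threshold if x > threshold else x


def alpha_eq10(f: float, v_std: float) -> float:
    """The excluded form alpha = F / V of equation (10)."""
    return f / v_std


X, N, BETA, F = 10.0, 4.0, 1.0, 3.0
for v in (0.5, 2.0, 4.0):
    fixed = suppression_eq8(X, N, v, alpha=2.0, beta=BETA)
    excluded = suppression_eq8(X, N, v, alpha=alpha_eq10(F, v), beta=BETA)
    print(v, fixed, excluded)
```

The excluded column stays at βN + F = 7 for every V, matching equation (11), while the fixed-α column grows with V (5, 8) until it is capped at X = 10.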

[0065] Because its noise suppression amount is proportional to the noise standard deviation, the noise suppression method of equation (8), with α(ω, t) not of the form (10), is the simplest way to realize a noise suppression amount that depends on the noise variance (the square of the standard deviation).

[0066] FIGS. 5 and 6 are graphs showing a fourth embodiment of the present invention. FIG. 5 shows the noise suppression amount D(ω, t), with x on the horizontal axis and D(ω, t) on the vertical axis; FIG. 6 shows the speech spectrum S(ω, t), with X(ω, t) on the horizontal axis and S(ω, t) on the vertical axis.

[0067] This embodiment adopts the same noise suppression algorithm as the first embodiment, together with the method of calculating the noise suppression amount D(ω, t) given by equation (8) above and the formula given by equation (12) below.

[0068] When equations (8) and (12) are adopted, the noise suppression amount D(ω, t) and the estimated speech spectrum S(ω, t) can be expressed by equations (13) and (14) below, respectively.

[0069]

$$
D(\omega,t) =
\begin{cases}
\beta N(\omega) + k\,V(\omega), & x \le k\\
\beta N(\omega) + (k+a)\,V(\omega), & x \ge k+b\\
\beta N(\omega) + \left(\dfrac{a(x-k)}{b} + k\right)V(\omega), & \text{otherwise}
\end{cases}
\qquad \text{where } x = \frac{X(\omega,t) - \beta N(\omega)}{V(\omega)} \qquad (13)
$$

$$
S(\omega,t) =
\begin{cases}
0, & X(\omega,t) < \beta N(\omega) + k\,V(\omega)\\
X(\omega,t) - \beta N(\omega) - (k+a)\,V(\omega), & X(\omega,t) > \beta N(\omega) + (k+b)\,V(\omega)\\
X(\omega,t) - \beta N(\omega) - \left(\dfrac{a(x-k)}{b} + k\right)V(\omega), & \text{otherwise}
\end{cases}
\qquad (14)
$$

Here k, a and b are experimentally determined constants, and a and b are non-negative.
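Equation (12) itself is not reproduced in this text. If one assumes that the non-zero branch of (13) has the form D = βN + αV, as in equation (8), then α(ω, t) is a piecewise-linear function of the normalized excess x: equal to k up to x = k, rising linearly to k + a at x = k + b, and constant beyond. The Python sketch below encodes that reading (a hypothetical reconstruction, not the patent's stated formula) and evaluates S(ω, t) for one frequency bin following equation (14); the function names and example constants are illustrative.

```python
def alpha_from_eq13(x_norm: float, k: float, a: float, b: float) -> float:
    """alpha(w,t) implied by equation (13) under the reading D = beta*N + alpha*V.

    Hypothetical reconstruction of equation (12); a and b are non-negative.
    """
    if x_norm <= k:
        return k
    if x_norm >= k + b:  # also covers b == 0, so the ramp below never divides by zero
        return k + a
    return a * (x_norm - k) / b + k  # linear ramp from k up to k + a


def speech_eq14(x: float, n_mean: float, v_std: float, beta: float,
                k: float, a: float, b: float) -> float:
    """Estimated speech S(w,t) for one frequency bin, equation (14); assumes v_std > 0."""
    x_norm = (x - beta * n_mean) / v_std  # the quantity x in equations (13)-(14)
    if x_norm <= k:
        return 0.0  # case 1: the excess is attributed to noise fluctuation
    return x - beta * n_mean - alpha_from_eq13(x_norm, k, a, b) * v_std


# Illustrative constants: N = 4, V = 1, beta = 1, k = 1, a = 1, b = 2.
for x in (4.5, 6.0, 9.0):
    print(x, speech_eq14(x, 4.0, 1.0, 1.0, k=1.0, a=1.0, b=2.0))
```

The three inputs fall in the three cases of equation (14) and give 0.0 (below the first threshold), 0.5 (on the ramp) and 3.0 (constant-α region). At x = k the ramp branch of (14) also gives 0, so treating that boundary as case 1 does not change the result.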

[0070] The thick line in FIG. 5 shows the noise suppression amount D(ω, t) given by equation (13), and the thick line in FIG. 6 shows the speech spectrum S(ω, t) given by equation (14).

[0071] Equation (14) distinguishes three cases according to the difference between the input signal X(ω, t) and the scaled noise mean βN(ω). In the first case, when this difference is at most k times the standard deviation V(ω), S(ω, t) = 0. The difference between the input signal and the noise mean is then attributed to fluctuation (temporal variation) of the noise, and the speech component of the input signal is estimated to be zero.

[0072] In the second case, when the input signal exceeds the noise mean by (k + b) times the standard deviation or more, the possibility that the noise itself suddenly became large cannot be ruled out, but it is judged highly likely that a strong speech signal has been added to the noise signal, and a relatively large value, βN(ω) + (k + a)V(ω), is subtracted.

[0073] In the third case, when the input signal X(ω, t) exceeds the noise mean by k times the standard deviation or more but by less than (k + b) times, it is judged likely that a weak speech signal is mixed into the noise signal, and a moderate noise suppression amount, scaled according to the difference between the input signal and the noise mean, is subtracted.
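Substituting X(ω, t) − βN(ω) = xV(ω) into equation (14) puts the three cases of paragraphs [0071] to [0073] on a common scale (assuming V(ω) > 0 and b > 0). This is only algebra on equation (14); the condition a ≤ b noted afterwards follows from it and is not stated in the text.

$$
S(\omega,t) =
\begin{cases}
0, & x \le k\\
(x-k)\left(1 - \dfrac{a}{b}\right)V(\omega), & k < x < k+b\\
(x-k-a)\,V(\omega), & x \ge k+b
\end{cases}
$$

Both non-zero branches equal (b − a)V(ω) at x = k + b and the ramp branch is 0 at x = k, so S(ω, t) is continuous in x. With a ≤ b it is also never negative: the ramp factor 1 − a/b is then non-negative, and in the last case x − k − a ≥ b − a ≥ 0.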

[0074] As described above, this embodiment realizes equation (8) in its simplest form. It is therefore easy to implement in practice and requires little computation.

[0075]

[Effects of the Invention] As described above, according to the present invention, noise suppression accuracy can be improved by making the suppression coefficient variable according to the noise variance.

[Brief description of drawings]

FIG. 1 is a flowchart showing a noise suppression method according to an embodiment of the present invention.

FIG. 2 is a spectrogram showing an example of an input spectrum time series, with time in frames on the horizontal axis and frequency on the vertical axis.

FIG. 3 is a flowchart showing another embodiment of the present invention.

FIG. 4 is a flowchart explaining the clustering performed in steps S11 to S13 of FIG. 3.

FIG. 5 is a graph showing a fourth embodiment of the present invention.

FIG. 6 is a graph showing the fourth embodiment of the present invention.

[Explanation of Reference Symbols]

S1 ... noise spectrum acquisition processing; S3 ... average spectrum and standard deviation calculation processing; S4 ... noise suppression amount calculation; S5 ... speech spectrum estimation.

Agent: Patent Attorney Susumu Ito

Claims

[Claims]

1. A noise suppression method comprising: a procedure of obtaining a mean vector and a standard deviation of noise from the spectral time series of an input signal containing only noise; a noise suppression amount determination procedure of determining a noise suppression amount, based on an input signal containing noise and on the mean vector and standard deviation of the noise, so as to reduce the spectral residual in spectral subtraction; and a procedure of suppressing the noise of the input signal by subtracting the noise suppression amount from the input signal containing noise.

2. A noise suppression method comprising: a procedure of dividing the spectral time series of a noise-only input signal into a predetermined number of clusters; a procedure of obtaining a mean vector and a standard deviation of the noise for each cluster from the noise-only input signal; a procedure of selecting, for the input spectrum of an input signal containing noise, the cluster closest to that input spectrum under a predetermined distance measure, and obtaining the mean vector and standard deviation of the selected cluster; a noise suppression amount determination procedure of determining a noise suppression amount, based on the input signal containing noise and on the mean vector and standard deviation of the selected cluster, so as to reduce the spectral residual in spectral subtraction; and a procedure of suppressing the noise of the input signal by subtracting the noise suppression amount from the input signal containing noise.

3. A noise suppression method comprising: a procedure of estimating, prior to noise suppression of an input signal, from the spectral time series $\{N(t)\}_{t=1,2,\dots,T}$ of a noise-only signal, where $N(t) = (N(\omega_1,t), N(\omega_2,t), \dots, N(\omega_d,t))$ is a $d$-dimensional vector and $\omega_1, \omega_2, \dots$ correspond to frequencies, the mean vector $N = (N(\omega_1), N(\omega_2), \dots, N(\omega_d))$ of the noise spectral time series $\{N(t)\}$ and the per-frequency standard deviation vector $V = (V(\omega_1), V(\omega_2), \dots, V(\omega_d))$; a procedure of calculating the noise suppression amount $D(t) = (D(\omega_1,t), D(\omega_2,t), \dots, D(\omega_d,t))$ for the current frame $t$ based on the current input spectrum $X(t) = (X(\omega_1,t), X(\omega_2,t), \dots, X(\omega_d,t))$, the mean vector $N$ and the standard deviation vector $V$; and a procedure of estimating the spectrum of the input signal $S(t) = (S(\omega_1,t), S(\omega_2,t), \dots, S(\omega_d,t))$ by $S(t) = X(t) - D(t)$.
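The procedures of claim 3 can be sketched end to end in Python. The sketch assumes the population standard deviation (the text does not say which estimator is used), a constant α for every frequency bin, and equation (8) as the rule for D(t); the function names and example frames are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def noise_statistics(noise_frames: Sequence[Sequence[float]]
                     ) -> Tuple[List[float], List[float]]:
    """Mean vector N and standard deviation vector V over T noise-only frames."""
    per_bin = list(zip(*noise_frames))  # transpose: one tuple of T values per frequency
    n_mean = [statistics.fmean(values) for values in per_bin]
    v_std = [statistics.pstdev(values) for values in per_bin]  # population std (assumption)
    return n_mean, v_std


def estimate_speech(frame: Sequence[float], n_mean: Sequence[float],
                    v_std: Sequence[float], alpha: float, beta: float) -> List[float]:
    """S(t) = X(t) - D(t), with D(t) from equation (8) and a constant alpha."""
    speech = []
    for x, n, v in zip(frame, n_mean, v_std):
        d = min(x, alpha * v + beta * n)  # equation (8) written as a minimum
        speech.append(x - d)
    return speech


# Two noise-only frames (T = 2) with two frequency bins, then one noisy frame.
n_mean, v_std = noise_statistics([[3.0, 1.0], [5.0, 1.0]])
print(n_mean, v_std)  # [4.0, 1.0] [1.0, 0.0]
print(estimate_speech([10.0, 1.5], n_mean, v_std, alpha=2.0, beta=1.0))  # [4.0, 0.5]
```

In practice the statistics would be computed once from the leading noise-only frames and then reused for every later frame t > T.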

4. A noise suppression method comprising: a procedure of, prior to noise suppression of an input signal, dividing the spectral time series $\{N(t)\}_{t=1,2,\dots,T}$ of a noise-only signal, where $N(t) = (N(\omega_1,t), N(\omega_2,t), \dots, N(\omega_d,t))$ is a $d$-dimensional vector and $\omega_1, \omega_2, \dots$ correspond to frequencies, into $M_e$ clusters, $M_e$ not exceeding a predetermined number $M$, and estimating for each cluster $m$ ($m = 1, 2, \dots, M_e \le M$) the noise mean vector (centroid) $N_m = (N_m(\omega_1), N_m(\omega_2), \dots, N_m(\omega_d))$ and the standard deviation vector $V_m = (V_m(\omega_1), V_m(\omega_2), \dots, V_m(\omega_d))$; a procedure of selecting, from among $N_m$ ($m = 1, 2, \dots, M_e$), the mean vector $N$ closest to the current input spectrum $X(t) = (X(\omega_1,t), X(\omega_2,t), \dots, X(\omega_d,t))$, together with the corresponding standard deviation $V$; a procedure of calculating the noise suppression amount $D(t) = (D(\omega_1,t), D(\omega_2,t), \dots, D(\omega_d,t))$ for the current frame $t$ based on $X(t)$, the mean vector $N$ and the standard deviation vector $V$; and a procedure of estimating the spectrum of the input signal $S(t) = (S(\omega_1,t), S(\omega_2,t), \dots, S(\omega_d,t))$ by $S(t) = X(t) - D(t)$.
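The selection step of claim 4 (and of claim 2) reduces to a nearest-centroid lookup. The claims only require a predetermined distance measure; the sketch below assumes Euclidean distance via Python's `math.dist` and assumes the clustering itself (for example, k-means) has already produced the centroids N_m and standard deviation vectors V_m. The function name and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def select_cluster(x: Sequence[float],
                   centroids: Sequence[Sequence[float]],
                   stds: Sequence[Sequence[float]]
                   ) -> Tuple[int, List[float], List[float]]:
    """Return (m, N_m, V_m) for the cluster whose centroid is nearest to X(t).

    Euclidean distance stands in for the unspecified distance measure.
    """
    best = min(range(len(centroids)), key=lambda m: math.dist(x, centroids[m]))
    return best, list(centroids[best]), list(stds[best])


# Two noise clusters, e.g. a quiet and a loud noise condition (illustrative values).
centroids = [[1.0, 1.0], [5.0, 5.0]]
stds = [[0.1, 0.1], [0.5, 0.5]]
print(select_cluster([4.0, 6.0], centroids, stds))  # (1, [5.0, 5.0], [0.5, 0.5])
```

The returned N_m and V_m then take the place of N and V when D(t) is computed, for example with equation (8).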

5. The noise suppression method according to claim 3 or 4, wherein the noise suppression amount $D(t)$ for the input spectrum $X(t)$ of the current frame is determined, using a coefficient $\alpha(t) = (\alpha(\omega_1,t), \alpha(\omega_2,t), \dots, \alpha(\omega_d,t))$ calculated from the noise mean vector $N$ that, among the obtained noise mean spectra $N_m$ ($m = 1, 2, \dots, M_e$), is closest to $X(t)$ under a predetermined distance measure and from the standard deviation $V$ corresponding to $N$, together with predetermined constants $\beta$ and $\gamma$, by

$$
D(\omega,t) =
\begin{cases}
\alpha(\omega,t)\,V(\omega) + \beta N(\omega), & X(\omega,t) > \alpha(\omega,t)\,V(\omega) + \beta N(\omega)\\
X(\omega,t), & X(\omega,t) \le \alpha(\omega,t)\,V(\omega) + \beta N(\omega)
\end{cases}
$$

6. The noise suppression method according to claim 5, wherein the coefficient $\alpha(\omega,t)$ is determined by a formula using a predetermined constant $k$, a non-negative constant $a$ and a non-negative constant $b$.

7. A noise suppression program for causing a computer to execute: a process of calculating a mean vector and a standard deviation of noise from the spectral time series of an input signal containing only noise; a noise suppression amount determination process of determining a noise suppression amount, based on an input signal containing noise and on the mean vector and standard deviation of the noise, so as to reduce the spectral residual in spectral subtraction; and a process of suppressing the noise of the input signal by subtracting the noise suppression amount from the input signal containing noise.

8. A noise suppression program for causing a computer to execute: a process of dividing the spectral time series of a noise-only input signal into a predetermined number of clusters; a process of calculating a mean vector and a standard deviation of the noise for each cluster from the noise-only input signal; a process of selecting, for the input spectrum of an input signal containing noise, the cluster closest to that input spectrum under a predetermined distance measure, and calculating the mean vector and standard deviation of the selected cluster; a noise suppression amount determination process of determining a noise suppression amount, based on the input signal containing noise and on the mean vector and standard deviation of the selected cluster, so as to reduce the spectral residual in spectral subtraction; and a process of suppressing the noise of the input signal by subtracting the noise suppression amount from the input signal containing noise.