JP2008116686A

JP2008116686A - Noise suppression device

Info

Publication number: JP2008116686A
Application number: JP2006299770A
Authority: JP
Inventors: Ryoji Miyahara; 良次宮原
Original assignee: NEC Engineering Ltd
Current assignee: NEC Engineering Ltd
Priority date: 2006-11-06
Filing date: 2006-11-06
Publication date: 2008-05-22
Anticipated expiration: 2026-11-06
Also published as: JP4757775B2

Abstract

PROBLEM TO BE SOLVED: To suppress noise that becomes a problem in high-quality sound pickup, without distorting the target sound.

SOLUTION: Noise is reduced with filter coefficients based on Wiener filter theory, using a band-division filter bank that utilizes SSB modulation. The obtained filter coefficients are further processed to reduce distortion of the output signal and to reduce musical noise. Specifically, a past SNR is used in estimating the noise, and the filter coefficients are smoothed so that they do not change excessively along the time axis. Each filter coefficient is also corrected so that it does not differ greatly from the coefficients of the adjoining bands. Noise sections are estimated, and the amount by which the noise level is inflated is varied accordingly.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to noise suppression for high-quality pickup of speech, and more particularly to a noise suppression device that suppresses noise other than the speech signal in an acoustic signal in which stationary noise and speech are mixed.

The spectral subtraction method (hereinafter, the SS method) is a well-known technique for this type of noise suppression. In the SS method, the acoustic signal input to a microphone is Fourier transformed into frequency-domain amplitude information (hereinafter, the spectrum) and phase information. Noise is then suppressed by subtracting the estimated spectrum of the noise signal alone from the spectrum of the acoustic signal in which noise and speech are mixed.

Several methods have been proposed for estimating the noise signal in the SS method. For example, the technique described in Patent Document 1 computes the SN ratio for each fixed time interval, called a frame, divides the signal into speech sections and noise sections, estimates the noise sections, and uses the spectrum in those sections as the noise spectrum.

In the technique described in Patent Document 2, the spectrum of the preceding frame is taken as the noise spectrum in a noise section. In a speech section, only the speech component is extracted from the input signal, the output signal, and the estimated noise component; taking the difference from the input signal then yields the noise signal alone, and the noise estimate is updated with that component to improve estimation accuracy.

The SS method subtracts a noise component from the input signal, but it can also be viewed as reducing noise by generating noise from the input signal and adding it. The added noise gives rise to a new noise, called musical noise, which is also a problem. Furthermore, the SS method uses the Fourier transform to convert the time-domain acoustic signal picked up by the microphone into the frequency domain. The fast Fourier transform (FFT) is a well-known way to compute it, and the FFT itself is not very expensive. However, the frequency-domain signal is complex-valued, and complex arithmetic, which costs more than real arithmetic, often becomes a problem when speech is processed in real time.

To address the growth in system scale and in the amount of computation, a technique is known that performs band division with a filter bank using well-known, general single-sideband (hereinafter, SSB) modulation and suppresses noise using Wiener filter theory (see, for example, Patent Document 3). This technique holds that exponential smoothing of the noise estimate and of the filter coefficient updates is effective, and that musical noise is thereby reduced.

Furthermore, because the SS method applies nonlinear processing in the frequency domain, distortion of the speech is inevitable. One known approach therefore smooths the attenuation coefficients in the frequency domain by taking a weighted average of the attenuation coefficients of an arbitrary number of bands centered on the band whose coefficient is being corrected (see, for example, Patent Document 4).

Patent Document 1: Japanese Patent Laid-Open No. 10-097288 (pages 3-4, FIG. 1)
Patent Document 2: Japanese Patent No. 3270480 (pages 4-5, FIG. 2)
Patent Document 3: Japanese Translation of PCT Publication No. 2004-502977 (pages 7-10, FIG. 1)
Patent Document 4: Japanese Patent Laid-Open No. 2005-348173 (pages 6-7, FIG. 1)

However, the technique described in Patent Document 1 does not estimate noise during speech sections, so if the noise changes in the meantime, the speech is distorted. In addition, if a noise section is misjudged, speech corrupts the estimated noise spectrum, which also causes speech distortion. The picked-up signal is therefore accompanied by noise and speech distortion, and high-quality speech pickup is not achieved.

The technique described in Patent Document 2 estimates the noise component using the output signal of the preceding frame. This is not a serious problem when the speech signal has a bandwidth of about 3.4 kHz (the voice band used for telephony), but when wideband speech of 7 kHz or more is handled (audio conferencing, web conferencing, and the like), the estimate contains a large error.

The technique described in Patent Document 3 estimates the noise purely from an average value, so the speech to be picked up is distorted.

The technique described in Patent Document 4 improves continuity between bands by leveling the attenuation coefficients of the bands, but this interferes with noise suppression and introduces new distortion into the speech.

Accordingly, an object of the present invention is to propose a noise estimation technique and a filter coefficient correction method for noise suppression using a band-division filter bank, and thereby to provide a noise suppression device that achieves low-distortion noise suppression without enlarging the device and without complex-number arithmetic.

The noise suppression device of the present invention is a noise suppression device employing the spectral subtraction method, comprising: band dividing means (20 in FIG. 1) that uses SSB modulation to divide a time-domain input signal, in which a speech signal and a stationary noise signal are mixed, into band-limited signals; a processing unit (30 in FIG. 1) for each frequency band that suppresses the noise in the input signal of that band; and a band synthesizing unit (40 in FIG. 1) that combines the signals processed by the processing units and outputs a single noise-suppressed signal. Each processing unit (30 in FIG. 1) has: amplitude estimating means (1 and 2 in FIG. 1) that use leak integration to smooth the signal on the time axis, limiting the frame-to-frame difference in the attenuation coefficient, and estimate the amplitude of the current signal and the amplitude of the noise; attenuation coefficient determining means (5 and 6 in FIG. 1) that obtain the SNR from those amplitude values and determine an attenuation coefficient; attenuation coefficient correcting means (7 and 8 in FIG. 1) that correct the determined attenuation coefficient so as to reduce distortion of the output signal; and multiplying means (9 in FIG. 1) that multiply the input signal by the corrected attenuation coefficient. The amplitude estimating means (2 in FIG. 1) modify the leak integration formula with the reciprocal of the SNR from one frame earlier, so that noise estimation does not stop even during speech sections.

The noise suppression device of the present invention may further include: noise section estimating means (3 in FIG. 1) that compare the input signal amplitude estimate and the noise amplitude estimate obtained by the amplitude estimating means (1 and 2 in FIG. 1) to identify noise sections, multiplying the noise amplitude estimate by a coefficient of about 2 for this comparison, and that output a noise bias value of about 0.5 in noise sections and about 1.0 in speech sections; and noise amplitude comparing means (4 in FIG. 1) that compare the noise amplitude estimate and the input signal amplitude estimate after applying the noise bias value and, if the input signal amplitude estimate is the smaller, output to the attenuation coefficient determining means (5 and 6 in FIG. 1) a maximum attenuation coefficient flag that sets the attenuation coefficient of that band to its maximum.

More specifically, the attenuation coefficient determining means (5 and 6 in FIG. 1) comprise: signal/noise ratio calculating means (5 in FIG. 1) that receive the estimated amplitude values from the amplitude estimating means, calculate the reciprocal of the SNR by dividing the noise amplitude estimate by the input signal amplitude estimate, and output 1.0 as the reciprocal when the input signal amplitude estimate is smaller than the noise amplitude estimate; and attenuation coefficient calculating means (6 in FIG. 1) that calculate the attenuation coefficient from the reciprocal of the SNR and the maximum attenuation coefficient flag and output it to the attenuation coefficient correcting means.

The attenuation coefficient correcting means (7 and 8 in FIG. 1) comprise: attenuation coefficient leveling means (7 in FIG. 1) that smooth the attenuation coefficient from the attenuation coefficient determining means on the time axis by leak integration; and inter-band attenuation coefficient leveling means (8 in FIG. 1) that examine the attenuation coefficients of the adjacent bands and correct the attenuation coefficient of the band in question, only in the direction of making it smaller, so that its ratio to the adjacent coefficients does not exceed a fixed value.

In the present invention, band division using SSB modulation first resolves the increase in computation caused by complex-number arithmetic. As for noise estimation errors, the noise is estimated using the ratio between the noise and the current amplitude value (hereinafter, SNR), so that noise estimation continues during speech sections and errors are kept small. The noise amplitude spectrum and the input signal spectrum obtained in this way are then used with Wiener filter theory to reduce the noise. The attenuation coefficient given by Wiener filter theory is smoothed in the time domain to reduce nonlinearity in the time domain, and the attenuation coefficient of each band is corrected on the basis of the attenuation coefficients of the adjacent bands to reduce nonlinearity in the frequency domain, which suppresses speech distortion. Musical noise is addressed by estimating the noise sections and, within them, making the noise appear larger so that the attenuation coefficient increases.

According to the present invention, noise can be suppressed with less computation than conventional methods and with little distortion of the output speech. This is because the SNR is used for noise estimation, the attenuation coefficient of each band is smoothed on the time axis, and the attenuation coefficient of each band is smoothed on the basis of the attenuation coefficients of the adjacent bands.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

[Description of configuration]
FIG. 1 shows the configuration and processing flow of the noise suppression device of the present invention. The device comprises a band dividing unit 20 that divides the input signal from a microphone 10 into band-limited signals, processing units 30 in one-to-one correspondence with the divided signals, and a band synthesizing unit 40 that combines the signals processed by the processing units 30.

The input signal picked up by the microphone 10 contains both a speech signal and a noise signal. The band dividing unit 20 therefore divides the input signal into bands, each processing unit 30 suppresses the noise in the input signal of its frequency band, and the band synthesizing unit 40 combines the signals of the bands to output a noise-suppressed signal.

The band dividing unit 20 uses a filter bank based on general SSB (single sideband) modulation. If a Fourier transform technique such as the FFT were used in the band dividing unit 20, the band-divided signals would be complex-valued and the amount of computation would increase. Band division using SSB modulation is therefore used to avoid this unnecessary increase.

The specific flow of the band dividing unit 20 is as follows. First, the input signal is cut into frames whose length depends on the number of bands. The length also depends on the division method used in the band dividing unit 20; with the SSB-modulation band division used in the present invention, for example, division into 16 bands is processed with frames of about 10 samples. The number of bands should be chosen according to the sampling frequency. For reference, 32 bands are sufficient for speech sampled at 16 kHz, in which case the frame length is about 20 samples.

The signal cut out at the frame length is input to the filter bank using SSB modulation and becomes a set of one-sample signals, each limited to one frequency band. Because the signals of the frequency bands are processed almost independently of one another, FIG. 1 depicts the processing units 30 as a layered structure.

Each processing unit 30 comprises an input signal amplitude estimation unit 1, a noise signal amplitude estimation unit 2, a noise section estimation unit 3, a noise amplitude comparison unit 4, a signal/noise ratio calculation unit 5, an attenuation coefficient calculation unit 6, an attenuation coefficient smoothing unit 7, an inter-band attenuation coefficient smoothing unit 8, and a multiplier 9. Units 1 to 8 analyze the input signal from the band dividing unit 20 to obtain the attenuation coefficient for that signal. The multiplier 9 multiplies the input signal from the band dividing unit 20 by the attenuation coefficient obtained by units 1 to 8 and outputs the result to the band synthesizing unit 40.

The input signal amplitude estimation unit 1 estimates the amplitude of the input signal by a process called leak integration. Because the SS method applies nonlinear processing in the frequency domain, distortion of the signal is inevitable. As a countermeasure, the input signal is smoothed on the time axis so that the frame-to-frame difference in the attenuation coefficient stays small, and the current amplitude (average value) of the signal is estimated.

Like the input signal amplitude estimation unit 1, the noise signal amplitude estimation unit 2 uses leak integration, here to estimate the amplitude of the noise. The estimate uses the output of the downstream signal/noise ratio calculation unit 5: the leak integration formula is modified with γ (= An/As), the reciprocal of the SNR (signal-to-noise ratio; here As/An). This makes it unnecessary to distinguish speech sections from noise sections and allows the noise to be estimated accurately even during speech sections.

The noise section estimation unit 3 receives the estimated amplitude values from the input signal amplitude estimation unit 1 and the noise signal amplitude estimation unit 2 and compares them to estimate noise sections. Because the noise amplitude estimate is an average value, it is multiplied by a coefficient of about 2, which roughly halves the cases in which a noise section is misjudged as a speech section. The noise section estimation unit 3 then passes a noise bias value of about 0.5 in noise sections and about 1.0 in speech sections to the downstream noise amplitude comparison unit 4.
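
The decision made by the noise section estimation unit 3 might be sketched as follows. This is only an illustrative sketch: the function name, the exact values 2.0, 0.5, and 1.0 (the text says "about"), and the non-strict comparison are assumptions, not details given in the description.

```python
def noise_bias_value(a_s, a_n, margin=2.0, noise_bias=0.5, speech_bias=1.0):
    """Return the noise bias value for one band and one sample.

    a_s: input signal amplitude estimate (As)
    a_n: noise amplitude estimate (An)
    margin, noise_bias, speech_bias: assumed values standing in for the
    "about 2", "about 0.5", and "about 1.0" mentioned in the text.
    """
    # An is only an average, so it is scaled up before the comparison.
    in_noise_section = a_s <= margin * a_n
    return noise_bias if in_noise_section else speech_bias


print(noise_bias_value(a_s=1.5, a_n=1.0))  # input within 2x the noise average
print(noise_bias_value(a_s=5.0, a_n=1.0))  # input well above the noise average
```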

The noise amplitude comparison unit 4 receives the estimated amplitude values from the input signal amplitude estimation unit 1 and the noise signal amplitude estimation unit 2 and compares them. If the input signal amplitude estimate is the smaller, it passes to the attenuation coefficient calculation unit 6 a maximum attenuation coefficient flag that sets the attenuation coefficient of that band to its maximum. Before the comparison, the estimate is multiplied by the noise bias value from the noise section estimation unit 3. This corresponds to the coefficient applied to the noise amplitude estimate in the noise section estimation unit 3 and makes noise suppression in noise sections effective.
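
One possible reading of this comparison is sketched below. The text does not state unambiguously which estimate the noise bias value scales; the sketch assumes it scales the input signal amplitude estimate, so that a bias of 0.5 makes the noise look twice as large relative to the input. The function name and the strict comparison are likewise assumptions.

```python
def max_attenuation_flag(a_s, a_n, bias):
    """Return True when the band should receive maximum attenuation.

    a_s: input signal amplitude estimate (As)
    a_n: noise amplitude estimate (An)
    bias: noise bias value from the noise section estimation
          (about 0.5 in noise sections, about 1.0 in speech sections)
    """
    # Assumed reading: the bias scales As before it is compared with An.
    return bias * a_s < a_n


print(max_attenuation_flag(a_s=1.5, a_n=1.0, bias=0.5))  # noise-section bias
print(max_attenuation_flag(a_s=1.5, a_n=1.0, bias=1.0))  # speech-section bias
```

Under this reading, the same pair of estimates raises the flag with the noise-section bias but not with the speech-section bias.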

The signal/noise ratio calculation unit 5 receives the estimated amplitude values from the input signal amplitude estimation unit 1 and the noise signal amplitude estimation unit 2 and calculates γ = An/As, the reciprocal of the SNR, by dividing the noise amplitude estimate by the input signal amplitude estimate. If the input signal amplitude estimate is smaller than the noise amplitude estimate, it sets γ = 1.0. The result is passed to the noise signal amplitude estimation unit 2 and the attenuation coefficient calculation unit 6. The use of γ in the noise signal amplitude estimation unit 2 is as described above.
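
The calculation of γ can be illustrated with a short sketch. The function name is hypothetical, and the guard against a zero input estimate is an added assumption that the description does not mention.

```python
def inverse_snr(a_s, a_n):
    """Return gamma = An / As, clamped to 1.0 when As is smaller than An.

    a_s: input signal amplitude estimate (As)
    a_n: noise amplitude estimate (An)
    """
    # The a_s == 0 guard is an assumption added to avoid division by zero.
    if a_s < a_n or a_s == 0:
        return 1.0
    return a_n / a_s


print(inverse_snr(a_s=4.0, a_n=1.0))  # speech dominates: gamma below 1
print(inverse_snr(a_s=0.5, a_n=1.0))  # input below the noise estimate: clamped
```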

The attenuation coefficient calculation unit 6 calculates the attenuation coefficient L from γ = An/As, the reciprocal of the SNR from the signal/noise ratio calculation unit 5, and the maximum attenuation coefficient flag from the noise amplitude comparison unit 4, and passes L to the attenuation coefficient smoothing unit 7. The attenuation coefficient smoothing unit 7 further smooths L on the time axis by leak integration to obtain the final attenuation coefficient SL, which it passes to the inter-band attenuation coefficient smoothing unit 8.

The inter-band attenuation coefficient smoothing unit 8 examines the attenuation coefficients SL of the neighboring bands (hereinafter, adjacent bands) and corrects the attenuation coefficient SL of its own band so that its ratio to the SL of an adjacent band (hereinafter, MD) does not exceed a fixed value. The correction is made only in the direction of making the attenuation coefficient smaller. As a result, the attenuation coefficients of adjacent bands connect smoothly and speech distortion is greatly reduced. For this purpose, although it is difficult to draw in FIG. 1, the inter-band attenuation coefficient smoothing unit 8 also receives the attenuation coefficients SL from the attenuation coefficient smoothing units 7 of the processing units 30 for the adjacent bands.
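
A minimal sketch of such a one-directional ratio limit follows. The limit value 2.0, the function name, and the choice to bound each band by its smaller neighbor are assumptions made for illustration; this passage does not give the actual limit or the exact rule.

```python
def limit_band_ratio(sl, max_ratio=2.0):
    """Limit each band's coefficient relative to its adjacent bands.

    sl: list of time-smoothed attenuation coefficients SL, one per band
    max_ratio: assumed upper bound on the ratio MD = SL[k] / SL[neighbor]
    Coefficients are only ever lowered, never raised.
    """
    limited = []
    for k, value in enumerate(sl):
        # Use the uncorrected SL of the adjacent bands, as supplied by
        # the per-band time smoothing stage.
        neighbors = [sl[j] for j in (k - 1, k + 1) if 0 <= j < len(sl)]
        ceiling = min((max_ratio * n for n in neighbors), default=value)
        limited.append(min(value, ceiling))
    return limited


print(limit_band_ratio([0.1, 0.9, 0.2]))
```

In this example only the middle band is changed: it is lowered until it is at most twice its smaller neighbor, while the outer bands are left as they are.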

[Description of operation]
Next, the operation of the noise suppression device configured as described above is described in detail with reference also to FIGS. 2 to 12.

The signal of each frequency band divided by the band dividing unit 20 is input to the input signal amplitude estimation unit 1 of the processing unit 30. The input signal amplitude estimation unit 1 estimates the amplitude by a process called leak integration, which is expressed by the following equation.
As(t) = δ×|S| + (1-δ)×As(t-1) (1)
Here, t is the sample time and (t-1) is the time one sample earlier. S is the input signal to the microphone, a mixture of speech and noise, and As is the amplitude estimate of the input signal. δ is a parameter that controls how strongly the instantaneous value affects the estimate, and takes a value of 1 or less. With a small δ, the amplitude estimate As approximates the average of the input signal; with a large δ, it approaches the instantaneous value of the input signal.
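
The leak integration of equation (1) can be sketched in a few lines. The function name, the initial estimate of 0.0, and the test signal are illustrative assumptions.

```python
def leaky_integrate(samples, delta, initial=0.0):
    """Track As(t) = delta * |S| + (1 - delta) * As(t-1) over a band signal."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must be in (0, 1]")
    estimate = initial  # assumed starting value of As
    history = []
    for s in samples:
        estimate = delta * abs(s) + (1.0 - delta) * estimate
        history.append(estimate)
    return history


band_signal = [0.0, 1.0, -1.0, 1.0, -1.0]
print(leaky_integrate(band_signal, delta=1.0))  # follows |S| exactly
print(leaky_integrate(band_signal, delta=0.5))  # approaches |S| gradually
```

With delta = 1 the estimate equals the instantaneous magnitude, while smaller values make it lag behind and average over past samples.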

For noise suppression by Wiener filter theory, it is acceptable to use the instantaneous value of the input signal as its amplitude value; to do so, δ is simply set to 1. However, setting δ to a value of about 0.25 to 0.5 limits the frame-to-frame difference in the attenuation coefficient, so the signal is smoothed on the time axis and nonlinearity is suppressed. This reduces distortion. The present invention recommends δ = 0.25 to 0.5.

Next, the output of the input signal amplitude estimation unit 1 is input to the noise signal amplitude estimation unit 2. Like the input signal amplitude estimation unit 1, the noise signal amplitude estimation unit 2 uses leak integration, here to estimate the average value of the noise. Because there is no need to follow instantaneous values, δ is made very small, for example 0.0001. The output As of the input signal amplitude estimation unit 1, which is the input to the noise signal amplitude estimation unit 2, has already been smoothed, so improved noise estimation accuracy can be expected. However, this output contains speech components in addition to the noise to be suppressed.

Many conventional techniques exclude the speech component by separating speech sections from noise sections. For example, the technique described in Patent Document 1 estimates the noise as
An(t) = δ×As + (1-δ)×An(t-1) (2)
δ = α when As/An ≦ TH
δ = 0 when As/An > TH
0 < α < 1
(variable names have been matched to those of the present invention), so that noise is estimated only where the SNR is poor (TH is a threshold with a constant value), that is, only in noise sections. With such a method, the noise amplitude is not estimated during speech sections, so changes in the noise cannot be followed; the speech is distorted or the noise suppression becomes less effective.
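
The gated update of equation (2) can be illustrated as below. The values of α and TH are hypothetical; the description only requires 0 < α < 1 and a constant threshold.

```python
def gated_noise_update(a_s, a_n_prev, alpha=0.5, threshold=2.0):
    """One step of equation (2): update An only when As/An is at most TH.

    a_s: input signal amplitude estimate (As)
    a_n_prev: previous noise amplitude estimate An(t-1)
    alpha, threshold: assumed example values for alpha and TH
    """
    ratio = a_s / a_n_prev if a_n_prev > 0 else float("inf")
    delta = alpha if ratio <= threshold else 0.0
    return delta * a_s + (1.0 - delta) * a_n_prev


print(gated_noise_update(a_s=1.5, a_n_prev=1.0))   # low ratio: estimate moves
print(gated_noise_update(a_s=10.0, a_n_prev=1.0))  # high ratio: estimate frozen
```

The second call shows the drawback discussed in the text: while the ratio stays above the threshold, the noise estimate cannot follow any change in the noise.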

The technique described in Patent Document 2 estimates the noise as
An(t) = δ×As + (1-δ)×An(t-1) when ΣAs ≦ TH
An(t) = δ×(As - (η×Ao(t-1) + (1-η)×(As - An(t-1)))) + (1-δ)×An(t-1) when ΣAs > TH (3)
0.5 < η < 1
(variable names have been matched to those of the present invention). Here, Ao is the amplitude estimate of the output signal. In non-noise sections, this method extracts and estimates only the noise component in the input signal by considering two signals: the output signal of one sample earlier, which should contain only speech because it is the result of noise suppression, and the input signal minus the noise component of one sample earlier, which should likewise contain only speech. This rests on many assumptions. For example, if speech alone could be obtained from the expression (As - An(t-1)), that term by itself would suffice for noise suppression. In practice this is difficult, which is why additional functions are used, and it is doubtful that (As - An(t-1)) yields speech alone. Above all, the calculation is cumbersome.

Therefore, the present invention estimates the noise as follows, using the output of the signal/noise ratio calculation unit 5 in the subsequent stage. Specifically, a parameter γ is added to the leaky integration equation
An(t) = δ × As + (1-δ) × An(t-1)   (4)
to give
An(t) = δ × γ × As + (1-δ) × An(t-1)   (5)
Here, An is the estimated noise amplitude. γ is explained in detail in the description of the signal/noise ratio calculation unit 5; put simply, it is the reciprocal of the SNR (signal-to-noise ratio, here As/An). Modifying the leaky integration equation as in equation (5) allows the estimate to be trained on the noise signal itself in noise intervals and, in speech intervals, on an estimate of the amplitude of the noise contained in As.
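
As a minimal sketch of equation (5), the Python below updates the noise amplitude estimate of one band, one frame at a time. The function name `update_noise_estimate`, the value δ = 0.1, and the choice to compute γ from the previous frame's As and An are assumptions made for illustration; they are not specified at this point in the text.

```python
def update_noise_estimate(a_s, a_n_prev, a_s_prev, delta=0.1):
    """One step of equation (5): An(t) = delta*gamma*As + (1 - delta)*An(t-1).

    gamma is the reciprocal SNR An/As, set to 1.0 when An > As.
    Assumption: gamma is computed from the previous frame's As and An.
    """
    if a_s_prev <= 0.0 or a_n_prev > a_s_prev:
        gamma = 1.0
    else:
        gamma = a_n_prev / a_s_prev
    return delta * gamma * a_s + (1.0 - delta) * a_n_prev


# Noise-only input below the current estimate: the estimate decays toward As.
a_n = 1.0
for _ in range(200):
    a_n = update_noise_estimate(0.2, a_n, 0.2)
noise_only_estimate = a_n

# Steady speech far above the noise estimate: the estimate keeps its value.
speech_estimate = update_noise_estimate(5.0, 0.2, 5.0)
```

In this sketch the estimate rises during speech only when the current As exceeds the previous As, because γ carries the previous frame's ratio forward.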

This assumes that the SNR one sample in the past equals the current SNR. The duration of one sample is, for example, 0.0025 seconds at 400 Hz (16 kHz decimated by a factor of 40), and the SNR changes very little over such an interval, so the assumption is reasonable. As a result, no discrimination between speech intervals and noise intervals is needed, and the noise can be estimated accurately even in speech intervals.
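
The sample period quoted above follows directly from the decimation. The variable names below are illustrative only.

```python
input_rate_hz = 16_000    # sampling rate of the input signal
decimation_factor = 40    # keep one sample out of every 40

band_rate_hz = input_rate_hz / decimation_factor  # 400.0
sample_period_s = 1.0 / band_rate_hz              # 0.0025
```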

The estimated amplitude values obtained as above by the input signal amplitude estimation unit 1 and the noise signal amplitude estimation unit 2 are input to the noise interval estimation unit 3, which compares the input signal amplitude estimate with the noise amplitude estimate to estimate noise intervals. However, the noise amplitude estimate represents only the average noise amplitude, so a simple comparison cannot identify noise intervals completely, because noise has variance.

As an example of the variance of noise, FIG. 2 shows the amplitude distribution of background noise in a room. The histogram was obtained by measuring the background noise in a room, normalizing so that the maximum absolute amplitude is 1, and plotting the distribution. The average noise amplitude is shown by a white dotted line. Comparing the average noise value with the amplitude estimate of the input signal (nearly an instantaneous value), almost half of the input samples have an amplitude higher than the average noise amplitude. In other words, if noise intervals are judged by simply comparing the noise amplitude estimate with the input signal amplitude estimate, about half of the judgments will be wrong.

The noise interval estimation unit 3 therefore reduces estimation errors by multiplying the noise amplitude estimate by a coefficient before the comparison. A larger coefficient allows noise intervals to be identified more reliably, but too large a value risks misjudging an interval as noise even though speech is present. The coefficient is therefore set to about 2. Multiplying the noise amplitude estimate by this coefficient halves the rate at which noise intervals are misjudged as speech intervals.

In the noise intervals estimated in this way, the noise bias value passed to the subsequent noise amplitude comparison unit 4 is made smaller. The noise bias value is about 0.5 in noise intervals and about 1.0 in speech intervals. How this value is used is described later.
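
The decision made by the noise interval estimation unit 3 can be sketched as below. The function name `noise_bias`, its parameter names, and the use of a "less than or equal" comparison are assumptions; the text gives only the coefficient of about 2 and the two bias values.

```python
def noise_bias(a_s, a_n, margin=2.0, bias_noise=0.5, bias_speech=1.0):
    """Return the noise bias value for one band.

    The frame is treated as a noise interval when the input amplitude
    estimate a_s does not exceed the noise amplitude estimate a_n
    multiplied by `margin`.
    """
    is_noise_interval = a_s <= margin * a_n
    return bias_noise if is_noise_interval else bias_speech


bias_in_noise = noise_bias(1.5, 1.0)   # input within 2x the noise estimate
bias_in_speech = noise_bias(3.0, 1.0)  # input well above the noise estimate
```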

On receiving the noise bias value from the noise interval estimation unit 3, the noise amplitude comparison unit 4 compares the amplitude estimate of the input signal with the noise amplitude estimate. In doing so, the input signal is multiplied by the noise bias value from the noise interval estimation unit 3 before the comparison. This is so that noise is suppressed effectively in noise intervals, as explained with reference to FIG. 3.

FIG. 3 simulates the change over time of the amplitude of an input signal. The hatched part of each bar is the noise component and the white part is the speech signal. The solid line is the estimated average noise. Comparing the average noise with the size of the noise component, the noise can be suppressed completely where the hatched bar is lower than the solid line, but not where it is higher. This is because the Wiener filter processing subtracts the average noise amplitude from the amplitude of the input signal, so noise larger than the average cannot be suppressed completely.

Therefore, when the noise interval estimation unit 3 judges an interval to be a noise interval, the average noise amplitude is made to appear larger, as in FIG. 4, and the Wiener filter processing is then applied so that the noise is suppressed completely. If speech intervals were also processed with this apparently enlarged noise amplitude, however, the speech distortion would increase, so outside noise intervals the original average noise amplitude of FIG. 3 is used. In that case the noise cannot be suppressed completely, but because it is masked by the speech it is hardly noticeable in practice.

The noise amplitude comparison unit 4 applies the noise bias value from the noise interval estimation unit 3 described above and compares the input signal amplitude estimate with the noise amplitude estimate. If the input signal amplitude is the smaller, it passes to the attenuation coefficient calculation unit 6 a maximum attenuation coefficient flag, which sets the attenuation coefficient of that band to its maximum. How this flag is used is described later.
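
A minimal sketch of this comparison follows. It assumes, in line with the description of the noise amplitude comparison unit 4, that the input amplitude estimate is the quantity scaled by the bias; the function name `max_attenuation_flag` is hypothetical.

```python
def max_attenuation_flag(a_s, a_n, bias):
    """True when the bias-scaled input amplitude is below the noise estimate."""
    return bias * a_s < a_n


# With bias 0.5 (noise interval) an input of 1.5x the noise estimate is flagged,
# which has the same effect as doubling the noise estimate.
flag_noise_interval = max_attenuation_flag(1.5, 1.0, 0.5)

# With bias 1.0 (speech interval) the same input is not flagged.
flag_speech_interval = max_attenuation_flag(1.5, 1.0, 1.0)
```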

The signal/noise ratio calculation unit 5 divides the noise amplitude estimate An by the input signal amplitude estimate As to calculate γ as
γ = An/As
if An > As then γ = 1.0   (6)
This γ is the reciprocal of the SNR, and γ = 1.0 when the amplitude of the input signal is smaller than the noise amplitude estimate. That is, the larger the input signal is relative to the noise amplitude estimate, the smaller this value becomes. γ is used by the noise signal amplitude estimation unit 2.
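
Equation (6) can be sketched as a small function. The name `reciprocal_snr` and the guard for a non-positive As are assumptions added so that the example cannot divide by zero.

```python
def reciprocal_snr(a_s, a_n):
    """Equation (6): gamma = An/As, with gamma = 1.0 when An > As."""
    if a_s <= 0.0 or a_n > a_s:  # the a_s <= 0 guard is an added assumption
        return 1.0
    return a_n / a_s


gamma_speech = reciprocal_snr(4.0, 1.0)  # input four times the noise estimate
gamma_noise = reciprocal_snr(0.5, 1.0)   # input below the noise estimate
```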

As described above, As is used for noise estimation, but in non-noise intervals the speech component contained in As degrades the estimation accuracy. The accuracy is therefore improved by performing the update with γ × As. The first term on the right-hand side of equation (5), δ × γ × As, can be rewritten as
δ × γ × As = δ × An/As × As ≈ δ × An   (7)
Hence the whole right-hand side of equation (5) is updated with noise-only values that contain no speech. That is, the noise can be estimated accurately even in non-noise intervals.

Next, the attenuation coefficient calculation unit 6, which is the main part of the noise suppression, obtains the attenuation coefficient L. The attenuation coefficient is basically obtained from the Wiener filter equation. Denoting the resulting attenuation coefficient by L',
L' = (As - An)/As   (8)
Expanding this using γ from equation (6) gives
L' = (As - An)/As = 1 - γ   (9)
If the maximum attenuation coefficient flag has been passed from the noise amplitude comparison unit 4, L' is set to 0.
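
Equations (8) and (9), together with the flag handling, can be sketched as follows. The function and parameter names are illustrative.

```python
def raw_attenuation(gamma, max_attenuation_flag=False):
    """Equation (9): L' = 1 - gamma, forced to 0 when the flag is set."""
    if max_attenuation_flag:
        return 0.0
    return 1.0 - gamma


l_speech = raw_attenuation(0.25)         # gamma = An/As = 0.25
l_flagged = raw_attenuation(0.25, True)  # flag overrides the computed value
```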

If this value of L' were used directly as the attenuation coefficient L, the attenuation could become extremely large. For example, when γ = 0.9, L' = 0.1, which reduces the input signal to 1/10 of its size. Because such strong attenuation tends to distort the speech, a maximum attenuation coefficient ML is provided, and the attenuation coefficient L that is finally output is
L = L'
if ML > L' then L = ML   (10)
The attenuation coefficient L obtained here is passed to the attenuation coefficient smoothing unit 7.
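
Equation (10) is a lower bound on the multiplication value. In the sketch below the value ML = 0.3 is an arbitrary placeholder, since the text does not state one, and the function name is hypothetical.

```python
def limit_attenuation(l_raw, ml=0.3):
    """Equation (10): L = L', except L = ML when ML > L'."""
    return max(l_raw, ml)


l_limited = limit_attenuation(0.1)     # strong attenuation is capped at ML
l_unchanged = limit_attenuation(0.75)  # mild attenuation passes through
```

Under this reading, an L' of 0 produced by the maximum attenuation coefficient flag also comes out as ML.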

The attenuation coefficient L calculated by the attenuation coefficient calculation unit 6 has already been smoothed along the time axis through the smoothing of As in the input signal amplitude estimation unit 1, but this is not yet enough to reduce speech distortion. The attenuation coefficient L is therefore smoothed further on the time axis by leaky integration. This is done by the attenuation coefficient smoothing unit 7.
SL(t) = δ × L + (1-δ) × SL(t-1)   (11)
Equation (11) is the leaky integration of the attenuation coefficient, and SL is the final attenuation coefficient. Here δ is set to about 0.5 so that SL readily follows the instantaneous value of the attenuation coefficient. This is because making δ too small distorts the speech; δ is thus a parameter that adjusts the trade-off between noise suppression performance and speech distortion.
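
A minimal sketch of equation (11) applied to a sequence of frames in one band. The initial value `sl_init = 1.0` (no attenuation) and the function name are assumptions.

```python
def smooth_attenuation(l_values, delta=0.5, sl_init=1.0):
    """Equation (11): SL(t) = delta*L + (1 - delta)*SL(t-1), frame by frame."""
    sl = sl_init
    smoothed = []
    for l in l_values:
        sl = delta * l + (1.0 - delta) * sl
        smoothed.append(sl)
    return smoothed


# Two fully attenuated frames followed by one unattenuated frame.
sl_track = smooth_attenuation([0.0, 0.0, 1.0])  # [0.5, 0.25, 0.625]
```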

The attenuation coefficient SL in each frequency band has now been obtained. Multiplying the signal of each frequency band by this attenuation coefficient SL suppresses the noise. However, if SL is applied as it is and the difference in attenuation coefficient between bands is very large, the speech distortion becomes very large. The present invention therefore introduces an interband attenuation coefficient smoothing unit 8.

The function of the interband attenuation coefficient smoothing unit 8 is described with reference to FIGS. 5 to 12. First, FIG. 5 shows a model of the input signal in each frequency band at a certain time. The horizontal axis of the graph is frequency and the vertical axis is amplitude. The hatched portions of the graph are the speech component and the white portions are the noise component.

Assuming that such an input signal is received and that the noise is estimated accurately, the attenuation coefficients obtained from Wiener filter theory are as shown in FIG. 6. The horizontal axis of this graph is frequency, as in FIG. 5, and corresponds to FIG. 5. The vertical axis is the calculated multiplication value used for attenuation. FIG. 6 shows that there are pairs of adjacent bands with a large difference in attenuation coefficient. The purpose of the interband attenuation coefficient smoothing unit 8 is to correct these extremely large differences.

The technique described in Patent Document 4 is an existing technique for smoothing attenuation coefficients between bands. In that document, smoothing is achieved by taking a weighted average of the attenuation coefficients of an arbitrary number of bands centered on the band whose attenuation coefficient is to be corrected. Applying the process given as an example in Patent Document 4, in which the average of the attenuation coefficients of three adjacent bands is used as the attenuation coefficient, to the attenuation coefficients of FIG. 6 yields the attenuation coefficients shown in FIG. 7.
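
The three-band averaging described above can be sketched as below, as an illustration of the prior-art idea rather than a reproduction of Patent Document 4. Equal weights and the narrower window at the two edge bands are assumptions, and the list is assumed to be ordered from the lowest to the highest band.

```python
def three_band_average(gains):
    """Replace each gain by the plain mean of itself and its adjacent bands."""
    n = len(gains)
    averaged = []
    for k in range(n):
        lo = max(k - 1, 0)
        hi = min(k + 1, n - 1)
        window = gains[lo:hi + 1]
        averaged.append(sum(window) / len(window))
    return averaged


# A speech band (0.8) between two noise bands (0.2).
averaged_gains = three_band_average([0.2, 0.8, 0.2])
```

In this example the speech band is pulled down from 0.8 to about 0.4, while the two noise bands are raised to 0.5: averaging lowers gains as well as raising them.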

Indeed, FIG. 7 shows that the attenuation coefficients of adjacent bands have been smoothed and the extreme changes are gone. The output was then calculated by applying the attenuation coefficients of FIG. 7 to the input signal of FIG. 5; the result is shown in FIG. 8. This figure shows the speech component contained in FIG. 5 (the input signal) as hatched portions and the noise produced by the attenuation processing as white portions. The point to note is the white noise portions: in some bands, noise in the negative direction has been produced. That is, the input speech component has been cut away and the speech is distorted. Thus, although this smoothing method improves continuity between bands by smoothing the attenuation coefficient of each band, it adds new noise and gives rise to speech distortion and noise. High-quality sound pickup cannot be expected under these conditions.

In the present invention, the interband attenuation coefficient smoothing unit 8 examines the attenuation coefficients of the bands adjacent to each band (hereinafter, adjacent bands) and corrects the coefficient so that its ratio to the attenuation coefficient of an adjacent band (hereinafter, MD) does not go beyond a fixed limit. The correction is made only in the direction of decreasing the attenuation (that is, increasing the multiplication value). As a result, the attenuation coefficients of adjacent bands connect smoothly and speech distortion is greatly reduced.

The specific correction flow is described with reference to FIGS. 9 and 10. FIG. 9 shows in black those bands of FIG. 6 whose MD with the adjacent higher-frequency band is beyond the fixed limit. In this case the bands to be corrected are those drawn with polka dots, and their attenuation coefficients are corrected so that MD comes within the limit. Specifically, if for example attenuation coefficients of 0.2 (polka dots) and 0.8 (black) are adjacent,
MD = 0.2/0.8 = 0.25   (12)
If the minimum value of MD is set to 0.5, correction is performed because MD is 0.25. The attenuation coefficient of the band that was 0.2 is corrected as
0.2 → 0.8 × (minimum value of MD) = 0.8 × 0.5 = 0.4   (13)
Note that it is the band with the larger attenuation (the smaller multiplication value) whose coefficient is corrected. The correction proceeds starting from the attenuation coefficient of the highest-frequency band. This makes correction possible even where two consecutive bands need correcting, as with the 2000 Hz and 2250 Hz bands in FIG. 9.
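
The comparison against the adjacent higher-frequency band, equations (12) and (13), can be sketched as follows. The function name is hypothetical, the list is assumed to be ordered from the lowest to the highest band, and the zero check is an added guard.

```python
def correct_against_higher_band(gains, md_min=0.5):
    """Raise gains that are too small relative to the next-higher band.

    Bands are visited from the highest frequency downward, so a gain that
    has just been corrected serves as the reference for the band below it.
    """
    out = list(gains)
    for k in range(len(out) - 2, -1, -1):
        upper = out[k + 1]
        if upper > 0.0 and out[k] / upper < md_min:  # MD below its minimum
            out[k] = upper * md_min                   # equation (13)
    return out


pair = correct_against_higher_band([0.2, 0.8])        # [0.4, 0.8]
chain = correct_against_higher_band([0.1, 0.1, 0.8])  # [0.2, 0.4, 0.8]
```

The second call shows why the order matters: the middle band is first raised to 0.4, and that corrected value then determines the correction of the lowest band.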

Next, the correction of FIG. 10 is described. FIG. 10 shows the result of the correction of FIG. 9 (the correction that looks at the adjacent higher-frequency band). In contrast to FIG. 9, the shading in FIG. 10 marks with polka dots the bands whose MD, compared with the adjacent lower-frequency band, is beyond the fixed limit. Applying the correction of equations (12) and (13), the same correction processing as for FIG. 9, to these polka-dot bands yields smoothed attenuation coefficients like those of FIG. 11. FIG. 12 shows the result of processing the input signal with the attenuation coefficients of FIG. 11. This result contains no noise in the negative direction, which shows that the noise component has been suppressed without cutting into the speech component.
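
Both comparisons can be combined into one routine, sketched below. Running the second pass from the lowest band upward is an assumption made by symmetry with the first pass; the text does not state the order. The input list is assumed to be ordered from the lowest to the highest band.

```python
def smooth_between_bands(gains, md_min=0.5):
    """Two-pass interband correction that only ever raises gains."""
    out = list(gains)
    # Pass 1: compare each band with its higher-frequency neighbour,
    # starting from the highest band.
    for k in range(len(out) - 2, -1, -1):
        out[k] = max(out[k], out[k + 1] * md_min)
    # Pass 2: compare each band with its lower-frequency neighbour,
    # starting from the lowest band (assumed order).
    for k in range(1, len(out)):
        out[k] = max(out[k], out[k - 1] * md_min)
    return out


raw_gains = [0.8, 0.1, 0.1, 0.9, 0.2]
smoothed_gains = smooth_between_bands(raw_gains)
```

Because each step takes a maximum, no band is attenuated more than before the correction, which is the property that distinguishes this scheme from averaging.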

However, a noise component that could not be fully suppressed remains in the lowest band. Because the speech signal in the adjacent band is large, this noise is hardly noticeable owing to the masking effect. The masking effect is a property of human hearing whereby a large component at one frequency makes sounds at nearby frequencies harder to hear.

Accordingly, the minimum value of MD is preferably set to the limit at which noise is masked by the sound of the adjacent band. This would mean dividing the frequency bands into critical bands, which serve as an index of masking (the common Bark scale can be used), and determining the minimum MD for each. Because this increases the amount of computation, a single constant value may simply be used for all bands. In that case the minimum value of MD is set to about 0.5.

The multiplier 9 multiplies the input signal of each band by the attenuation coefficient determined for that band as described above. The band synthesis unit 40 synthesizes these signals to obtain the final output signal. Like the band division unit 20, the band synthesis unit 40 can use a band synthesis method based on SSB modulation.

The above is the best mode for carrying out the present invention. Although one configuration has been used as the model for the description, the invention is not limited to the parameters and other details described, and may be modified within a scope that preserves its gist.

FIG. 1: Block diagram showing the noise suppression device of the present invention.
FIG. 2: Distribution of the absolute amplitude of background noise in a room.
FIG. 3: Example of a noise/speech input model with the noise average value.
FIG. 4: Example of a noise/speech input model when the noise average value is made to appear larger.
FIG. 5: Amplitudes of the speech signal and the noise signal of a certain band.
FIG. 6: Attenuation coefficients calculated for the input of FIG. 5.
FIG. 7: Result of smoothing the attenuation coefficients by the technique of Patent Document 4.
FIG. 8: Amplitude of the output signal obtained with the attenuation coefficients of FIG. 7.
FIG. 9: Result of smoothing the attenuation coefficients by high-band comparison in the present invention.
FIG. 10: Result of smoothing the attenuation coefficients by low-band comparison in the present invention.
FIG. 11: Result of smoothing the attenuation coefficients by both high-band and low-band comparison in the present invention.
FIG. 12: Amplitude of the output signal obtained with the attenuation coefficients of FIG. 11.

Explanation of symbols

1 Input signal amplitude estimation unit
2 Noise signal amplitude estimation unit
3 Noise interval estimation unit
4 Noise amplitude comparison unit
5 Signal/noise ratio calculation unit
6 Attenuation coefficient calculation unit
7 Attenuation coefficient smoothing unit
8 Interband attenuation coefficient smoothing unit
9 Multiplier
10 Microphone
20 Band division unit
30 Processing unit
40 Band synthesis unit

Claims

A noise suppression device using the spectral subtraction method, comprising:
band dividing means for dividing a time-domain input signal, in which a speech signal and a stationary noise signal are mixed, into band-limited signals of respective frequency bands using SSB modulation;
processing units, one for each frequency band, that perform processing to suppress noise in the input signal of the divided frequency band; and
a band synthesis unit that synthesizes the signals processed by the respective processing units and outputs a single noise-suppressed signal,
wherein each of the processing units comprises:
amplitude estimation means for suppressing the difference in attenuation coefficient between frames of the input signal by smoothing on the time axis through leaky integration, and for estimating the amplitude value of the current signal and the noise amplitude value;
attenuation coefficient determination means for obtaining the SNR from the amplitude values from the amplitude estimation means and determining the attenuation coefficient;
attenuation coefficient correction means for reducing the distortion that the attenuation coefficient determined by the attenuation coefficient determination means causes in the output signal; and
multiplying means for multiplying the input signal by the attenuation coefficient obtained from the attenuation coefficient correction means,
and wherein the amplitude estimation means modifies the leaky integration equation by the reciprocal of the SNR one frame earlier, so that noise estimation does not stop even in speech intervals.

The noise suppression device according to claim 1, further comprising:
noise interval estimation means for comparing the noise amplitude estimate obtained by the amplitude estimation means with the input signal amplitude estimate to determine noise intervals, the noise amplitude estimate being multiplied by a coefficient of about 2 for this comparison, and for outputting a noise bias value of about 0.5 in noise intervals and about 1.0 in speech intervals; and
noise amplitude comparison means for comparing the noise amplitude estimate obtained by the amplitude estimation means with the input signal amplitude estimate after multiplication by the noise bias value and, if the input signal amplitude estimate is the smaller, outputting to the attenuation coefficient determination means a maximum attenuation coefficient flag that sets the attenuation coefficient of that band to its maximum.

The noise suppression device according to claim 1 or 2, wherein the attenuation coefficient determination means comprises:
signal/noise ratio calculation means for receiving the amplitude estimates from the amplitude estimation means and calculating the reciprocal of the SNR by dividing the noise amplitude estimate by the input signal amplitude estimate, the reciprocal being output as 1.0 if the input signal amplitude estimate is smaller than the noise amplitude estimate; and
attenuation coefficient calculation means for calculating the attenuation coefficient from the reciprocal of the SNR from the signal/noise ratio calculation means and from the maximum attenuation coefficient flag, and outputting it to the attenuation coefficient correction means.

The noise suppression device according to any one of claims 1 to 3, wherein the attenuation coefficient correction means comprises:
attenuation coefficient smoothing means for smoothing the attenuation coefficient from the attenuation coefficient determination means on the time axis by leaky integration; and
interband attenuation coefficient smoothing means for examining, for the attenuation coefficient of each frequency band, the attenuation coefficient of the adjacent band, and correcting the attenuation coefficient only in the direction of decreasing it so that its ratio to the attenuation coefficient of the adjacent band does not exceed a certain value.