JP4448464B2

JP4448464B2 - Noise reduction method, apparatus, program, and recording medium

Info

Publication number: JP4448464B2
Application number: JP2005062616A
Authority: JP
Inventors: 賢一野口; 澄宇阪内; 陽一羽田; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-03-07
Filing date: 2005-03-07
Publication date: 2010-04-07
Anticipated expiration: 2025-03-07
Also published as: JP2006243644A

Description

The present invention relates to a method for reducing noise in an input signal in which a target speech signal and an unwanted noise signal are mixed, and more particularly to a noise reduction method, apparatus, program, and recording medium capable of reducing non-stationary noise as well.

In a loudspeaker communication system such as a TV conference, the microphone on the transmitting side picks up not only the target speech but also unwanted noise generated in the surroundings, and this noise is reproduced on the receiving side. Such noise significantly degrades call quality. A method is therefore required for reducing noise in an input signal in which a target speech signal and an unwanted noise signal are mixed.
A first conventional technique for reducing ambient noise is the noise reduction method for stationary noise disclosed in Patent Document 1. This method reduces noise by multiplying the input signal, in which noise is mixed with the speech signal, by a loss value calculated from an estimated noise power spectrum.

A second conventional technique, which reduces ambient noise using multiple microphones (a microphone array), is AMNOR (Adaptive Microphone-array for NOise Reduction), described in Non-Patent Document 1. This method reduces noise by forming a directivity pattern with low sensitivity in the direction of the noise source.
A third conventional technique, which can also reduce non-stationary noise, is the sudden-noise discrimination and reduction method described in Non-Patent Document 2. This method obtains the frequency characteristics and the cepstrum of the input signal, identifies sudden-noise intervals, and, when sudden noise is judged to be superimposed on a speech interval, reduces the noise by inserting the immediately preceding periodic waveform.
Patent Document 1: JP-A-9-258792
Non-Patent Document 1: Oga, Yamazaki, and Kaneda, "Acoustic Systems and Digital Processing", The Institute of Electronics, Information and Communication Engineers, 1995, pp. 173-197
Non-Patent Document 2: Noguchi, Sakauchi, Haneda, and Kataoka, "Discrimination and Removal of Sudden Noise in a 1-Channel Input Signal", Proceedings of the Acoustical Society of Japan 2004 Spring Meeting, 3-P-30, pp. 655-656

A stationary-noise method such as the first conventional technique can reduce stationary noise superimposed on speech using a single input, but cannot reduce non-stationary noise with large temporal variation. A multi-microphone method such as the second conventional technique can reduce noise regardless of whether it is non-stationary, but requires several microphones to be installed. Typical communication and sound pickup systems have a single microphone, so a single-input noise reduction method is desirable, also to avoid increasing hardware scale and computational load. The third conventional technique can, with a single input, reduce non-stationary noise lasting a few milliseconds while keeping speech degradation low. When the noise lasts several tens of milliseconds or longer, however, the noise can still be reduced, but the reduction processing degrades the speech and therefore the call quality.

An object of the present invention is to realize a method that, when the input signal is a speech signal mixed with non-stationary noise, in particular non-stationary noise with rapid temporal variation such as the sound of paper documents being turned, reduces long-duration non-stationary noise while suppressing the speech degradation caused by the processing.

In the present invention, a speech-and-noise mixed signal, in which a target speech signal and an unwanted noise signal are mixed, is divided into a plurality of bands using a band division method with high time resolution; the time envelope of the low-band signal among the divided band signals is calculated; a high-band envelope is estimated by multiplying the calculated low-band time envelope by a weight for each band; each high-band signal among the divided band signals is compared with the estimated high-band envelope; noise reduction processing that uses the estimated high-band envelope as an upper limit is performed for each band; and the noise-reduced band signals are synthesized with the low-band signal.

The present invention performs noise reduction processing on a speech signal mixed with non-stationary noise. Because the processing uses a band division method with high time resolution, it can follow the temporal variation of non-stationary noise such as the sound of turning paper. In addition, because the invention operates on a single input, it can easily be combined with existing communication and sound pickup systems.

The noise reduction apparatus according to the present invention is best used incorporated in the transmitting device of a loudspeaker communication system. Incorporated there, it removes ambient noise so that a high-quality speech signal can be sent to the receiving device.
The noise reduction apparatus can be configured in hardware, but the simplest implementation is to install the noise reduction program proposed in the present invention on a computer and have the computer's central processing unit execute it, so that the computer functions as the noise reduction apparatus. In this case, a band-specific noise reduction device comprising at least a band division unit, a time envelope calculation unit, a high-band envelope estimation unit, signal reduction units, and a signal synthesis unit is constructed on the computer, and band-specific noise reduction processing is executed.

FIG. 1 shows a first embodiment of the present invention. Reference numeral 100 denotes the noise reduction apparatus according to the present invention. In this embodiment, a noise discrimination unit 11 and a switching means 14 are placed upstream of the band-specific noise reduction device 15, which is the core of the invention. The switching means 14 is controlled according to the result from the noise discrimination unit 11: if no noise is present, the input signal is passed through unchanged; if noise is present, the band-specific noise reduction device 15 performs noise reduction; and if the input signal is non-speech, no signal is output.
Embodiment 1 shown in FIG. 1 assumes voice communication, and the input signal divided into frames is denoted $X(n)$, where $n$ is an integer representing discrete time. As an example, the sampling frequency is 16 kHz and the frame length is 32 ms.
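As a quick check of the example figures, a 32 ms frame at 16 kHz contains 512 samples. The sketch below splits a sample sequence into frames of that length. The function name, the use of non-overlapping frames, and the dropping of a trailing partial frame are assumptions made for illustration, since the text does not specify how frames are formed.

```python
def split_into_frames(signal, sampling_rate_hz=16000, frame_ms=32):
    """Split a sequence of samples into consecutive non-overlapping frames."""
    frame_len = sampling_rate_hz * frame_ms // 1000  # 16000 * 32 / 1000 = 512
    n_frames = len(signal) // frame_len              # a trailing partial frame is dropped
    return [signal[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


samples = [0.0] * 1600  # 100 ms of silence at 16 kHz
frames = split_into_frames(samples)
print(len(frames), len(frames[0]))
```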

First, the input signal $X(n)$, in which the target signal is mixed with unwanted ambient noise and the like, is input to the noise discrimination unit 11. The noise discrimination unit 11 determines whether $X(n)$ is speech or non-speech and, if it is speech, whether non-stationary noise is present, and outputs a control signal $C$. The discrimination method described in Non-Patent Document 1, for example, is used for this purpose. If the result is speech without non-stationary noise, $C = 1$ is output; if it is speech with non-stationary noise, $C = 2$ is output; and if it is non-speech, $C = 3$ is output.
The control signal $C$ is input to the switching means 14, which switches according to its value. When $C = 1$, the switching means 14 selects contact Fd1 and outputs the input signal $X(n)$ unchanged to the adder 16 as the processed signal $Y_1(n)$. When $C = 2$, it selects contact Fd2 and feeds $X(n)$ to the band-specific noise reduction device 15. When $C = 3$, it selects contact Fd3, so that $X(n)$ is blocked and output nowhere; the adder 16 then outputs no signal.
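The three-way switching described above can be sketched as a small dispatch function. This is only an illustration: `route_frame` and the `reduce_noise` callable (a stand-in for the band-specific noise reduction device 15) are hypothetical names, and representing "no signal" as a frame of zeros is an assumption.

```python
def route_frame(frame, control, reduce_noise):
    """Return the output frame Y(n) for one input frame X(n).

    control: 1 = speech without non-stationary noise (pass through),
             2 = speech with non-stationary noise (band-wise reduction),
             3 = non-speech (no signal, modeled here as zeros).
    reduce_noise: callable standing in for the noise reduction device.
    """
    if control == 1:
        return list(frame)
    if control == 2:
        return reduce_noise(frame)
    if control == 3:
        return [0.0] * len(frame)
    raise ValueError(f"unknown control value: {control}")


def halve(frame):
    """Placeholder reduction used only to make the example runnable."""
    return [0.5 * v for v in frame]


x = [1.0, -2.0, 4.0]
for c in (1, 2, 3):
    print(c, route_frame(x, c, halve))
```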

The band-specific noise reduction device 15 takes the input signal $X(n)$ and outputs the band-specific processed signal $Y_2(n)$. FIG. 2 shows its configuration, and its operation is as follows.
The input signal $X(n)$ is passed to the band division unit 21, which divides it into frequency bands. Here, a uniform filter bank is used to divide the signal into 16 band signals $x_i(m)$ ($i = 1$ to $16$), where $m$ is an integer representing discrete time. Band division by Fourier transform, a non-uniform filter bank, or a wavelet transform may be used instead. Using a band division method with high time resolution makes it possible to follow the temporal variation of non-stationary noise such as the sound of turning paper.
Of the band signals, the low-band signal is passed to the time envelope calculation unit 22 and the signal synthesis unit 25; here this is the lowest of the 16 bands, $x_1(m)$. The remaining band signals $x_2(m)$ to $x_{16}(m)$ are passed to the signal reduction units 24-2 to 24-16, respectively.
The time envelope calculation unit 22 takes $x_1(m)$, calculates its time envelope (for example, a moving average over several samples), and outputs the band time envelope signal $\bar{x}_1(m)$. Here it is calculated as

$$\bar{x}_1(m) = \frac{1}{3}\sum_{k=m-1}^{m+1} x_1(k),$$

that is, the three-sample moving average of $x_1(m)$. The band time envelope signal $\bar{x}_1(m)$ is passed to the high-band envelope estimation unit 23.
The high-band envelope estimation unit 23 multiplies $\bar{x}_1(m)$ by a weight $w_i$ ($i = 2$ to $16$) that differs for each band, and outputs $w_i \bar{x}_1(m)$ as the estimated time envelope of each high band. The weights $w_i$ are taken from the ratio of the per-band average values of a long-term average speech spectrum obtained in advance. The estimated time envelopes $w_i \bar{x}_1(m)$ are passed to the signal reduction units 24-2 to 24-16, respectively.
Each of the signal reduction units 24-2 to 24-16 takes the band signal $x_i(m)$ and the estimated time envelope $w_i \bar{x}_1(m)$, performs noise reduction, and outputs the processed signal $y_i(m)$. For each $m$, the band signal is compared with the estimated time envelope, which serves as the upper limit:

$$y_i(m) = \begin{cases} w_i \bar{x}_1(m) & \text{if } x_i(m) > w_i \bar{x}_1(m) \\ x_i(m) & \text{if } x_i(m) \le w_i \bar{x}_1(m) \end{cases}$$

The processed signals $y_i(m)$ ($i = 2$ to $16$) are passed to the signal synthesis unit 25. The signal synthesis unit 25 takes the low-band signal $x_1(m)$ and the processed signals $y_i(m)$ ($i = 2$ to $16$), combines them using uniform filter bank synthesis, and outputs the band-specific processed signal $Y_2(n)$. In the configuration of FIG. 1, $Y_2(n)$ is passed to the adder 16.
The adder 16 takes the signal $Y_1(n)$ and the band-specific processed signal $Y_2(n)$ and outputs the output signal $Y(n) = Y_1(n) + Y_2(n)$.
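The envelope calculation and the upper-limit rule can be sketched in a few lines. The sketch below assumes the band signals are already available as lists of non-negative level values (the filter bank analysis and synthesis stages are not shown), and it treats samples outside the frame as zero when averaging at the edges; both points are assumptions, as are the function names.

```python
def moving_average_envelope(x1):
    """Three-sample moving average of the lowest-band signal x_1(m)."""
    n = len(x1)
    env = []
    for m in range(n):
        # Samples outside the frame are treated as zero (edge-handling assumption).
        prev = x1[m - 1] if m > 0 else 0.0
        nxt = x1[m + 1] if m < n - 1 else 0.0
        env.append((prev + x1[m] + nxt) / 3.0)
    return env


def limit_high_bands(bands, weights):
    """Apply the upper-limit rule to every band above the lowest one.

    bands[0] is x_1(m) and bands[i - 1] is x_i(m); weights maps the band
    index i (2, 3, ...) to w_i. Returns [x_1, y_2, ..., y_N], where
    y_i(m) = min(x_i(m), w_i * envelope(m)).
    """
    env = moving_average_envelope(bands[0])
    out = [list(bands[0])]  # the lowest band is passed through unchanged
    for i in range(2, len(bands) + 1):
        w = weights[i]
        out.append([min(x, w * e) for x, e in zip(bands[i - 1], env)])
    return out


bands = [[3.0, 3.0, 3.0, 3.0],   # x_1(m): speech-dominated low band
         [0.5, 4.0, 0.2, 0.1]]   # x_2(m): a burst at m = 1
envelope = moving_average_envelope(bands[0])
limited = limit_high_bands(bands, {2: 0.5})
print(envelope)
print(limited[1])
```

In this toy input only the burst at m = 1 exceeds the estimated envelope, so it is the only sample that is clipped; the other samples of the second band pass through unchanged.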

In the band-specific noise reduction device 15 described above, the speech component is dominant in the lowest-band signal $x_1(m)$, so its time envelope is little affected by noise. The time envelopes $w_i \bar{x}_1(m)$ of the band signals $x_2(m)$ to $x_{16}(m)$ are estimated from this envelope, each estimate is compared with the level of the actual band signal, and the smaller of the two (the one less affected by noise) is selected as the processed value $y_i(m)$; the noise component is removed at this point. Although the time envelope of the lowest band is used above as the low-noise reference signal, the time envelopes of several of the lowest bands may instead be calculated and combined to form the reference signal. In that case, the weights $w_i$ used to estimate the time envelopes of the band signals $x_2(m)$ to $x_{16}(m)$ should be set accordingly.

The noise reduction apparatus 100 of Embodiment 2 has the same configuration as that of Embodiment 1; only the operation of the high-band envelope estimation unit 23 differs. The high-band envelope estimation unit 23 takes the band time envelope signal $\bar{x}_1(m)$, multiplies it by a weight $w_i$ ($i = 2$ to $16$) that differs for each band, and outputs $w_i \bar{x}_1(m)$ as the estimated high-band time envelope. In this embodiment the weights $w_i$ are made small to obtain stronger reduction, for example $w_i = 0.1$ ($i = 2$ to $3$), $w_i = 0.05$ ($i = 4$ to $6$), $w_i = 0.03$ ($i = 7$ to $8$), $w_i = 0.02$ ($i = 9$ to $13$), and $w_i = 0.01$ ($i = 14$ to $16$). Reducing the high-frequency range especially strongly in this way improves the perceived noise reduction.
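The example weights of Embodiment 2 can be written out as a lookup table keyed by band index. This is a minimal sketch; the function name and the dictionary representation are illustrative choices, while the numeric values are the ones given in the text.

```python
def embodiment2_weights():
    """Return {band index i: w_i} for i = 2..16, using the example values."""
    ranges = [
        ((2, 3), 0.1),
        ((4, 6), 0.05),
        ((7, 8), 0.03),
        ((9, 13), 0.02),
        ((14, 16), 0.01),
    ]
    weights = {}
    for (lo, hi), w in ranges:
        for i in range(lo, hi + 1):  # both ends of each range are inclusive
            weights[i] = w
    return weights


weights = embodiment2_weights()
print(weights)
```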

FIG. 3 shows the configuration of the noise reduction apparatus 100 of Embodiment 3. It is almost the same as that of Embodiment 1, except that an original sound adjustment unit 13 and a processed sound adjustment unit 17 are added and the noise discrimination unit 11 operates differently. The original sound adjustment unit 13 and the processed sound adjustment unit 17 can each be implemented as a variable gain amplifier whose gain can be set to any value.
According to its discrimination result, the noise discrimination unit 11 outputs a gain control signal $C_1$ to the original sound adjustment unit 13, a control signal $C_2$ to the switching means 14, and a gain control signal $C_3$ to the processed sound adjustment unit 17. When the frame being processed is speech without non-stationary noise, it outputs $C_1 = 1$, $C_2 = 0$ (switch off), and $C_3 = 0$. When the frame is speech with non-stationary noise, it outputs $C_1 = \alpha$, $C_2 = 1$ (switch on), and $C_3 = 1 - \alpha$, where $\alpha$ is the addition rate; here $\alpha = 0.1$. When the frame is non-speech, it sends $C_1 = 0$ to the original sound adjustment unit 13, $C_2 = 0$ (switch off) to the switching means 14, and $C_3 = 0$ to the processed sound adjustment unit 17.

The original sound adjustment unit 13 takes the gain control signal $C_1$ and the input signal $X(n)$, and outputs the signal $C_1 Y_1(n)$ (the original sound scaled by $C_1$) to the adder 16.
The processed sound adjustment unit 17 takes the gain control signal $C_3$ and the signal $Y_2(n)$, and outputs the signal $C_3 Y_2(n)$ to the adder 16.
When noise reduction is performed, adding the input signal (original sound) multiplied by the addition rate $\alpha$ to the output signal suppresses the speech degradation caused by the processing.
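The gain settings of Embodiment 3 and the resulting mix at the adder 16 can be sketched as follows. The frame class labels (`"speech"`, `"speech_with_noise"`, `"non_speech"`) and the function names are hypothetical; the gain values and the default addition rate of 0.1 follow the text.

```python
def control_signals(frame_class, alpha=0.1):
    """Return (C1, C2, C3) for one frame.

    C1: gain for the original sound, C2: switch state (1 = on),
    C3: gain for the band-wise processed sound.
    """
    if frame_class == "speech":
        return 1.0, 0, 0.0
    if frame_class == "speech_with_noise":
        return alpha, 1, 1.0 - alpha
    if frame_class == "non_speech":
        return 0.0, 0, 0.0
    raise ValueError(f"unknown frame class: {frame_class}")


def mix_output(x, y2, c1, c3):
    """Adder output: Y(n) = C1 * X(n) + C3 * Y2(n)."""
    return [c1 * xo + c3 * yp for xo, yp in zip(x, y2)]


c1, c2, c3 = control_signals("speech_with_noise")
mixed = mix_output([1.0, 2.0], [0.0, 1.0], c1, c3)
print(c1, c2, c3, mixed)
```

With the addition rate set to 0.1, a noisy speech frame is output as 10 percent original sound plus 90 percent processed sound.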

FIG. 4 shows the configuration of the noise reduction apparatus 100 of Embodiment 4. It is almost the same as that of Embodiment 1, except that a low-band speech synthesizer 18 and an adder 19 are added and the band-specific noise reduction device 15 operates differently.
In the band-specific noise reduction device 15 shown in FIG. 2, the lowest-band signal $x_1(m)$ is passed directly to the signal synthesis unit 25 without any noise removal, so any noise superimposed on it is output as is. Embodiment 4 is therefore configured to remove the noise superimposed on the low band and send the result to the adder 19.

The band-specific noise reduction device 15 used in this embodiment has the same configuration as in Embodiment 1; only the operation of the signal synthesis unit 25 differs. The signal synthesis unit 25 takes the lowest-band signal $x_1(m)$ and the processed signals $y_i(m)$ ($i = 2$ to $16$) and outputs the band-specific processed signal $Y_2(n)$, but, because it is combined with the low-band speech synthesizer 18, it synthesizes only the high-band signals $y_2(m)$ to $y_{16}(m)$. Here, uniform filter bank synthesis is used to synthesize the processed signals $y_i(m)$ ($i = 4$ to $16$). The band-specific processed signal $Y_2(n)$ is passed to the adder 19.
The low-band speech synthesizer 18 takes the input signal $X(n)$ and outputs the low-band speech synthesis signal $Y_3(n)$, from which noise has been removed. FIG. 5 shows its configuration. In brief, it performs linear prediction analysis of the input signal, obtains the fundamental period and the residual spectrum from the residual signal, estimates on the residual spectrum the amplitudes of the harmonic structure at $K$ times the fundamental frequency ($K = 1, 2, 3, \ldots, n$), and synthesizes speech from the estimated fundamental period and harmonic amplitudes. This kind of speech synthesis is known as vocoder technology. Because only the speech feature parameters, namely the fundamental period and the harmonic amplitudes, are analyzed and synthesized, the valleys between spectral peaks are deepened and noise is reduced.

Inside the low-band speech synthesizer 18, the input signal $X(n)$ is passed to the LPC analysis unit 51. The LPC analysis unit 51 performs LPC analysis on $X(n)$ and outputs the residual signal $e(n)$ and the linear prediction coefficients $\alpha_j$. Here the linear prediction order is 14, so $j = 1$ to $14$. The residual signal $e(n)$ is passed to the fundamental period estimation unit 52 and the frequency analysis unit 53, and the linear prediction coefficients $\alpha_j$ are passed to the LPC synthesis unit 57. The fundamental period estimation unit 52 estimates the fundamental period of the signal from $e(n)$ and outputs the estimated fundamental period $T$. Here, the autocorrelation of $e(n)$ is calculated and its peak is extracted to give $T$. The estimated fundamental period $T$ is passed to the harmonic structure analysis unit 54. The frequency analysis unit 53 takes the residual signal $e(n)$ and outputs the residual spectrum $e(\omega)$; here an FFT (fast Fourier transform) is used for the frequency analysis.
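The autocorrelation-based period estimate can be illustrated with a short sketch. It assumes the residual signal is already available (the LPC analysis itself is not shown), and the lag search range is a parameter introduced here for illustration; the text does not state one. The example uses 32 to 320 samples, which at 16 kHz corresponds to 500 Hz down to 50 Hz.

```python
def estimate_fundamental_period(e, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that maximizes the autocorrelation of e(n)."""
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag over the samples where both terms exist.
        r = sum(e[n] * e[n - lag] for n in range(lag, len(e)))
        if r > best_val:  # strict comparison keeps the smallest lag on ties
            best_lag, best_val = lag, r
    return best_lag


# Idealized residual: one pulse every 80 samples in a 512-sample frame.
residual = [1.0 if n % 80 == 0 else 0.0 for n in range(512)]
period = estimate_fundamental_period(residual, 32, 320)
print(period)  # 80 samples, i.e. 200 Hz at 16 kHz
```

A practical estimator would usually normalize the autocorrelation and guard against picking multiples of the true period; those refinements are left out of this sketch.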

The residual spectrum $e(\omega)$ is passed to the harmonic structure analysis unit 54. The harmonic structure analysis unit 54 takes the estimated fundamental period $T$ and the residual spectrum $e(\omega)$, and outputs $T$ and the peak amplitudes $a_k$. The amplitude at $k$ times the estimated fundamental frequency $1/T$ is read from the residual spectrum $e(\omega)$ and taken as $a_k$. Here the calculation is done for $k = 1$ to $11$.
The estimated fundamental period $T$ and the peak amplitudes $a_k$ are passed to the harmonic structure synthesis unit 55, which synthesizes the estimated residual spectrum $e_e(\omega)$ by placing, at each frequency $k$ times the estimated fundamental frequency $1/T$, the FFT of a Hamming window scaled so that its amplitude equals the peak amplitude $a_k$. The estimated residual spectrum $e_e(\omega)$ is passed to the inverse frequency analysis unit 56, which outputs the estimated residual signal $e_e(n)$; here an IFFT (inverse fast Fourier transform) is used. The estimated residual signal $e_e(n)$ is passed to the LPC synthesis unit 57, which takes the linear prediction coefficients $\alpha_j$ and $e_e(n)$, performs LPC synthesis, and outputs the processed signal $Y_e(n)$. The processed signal $Y_e(n)$ is passed to the filter processing unit 58, which filters it and outputs the low-band speech synthesis signal $Y_3(n)$; here a low-pass filter with a cutoff frequency of 2 kHz is applied. The low-band speech synthesis signal $Y_3(n)$ is passed to the adder 19 (FIG. 4). The adder 19 takes the band-specific noise reduction signal $Y_2(n)$ and the low-band speech synthesis signal $Y_3(n)$ and outputs the output signal $Y_4(n) = Y_2(n) + Y_3(n)$.
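Reading the harmonic amplitudes off the residual spectrum can be sketched as below. To stay self-contained the sketch uses a direct DFT from the standard library instead of an FFT, picks the spectrum bin nearest to each multiple of the fundamental frequency, and scales magnitudes by 2/N so that a unit-amplitude sinusoid reads as 1.0; the nearest-bin rule, the scaling, and the function names are all assumptions rather than details given in the text.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the DFT of x for bins 0..N/2 (direct O(N^2) evaluation)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n)))
        for b in range(n // 2 + 1)
    ]


def harmonic_amplitudes(e, period, num_harmonics):
    """Return [a_1, ..., a_K]: spectrum amplitude at k times the fundamental frequency 1/T."""
    n = len(e)
    mags = dft_magnitudes(e)
    amps = []
    for k in range(1, num_harmonics + 1):
        b = round(k * n / period)  # bin nearest to k / T (T in samples)
        # Harmonics above the Nyquist bin are reported as zero.
        amps.append(mags[b] * 2.0 / n if b < len(mags) else 0.0)
    return amps


# Test signal with period 16 samples: fundamental of amplitude 1.0, second harmonic 0.5.
signal = [math.cos(2 * math.pi * 4 * t / 64) + 0.5 * math.cos(2 * math.pi * 8 * t / 64)
          for t in range(64)]
amps = harmonic_amplitudes(signal, 16, 3)
print([round(a, 6) for a in amps])
```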

FIG. 6 shows the configuration of the noise reduction apparatus 100 of Embodiment 5. It is almost the same as that of Embodiment 4, except that a processed sound adjustment unit 17 is added and the noise discrimination unit 11 operates differently. The processed sound adjustment unit 17 and the noise discrimination unit 11 operate in the same way as in Embodiment 3 shown in FIG. 3.
When noise reduction is performed, adding the input signal multiplied by the addition rate $\alpha$ to the output signal suppresses the speech degradation caused by the processing, as in the case of FIG. 3.

The noise reduction apparatus described above can be configured in hardware, but it is realized more simply by installing the noise reduction program according to the present invention on a computer and having the computer's central processing unit (CPU) interpret and execute the program, so that the computer functions as the noise reduction apparatus.
The noise reduction program is written in a programming language that a computer can interpret and is recorded on a computer-readable recording medium such as a magnetic disk or CD-ROM. It can be installed on a computer from such a recording medium or through a communication line.

The noise reduction apparatus according to the present invention is used in fields such as loudspeaker communication, for example in video conference systems.

FIG. 1 is a block diagram for explaining the first and second embodiments of the present invention.
FIG. 2 is a block diagram for explaining the internal configuration of the band-specific noise reduction device shown in FIG. 1.
FIG. 3 is a block diagram for explaining the third embodiment of the present invention.
FIG. 4 is a block diagram for explaining the fourth embodiment of the present invention.
FIG. 5 is a block diagram for explaining the internal configuration of the low-frequency speech synthesizer shown in FIG. 4.
FIG. 6 is a block diagram for explaining the fifth embodiment of the present invention.

Explanation of symbols

11 Noise determination unit
13 Original sound adjustment unit
14 Switching means
15 Band-specific noise reduction device
16 Adder
17 Processed sound adjustment unit
18 Low-frequency speech synthesizer
21 Band division unit
22 Time envelope calculation unit
23 High-frequency envelope estimation unit
24 Signal reduction unit
25 Signal synthesis unit
51 LPC analysis unit
52 Fundamental period estimation unit
53 Frequency analysis unit
54 Harmonic structure analysis unit
55 Harmonic structure synthesis unit
56 Inverse frequency analysis unit
57 LPC synthesis unit
58 Filter processing unit

Claims

A noise reduction method comprising a band-specific noise reduction method that: divides a speech-noise mixed signal, in which a target speech signal and an unnecessary noise signal are mixed, into a plurality of bands using a band division method with high time resolution; calculates the time envelope of the low-frequency signal among the divided band signals; estimates a high-frequency envelope by multiplying the calculated low-frequency time envelope by a weight for each band; compares the high-frequency signals among the divided band signals with the estimated high-frequency envelope; performs, for each band, noise reduction processing with the estimated high-frequency envelope as an upper limit; and synthesizes the noise-reduced band signals with the low-frequency signal.
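The upper-limit step of this claim can be illustrated with the Python sketch below. It is only one possible reading: it assumes the low-frequency time envelope is non-negative and available at the same sample positions as the high-band signal, and it enforces the limit by clipping the sample magnitude, which the claim does not specify. The name `limit_to_envelope` and the example values are hypothetical.

```python
import math


def limit_to_envelope(high_band, low_envelope, weight):
    """Cap each high-band sample at weight * low-band envelope."""
    limited = []
    for x, env in zip(high_band, low_envelope):
        ceiling = weight * env  # estimated high-frequency envelope
        if abs(x) > ceiling:
            x = math.copysign(ceiling, x)  # keep the sign, reduce magnitude
        limited.append(x)
    return limited


# Hypothetical three-sample band signal with a per-band weight of 0.5.
out = limit_to_envelope([0.5, -2.0, 1.0], [1.0, 1.0, 4.0], weight=0.5)
```

Samples that already lie under the estimated envelope pass through unchanged, so only components exceeding the envelope are reduced.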

A noise reduction method comprising: a band-specific noise reduction method that divides a speech-noise mixed signal, in which a target speech signal and an unnecessary noise signal are mixed, into a plurality of bands using a band division method with high time resolution, calculates the time envelope of the low-frequency signal among the divided band signals, estimates a high-frequency envelope by multiplying the calculated low-frequency time envelope by a weight for each band, compares the high-frequency signals among the divided band signals with the estimated high-frequency envelope, performs, for each band, noise reduction processing with the estimated high-frequency envelope as an upper limit, and synthesizes the noise-reduced band signals with the low-frequency signal; and
a low-frequency speech synthesis method that performs LPC analysis on the speech-noise mixed signal in which the target speech signal and the unnecessary noise signal are mixed, estimates the fundamental period using the residual signal of the LPC analysis, performs frequency analysis of the residual signal, calculates the harmonic structure amplitudes using the estimated fundamental period and the calculated residual spectrum, synthesizes a residual spectrum using the estimated fundamental period and the estimated harmonic structure amplitudes, performs inverse frequency analysis on the synthesized residual spectrum, performs LPC synthesis using the synthesized residual signal and the linear prediction coefficients, and applies low-pass filtering to the resulting signal.
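The LPC synthesis step named in this claim can be sketched in Python as an all-pole recursion. The sign convention is an assumption: the predictor is taken as the sum of α_j y(n-j), so that y(n) = e(n) + Σ_j α_j y(n-j). The function name `lpc_synthesize` and the example coefficients are hypothetical.

```python
def lpc_synthesize(residual, alpha):
    """All-pole synthesis: y[n] = e[n] + sum_j alpha[j-1] * y[n-j]."""
    out = []
    for n, e in enumerate(residual):
        acc = e
        for j, a in enumerate(alpha, start=1):
            if n - j >= 0:  # samples before the frame start are taken as 0
                acc += a * out[n - j]
        out.append(acc)
    return out


# Hypothetical first-order example: impulse residual, alpha_1 = 0.5.
y = lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
```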

A noise reduction method that, for a speech-noise mixed signal in which a target speech signal and an unnecessary noise signal are mixed: determines for each processing frame whether the frame is speech or non-speech and, if it is speech, whether non-stationary noise is present; outputs the input signal as is when the processing frame is a speech signal that does not contain non-stationary noise; applies noise reduction processing to the input signal using the noise reduction method according to claim 1 or 2 and outputs the result when the processing frame is a speech signal that contains non-stationary noise; and outputs nothing when the processing frame is a non-speech signal.
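The per-frame switching in this claim can be sketched in Python as follows. The frame classification itself is outside the sketch and is passed in as two flags. Representing "outputs nothing" as an empty list, and the names `process_frame` and `halve`, are assumptions for illustration; `halve` is only a stand-in for the actual noise reduction method.

```python
def process_frame(frame, is_speech, has_nonstationary_noise, reduce_noise):
    """Route one frame according to the three cases of the claim."""
    if not is_speech:
        return []  # non-speech frame: nothing is output
    if has_nonstationary_noise:
        return reduce_noise(frame)  # speech with non-stationary noise
    return list(frame)  # clean speech: pass the input through as is


def halve(frame):
    """Placeholder noise reducer used only for this example."""
    return [0.5 * v for v in frame]


clean = process_frame([1.0, 2.0], True, False, halve)
noisy = process_frame([1.0, 2.0], True, True, halve)
silent = process_frame([1.0, 2.0], False, False, halve)
```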

A noise reduction method that, for a speech-noise mixed signal in which a target speech signal and an unnecessary noise signal are mixed: determines for each processing frame whether the frame is speech or non-speech and, if it is speech, whether non-stationary noise is present; outputs the input signal as is when the processing frame is a speech signal that does not contain non-stationary noise; outputs, when the processing frame is a speech signal that contains non-stationary noise, a weighted sum of the input signal and the signal obtained by applying noise reduction processing to the input signal using the noise reduction method according to claim 1 or 2; and outputs nothing when the processing frame is a non-speech signal.

A noise reduction device comprising a band-specific noise reduction device that includes: a band division unit that divides a speech-noise mixed signal, in which a target speech signal and an unnecessary noise signal are mixed, into a plurality of bands using a band division method with high time resolution; a time envelope calculation unit that calculates the time envelope of the low-frequency signal among the divided band signals; a high-frequency envelope estimation unit that estimates a high-frequency envelope by multiplying the calculated low-frequency time envelope by a weight for each band; a noise reduction unit that compares the high-frequency signals among the divided band signals with the estimated high-frequency envelope and performs, for each band, noise reduction processing with the estimated envelope as an upper limit; and a signal synthesis unit that synthesizes the noise-reduced band signals with the low-frequency signal.

A noise reduction device having a configuration in which the following two devices are arranged in parallel: a band-specific noise reduction device that includes a band division unit that divides a speech-noise mixed signal, in which a target speech signal and an unnecessary noise signal are mixed, into a plurality of bands using a band division method with high time resolution, a time envelope calculation unit that calculates the time envelope of the low-frequency signal among the divided band signals, a high-frequency envelope estimation unit that estimates a high-frequency envelope by multiplying the calculated time envelope by a weight for each band, a noise reduction unit that compares the high-frequency signals among the divided band signals with the estimated high-frequency envelope and performs, for each band, noise reduction processing with the estimated envelope as an upper limit, and a signal synthesis unit that synthesizes the noise-reduced band signals with the low-frequency signal; and
a low-frequency speech synthesizer that includes a signal analysis unit that performs LPC analysis on the speech-noise mixed signal in which the target speech signal and the unnecessary noise signal are mixed, a fundamental period estimation unit that estimates the fundamental period using the residual signal of the LPC analysis, a frequency analysis unit that performs frequency analysis of the residual signal, a harmonic structure synthesis unit that calculates the amplitude of the harmonic structure using the estimated fundamental period and the calculated residual spectrum, an inverse frequency analysis unit that performs inverse frequency analysis of the synthesized residual spectrum, an LPC synthesis unit that performs LPC synthesis using the synthesized residual signal and the linear prediction coefficients, and a filter processing unit that applies low-pass filtering to the resulting signal.

A noise reduction device comprising, in a stage preceding the noise reduction device according to claim 5:
a noise determination unit that determines, for a speech-noise mixed signal in which a target speech signal and an unnecessary noise signal are mixed, whether each processing frame is speech or non-speech and, if it is speech, whether non-stationary noise is present, and that generates a first control signal when the processing frame is a speech signal that does not contain non-stationary noise, a second control signal when the processing frame is a speech signal that contains non-stationary noise, and a third control signal when the processing frame is a non-speech signal; and
switching means that switches to a state in which the input signal is output as is when the noise determination unit generates the first control signal, to a state in which the input signal is input to the noise reduction device when the noise determination unit generates the second control signal, and to a state in which nothing is output when the noise determination unit generates the third control signal.

A noise reduction device comprising, in a stage preceding the noise reduction device according to claim 5 or 6:
a noise determination unit that determines, for a speech-noise mixed signal in which a target speech signal and an unnecessary noise signal are mixed, whether each processing frame is speech or non-speech and, if it is speech, whether non-stationary noise is present, and that generates a first control signal when the processing frame is a speech signal that does not contain non-stationary noise, a second control signal when the processing frame is a speech signal that contains non-stationary noise, and a third control signal when the processing frame is a non-speech signal; and
switching means that switches to a state in which the input signal is not input to the noise reduction device when the noise determination unit generates the first control signal or the third control signal, and to a state in which the input signal is input to the noise reduction device when the noise determination unit generates the second control signal;
and further comprising:
original sound adjusting means that multiplies the input signal by a gain of a predetermined value; and
an adder that, when the noise determination unit generates the second control signal, adds the input signal multiplied by the gain of the predetermined value to the output signal of the noise reduction device.

A noise reduction device comprising, in a stage preceding the noise reduction device according to claim 5 or 6:
a noise determination unit that determines, for a speech-noise mixed signal in which a target speech signal and an unnecessary noise signal are mixed, whether each processing frame is speech or non-speech and, if it is speech, whether non-stationary noise is present, and that generates a first control signal when the processing frame is a speech signal that does not contain non-stationary noise, a second control signal when the processing frame is a speech signal that contains non-stationary noise, and a third control signal when the processing frame is a non-speech signal; and
switching means that switches to a state in which the input signal is not input to the noise reduction device when the noise determination unit generates the first control signal or the third control signal, and to a state in which the input signal is input to the noise reduction device when the noise determination unit generates the second control signal;
and further comprising:
original sound adjusting means that multiplies the input signal by a gain of a first predetermined value;
processed sound adjusting means that multiplies the signal output from the noise reduction device by a gain of a second predetermined value; and
an adder that, when the noise determination unit generates the second control signal, adds the input signal multiplied by the gain of the first predetermined value to the output signal of the processed sound adjusting means.

A noise reduction program written in a computer-readable programming language that causes a computer to function as the noise reduction device according to any one of claims 5 to 9.

A computer-readable recording medium on which the noise reduction program according to claim 10 is recorded.