JP2005107448A - Noise reduction processing method, and device, program, and recording medium for implementing same method

Info

Publication number: JP2005107448A
Application number: JP2003344406A
Authority: JP
Inventors: Kenichi Noguchi; 賢一野口; Kiyotaka Sakauchi; 澄宇阪内; Yoichi Haneda; 陽一羽田; Akitoshi Kataoka; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-10-02
Filing date: 2003-10-02
Publication date: 2005-04-21
Anticipated expiration: 2023-10-02
Also published as: JP4460256B2

Abstract

PROBLEM TO BE SOLVED: To provide a noise reduction processing method that suppresses distortion of the voice signal and limits perceived degradation of voice quality while preserving the audible noise reduction. The method relies on the observation that, in a loudspeaker telephone system, ambient noise on the far-end side acts as a masker for the near-end noise reproduced by the far-end loudspeaker. A corrected noise power spectrum calculation part takes the difference between the near-end noise quantity and the masking threshold as the corrected noise quantity when the noise quantity is larger than the threshold, and sets the corrected noise quantity to zero when the noise quantity is smaller than the threshold.

SOLUTION: A noise reduction processing method receives a voice-noise mixed signal containing a target voice signal and an unwanted noise signal and outputs a noise-reduced signal. The method estimates the masking threshold produced by ambient noise in the environment where the noise-reduced signal is reproduced, and reduces only the mixed noise quantity that exceeds this threshold.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a noise reduction processing method and to an apparatus, a program, and a recording medium for implementing the method. In particular, it relates to a noise reduction processing method that, in a loudspeaker telephone apparatus such as a voice communication apparatus, reduces the noise signal in a voice-noise mixed signal in which a target voice signal and an unwanted noise signal are mixed, and to an apparatus, a program, and a recording medium for implementing that method.

FIG. 3 is a schematic diagram of a loudspeaker telephone system in which the talker side is the near end and the listener side is the far end. In FIG. 3, 31 is the near-end microphone, 32 the near-end loudspeaker, 33 the far-end microphone, 34 the far-end loudspeaker, 35 the transmission path, 36 the talker, and 37 the listener. Speech uttered by the talker 36 reaches the listener 37 via the near-end microphone 31, the transmission path 35, and the far-end loudspeaker 34. Such loudspeaker telephone systems are increasingly used for teleconferencing, videophones, speakerphones, and the like.
When ambient noise other than the target voice signal is picked up by the near-end microphone 31, however, the clarity of the talker's voice emitted from the far-end loudspeaker 34 is impaired and the voice quality deteriorates markedly. A noise reduction device is therefore needed between the near-end microphone 31 and the far-end loudspeaker 34 to reduce the ambient noise, other than the target voice, contained in the transmitted signal.

One conventional noise reduction processing method is disclosed in Patent Document 1. Because the publication describes the technique in detail, it is only outlined here with reference to FIG. 5. A microphone (not shown) picks up the target speech signal S(k), produced by a talker located away from the microphone, together with an unwanted noise signal N(k) caused by air-conditioning noise and other extraneous sounds, as the input signal X(k) = S(k) + N(k). Here k is an integer that indexes discrete time. The input signal X(k) is fed to the first frequency domain conversion unit 51 and converted into the frequency domain signal X(ω), for example by a short-time discrete Fourier transform, where ω denotes frequency. The input signal power spectrum calculation unit 52 receives X(ω) and calculates its power spectrum Pav_x(ω). The noise power spectrum estimation unit 53 receives X(ω) and estimates the noise power spectrum Pav_N(ω) contained in it. The loss calculation unit 54 receives Pav_x(ω) and Pav_N(ω), calculates the loss value L(ω), and passes it to the loss insertion unit 55. The loss insertion unit 55 computes Y(ω) = L(ω) × X(ω) and outputs the noise-reduced signal Y(ω). The time domain conversion unit 56 converts Y(ω) into the noise-reduced time domain signal Y(k) and outputs it.
Patent Document 1: JP-A-9-258792

One conventional technique for reducing the noise in an input signal contaminated by ambient noise estimates the noise power spectrum and inserts into the input signal a loss value matched to that spectrum. With this approach, estimating the noise power spectrum accurately and applying a large amount of noise reduction raises the SNR improvement. Because some noise estimation error always remains, however, noise is left behind in some places and the voice signal is over-subtracted in others, so the voice signal is distorted. These artifacts vary over time and are perceived as unpleasant sounds. Conversely, if the noise power spectrum is deliberately underestimated and the amount of noise reduction is kept small in order to limit this distortion, the SNR improvement falls and the mixed noise cannot be reduced sufficiently.

The present invention is based on the observation that, in a loudspeaker telephone apparatus, the near-end noise emitted from the far-end loudspeaker is masked by the far-end ambient noise, which acts as a masker. The invention provides a corrected noise power spectrum calculation unit that takes the difference between the near-end noise quantity and the estimated masking threshold as the corrected noise quantity when the noise quantity is at or above the threshold, and sets the corrected noise quantity to zero when the noise quantity is at or below the threshold. The invention thereby provides a noise reduction processing method that suppresses distortion of the voice signal while preserving the perceived noise reduction, and so limits perceived degradation of voice quality, together with an apparatus, a program, and a recording medium for implementing the method.

Claim 1: A noise reduction processing method that receives a voice-noise mixed signal, in which a target voice signal and an unwanted noise signal are mixed, and outputs a noise-reduced signal in which the noise signal has been reduced, wherein the masking threshold formed by ambient noise in the environment where the noise-reduced signal is reproduced is estimated, and only the mixed noise quantity exceeding the estimated masking threshold is reduced.

Claim 2: The noise reduction processing method of claim 1, in which the input voice-noise mixed signal is converted into a frequency domain signal; the signal power spectrum of the voice-noise mixed signal is calculated from the converted signal; the noise power spectrum of the voice-noise mixed signal is calculated from the converted signal; meanwhile, a reproduction-environment sound pickup signal, picked up in the environment where the noise-reduced signal is reproduced, is converted into a frequency domain signal; the noise power spectrum of the reproduction-environment sound pickup signal is calculated from that converted signal; a frequency domain masking threshold is calculated from the noise power spectrum of the reproduction-environment sound pickup signal; the masking threshold is compared with the noise power spectrum of the voice-noise mixed signal, and the corrected noise power spectrum of the voice-noise mixed signal exceeding the masking threshold is calculated; the corrected ratio of the voice-noise mixed signal to the noise is predicted from the input signal power spectrum and the corrected noise power spectrum of the voice-noise mixed signal, and a loss value is calculated from it; the loss value is inserted into the frequency domain voice-noise mixed signal to output a noise-reduced signal; and the noise-reduced signal is converted into a time domain voice signal.
Claim 3: A noise reduction processing apparatus that receives a voice-noise mixed signal, in which a target voice signal and an unwanted noise signal are mixed, and outputs a noise-reduced signal in which the noise signal has been reduced, wherein the masking threshold formed by ambient noise in the environment where the noise-reduced signal is reproduced is estimated, and only the mixed noise quantity exceeding the masking threshold is reduced.

Claim 4: The noise reduction processing apparatus of claim 3, comprising: a first frequency domain conversion unit 51 that converts the input voice-noise mixed signal into a frequency domain signal; an input signal power spectrum calculation unit 52 that calculates the signal power spectrum of the voice-noise mixed signal from the converted signal; a noise power spectrum estimation unit 11 that estimates the noise power spectrum of the voice-noise mixed signal from the converted signal; a second frequency domain conversion unit 12 that converts a reproduction-environment sound pickup signal, picked up in the environment where the noise-reduced signal is reproduced, into a frequency domain signal; a noise power spectrum estimation unit 13 that estimates the noise power spectrum of the reproduction-environment sound pickup signal from that converted signal; a masking threshold estimation unit 14 that estimates a frequency domain masking threshold from the noise power spectrum of the reproduction-environment sound pickup signal; a corrected noise power spectrum calculation unit 15 that compares the masking threshold with the noise power spectrum of the voice-noise mixed signal and calculates the noise power spectrum of the voice-noise mixed signal exceeding the masking threshold; a loss calculation unit 16 that predicts the corrected ratio of the voice-noise mixed signal to the noise from the input signal power spectrum and the corrected noise power spectrum of the voice-noise mixed signal and determines a loss value from it; a loss insertion unit 55 that inserts the loss value into the frequency domain voice-noise mixed signal and outputs a noise-reduced signal; and a time domain conversion unit 56 that converts the noise-reduced signal into a time domain signal.

Claim 5: A noise reduction processing program that instructs a computer to: convert the input voice-noise mixed signal into a frequency domain signal; calculate the signal power spectrum of the voice-noise mixed signal from the converted signal; calculate the noise power spectrum of the voice-noise mixed signal from the converted signal; convert a reproduction-environment sound pickup signal, picked up in the environment where the noise-reduced signal is reproduced, into a frequency domain signal; calculate the noise power spectrum of the reproduction-environment sound pickup signal from that converted signal; calculate a frequency domain masking threshold from the noise power spectrum of the reproduction-environment sound pickup signal; compare the masking threshold with the noise power spectrum of the voice-noise mixed signal and calculate the corrected noise power spectrum of the voice-noise mixed signal exceeding the masking threshold; predict the corrected ratio of the voice-noise mixed signal to the noise from the input signal power spectrum and the corrected noise power spectrum of the voice-noise mixed signal, and calculate a loss value from it; insert the loss value into the frequency domain voice-noise mixed signal and output a noise-reduced signal; and convert the noise-reduced signal into a time domain voice signal.
Claim 6: A recording medium on which the noise reduction processing program of claim 5 is recorded.

By using the corrected noise quantity described above, the invention reduces only the near-end mixed noise that exceeds the masking threshold produced by the far-end ambient noise. This resolves the problem of conventional noise reduction processing in which noise estimation errors distort the voice signal and degrade perceived voice quality. Because near-end mixed noise at or below the masking threshold is not reduced, the SNR improvement cannot be made large; at the far end, however, that noise lies below the detection limit and is not heard, so the noise is reduced sufficiently in perceptual terms. In addition, because the amount of noise reduction is smaller and the portion affected by estimation error is no longer subtracted, distortion of the voice signal is reduced. In short, the invention limits the voice quality degradation caused by noise reduction processing while keeping the noise perceptually reduced.

The best mode for carrying out the invention is described with reference to the embodiment shown in FIG. 1.
Reference numeral 11 denotes a noise power spectrum estimation unit, 12 a second frequency domain conversion unit, 13 a noise power spectrum estimation unit, 14 a masking threshold estimation unit, and 15 a corrected noise power spectrum calculation unit. The remaining components carry the same reference numerals as in FIG. 5.
The operation of the noise reduction device 10 according to the invention is as follows. First, the near-end input signal X(k), in which the target signal is mixed with unwanted ambient noise and the like, is converted by the first frequency domain conversion unit 51 into the frequency domain signal X(ω), for example by a short-time discrete Fourier transform. The frequency domain signal is in general complex, X(ω) = X_r(ω) + jX_i(ω), where X_r and X_i are the real and imaginary parts of X(ω). X(ω) is passed to the input signal power spectrum calculation unit 52, the noise power spectrum estimation unit 11, and the loss insertion unit 55. The processing applied to X(ω) is described below.
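The conversion performed by the first frequency domain conversion unit 51 can be illustrated with a minimal sketch. It assumes a naive discrete Fourier transform over one short frame; the function name `dft` and the 8-sample test frame are illustrative choices, not taken from the patent, and a practical device would more likely use an FFT with windowing.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one short-time frame (O(n^2))."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * cmath.pi * w * k / n) for k in range(n))
        for w in range(n)
    ]


# Hypothetical 8-sample frame: exactly one cycle of a cosine.
frame = [math.cos(2 * math.pi * k / 8) for k in range(8)]

X = dft(frame)                 # complex spectrum X(w)
X_r = [c.real for c in X]      # real part of each bin
X_i = [c.imag for c in X]      # imaginary part of each bin
```

For this frame the energy lands in bins 1 and 7, which is the expected result for a real cosine with one cycle per frame.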

The input signal power spectrum calculation unit 52 calculates the power spectrum of the received frequency domain signal X(ω) as
P_x(ω) = (X_r(ω))² + (X_i(ω))².
The power spectrum P_x(ω) is then averaged over a predetermined time and passed to the loss calculation unit 16 as Pav_x(ω). In frame-by-frame processing, for example, with the current frame denoted (·)_n and the average taken over m frames, the time average is calculated as
Pav_{x,n}(ω) = (1/A) Σ_m γ_m P_{x,n−m}(ω).
Here γ_m is an exponential weighting coefficient of the form, for example, γ_m = γ^m with γ < 1, and A is a normalization constant chosen so that (1/A) Σ_m γ_m = 1.
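As a rough sketch of the two formulas above, the following computes the per-bin power spectrum and its exponentially weighted time average. The function names, the `history` layout (index 0 is the current frame n, index m is frame n−m), and the sample values are assumptions made for illustration.

```python
def power_spectrum(X):
    """P_x(w) = X_r(w)^2 + X_i(w)^2 for every frequency bin."""
    return [c.real ** 2 + c.imag ** 2 for c in X]


def averaged_power(history, gamma):
    """Exponentially weighted average of per-frame power spectra.

    history[m] is the power spectrum of frame n-m, so history[0] is the
    current frame. The weights are gamma**m, normalized by A so that
    they sum to one.
    """
    weights = [gamma ** m for m in range(len(history))]
    A = sum(weights)
    bins = len(history[0])
    return [
        sum(w * frame[b] for w, frame in zip(weights, history)) / A
        for b in range(bins)
    ]


# Hypothetical power spectra for three frames and two bins.
history = [[4.0, 1.0], [2.0, 1.0], [0.0, 1.0]]
pav_x = averaged_power(history, gamma=0.5)
```

A bin whose power is constant across frames keeps that value after averaging, which is a quick check that the normalization by A is correct.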

The noise power spectrum estimation unit 11 estimates the noise power spectrum Pav_N(ω) from the received frequency domain signal X(ω) and sends it to the corrected noise power spectrum calculation unit 15. The noise power spectrum can be estimated, for example, by tracking a running minimum and taking that value as the noise power spectrum. In this method, the power spectrum P_x(ω) of the frequency domain signal X(ω) received from the first frequency domain conversion unit 51 is calculated first. P_x(ω) is then averaged over a predetermined time to give Pav'_x(ω). Next, the average power spectrum of the current frame, Pav'_{x,n}(ω), is compared with that of the previous frame, Pav'_{x,n−1}(ω). If Pav'_{x,n}(ω) is the smaller, the noise power spectrum is set to Pav_{N,n}(ω) = Pav'_{x,n}(ω); if it is the larger, Pav_{N,n}(ω) = Pav'_{x,n−1}(ω). The temporal minimum obtained in this way is taken as the estimated noise power spectrum Pav_{N,n}(ω).
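The minimum-update rule described above can be sketched as follows. This is only an illustration of the two-frame comparison as worded in the text; the function name and the sample averaged spectra are hypothetical.

```python
def update_noise_estimate(pav_curr, pav_prev):
    """Per-bin minimum update.

    For each bin, use the current frame's averaged power if it is the
    smaller of the two, otherwise use the previous frame's value.
    """
    return [c if c < p else p for c, p in zip(pav_curr, pav_prev)]


# Hypothetical averaged power spectra Pav'_x for three frames, two bins.
frames = [[5.0, 2.0], [3.0, 2.5], [4.0, 1.0]]

# Noise estimates Pav_N for frames n = 1 and n = 2.
estimates = [
    update_noise_estimate(frames[n], frames[n - 1])
    for n in range(1, len(frames))
]
```

Because speech raises the averaged power only temporarily, taking the smaller of neighbouring frames biases the estimate toward the stationary noise level.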

Meanwhile, the input signal X_r(k) picked up by the far-end microphone, in which the far-end target signal is mixed with unwanted ambient noise and the like, is passed to the second frequency domain conversion unit 12 of the noise reduction device. Here X_r denotes the far-end signal, not the real part of X(ω). The second frequency domain conversion unit 12 converts it into the frequency domain signal X_r(ω), for example by a short-time discrete Fourier transform, and sends X_r(ω) to the noise power spectrum estimation unit 13.
The noise power spectrum estimation unit 13 estimates the noise power spectrum Pavr_N(ω) from the received X_r(ω), for example by the same minimum-update method used in the noise power spectrum estimation unit 11. The resulting noise power spectrum Pavr_N(ω) is sent to the masking threshold estimation unit 14.

The masking threshold estimation unit 14 determines how much the ambient noise on the listener side masks. That is, with the noise power spectrum Pavr_N(ω) received from the noise power spectrum estimation unit 13 acting as the masker, it obtains the masking threshold P_t(ω). If the ambient noise is assumed to be stationary noise such as the operating sound of an air conditioner, the threshold can be calculated using, for example, the masking curve of a pure tone by white noise described in "Hearing and Psychoacoustics" (聴覚と音響心理), edited by Hisao Sakai, pp. 111-113. The resulting masking threshold P_t(ω) is sent to the corrected noise power spectrum calculation unit 15.
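The interface of the masking threshold estimation unit 14 can be sketched as a mapping from Pavr_N(ω) to P_t(ω). The sketch below does not reproduce the masking curve from the cited book; it substitutes a single constant offset in decibels, and both the function name and the value of `MASKING_OFFSET_DB` are hypothetical placeholders chosen only so that the example runs.

```python
# Hypothetical placeholder, NOT taken from the patent or the cited book:
# the threshold is assumed to sit a fixed number of dB below the masker.
MASKING_OFFSET_DB = -6.0


def masking_threshold(pavr_n, offset_db=MASKING_OFFSET_DB):
    """Map the far-end noise power spectrum Pavr_N(w) to a threshold P_t(w).

    A real implementation would apply a frequency-dependent masking
    curve here instead of one constant scale factor.
    """
    scale = 10.0 ** (offset_db / 10.0)
    return [scale * p for p in pavr_n]


# Hypothetical far-end noise power in two bins.
p_t = masking_threshold([10.0, 1.0], offset_db=-10.0)
```

Whatever curve is used, the output has one threshold value per frequency bin, so it can be compared directly with the near-end noise power spectrum.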

The corrected noise power spectrum calculation unit 15 uses the masking threshold P_t(ω) to obtain the corrected noise power spectrum P'av_N(ω) that is to be reduced. FIG. 2 shows the processing flow. First, the masking threshold P_t(ω) received from the masking threshold estimation unit 14 is compared with the noise power spectrum Pav_N(ω) received from the noise power spectrum estimation unit 11. When Pav_N(ω) is the larger, the corrected noise power spectrum is set to P'av_N(ω) = αPav_N(ω) − P_t(ω), where α is a coefficient that compensates for the variance (estimation error) of the estimated noise power spectrum.
When Pav_N(ω) is the smaller, the mixed noise is considered to be masked, so the corrected noise power spectrum is set to P'av_N(ω) = 0. The corrected noise power spectrum P'av_N(ω) is sent to the loss calculation unit 16.
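The comparison carried out by the corrected noise power spectrum calculation unit 15 can be sketched per frequency bin as follows. The function name, the default α = 1.0, and the sample values are assumptions for illustration; the patent does not state a value for α.

```python
def corrected_noise_power(pav_n, p_t, alpha=1.0):
    """Corrected noise power spectrum P'av_N(w).

    Where the near-end noise exceeds the masking threshold, the
    corrected value is alpha * Pav_N(w) - P_t(w); where it does not,
    the noise is treated as masked and the corrected value is zero.
    """
    corrected = []
    for noise, threshold in zip(pav_n, p_t):
        if noise > threshold:
            corrected.append(alpha * noise - threshold)
        else:
            corrected.append(0.0)
    return corrected


# Hypothetical near-end noise power and masking threshold in three bins.
p_corr = corrected_noise_power([4.0, 1.0, 2.0], [1.5, 3.0, 2.0])
```

Note that the comparison uses Pav_N(ω) itself while the subtraction uses αPav_N(ω), so with α below 1 the result could become negative in a bin just above the threshold; a caller may want to clamp such values at zero.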

The loss calculation unit 16 obtains the loss value L(ω) for reducing the noise from the corrected noise power spectrum P'av_N(ω) received from the corrected noise power spectrum calculation unit 15 and the input signal power spectrum Pav_x(ω) received from the input signal power spectrum calculation unit 52. The loss value is obtained, for example, by a simple spectral subtraction method as L_k(ω) = √((Pav_x(ω))² − (P'av_N(ω))²) / (Pav_x(ω))². The loss value L_k(ω) is sent to the loss insertion unit 55.
The loss insertion unit 55 multiplies the frequency domain signal X(ω) by the loss value L_k(ω) received from the loss calculation unit 16 and outputs the noise-reduced signal Y(ω). The time domain conversion unit 56 then converts the noise-reduced signal Y(ω) into a time domain signal and outputs it.
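The loss calculation and loss insertion steps can be sketched as below. The gain used here is the commonly used power spectral subtraction form, sqrt(max(Pav_x − P'av_N, 0) / Pav_x), which is swapped in for illustration and is not claimed to be the exact expression in the patent. The function names and the optional `floor` parameter are likewise assumptions.

```python
import math


def loss_values(pav_x, pav_n_corr, floor=0.0):
    """Per-bin gain from a common power spectral subtraction form.

    gain = sqrt(max(Pav_x - P'av_N, 0) / Pav_x), limited below by
    `floor`. Bins with no input power get the floor value.
    """
    gains = []
    for px, pn in zip(pav_x, pav_n_corr):
        if px <= 0.0:
            gains.append(floor)
            continue
        ratio = max(px - pn, 0.0) / px
        gains.append(max(math.sqrt(ratio), floor))
    return gains


def insert_loss(X, gains):
    """Loss insertion: Y(w) = L(w) * X(w) for every bin."""
    return [g * x for g, x in zip(gains, X)]


# Hypothetical input power, corrected noise power, and spectrum.
gains = loss_values([4.0, 4.0, 1.0], [0.0, 3.0, 2.0])
Y = insert_loss([2 + 2j, 4 + 0j, 1j], gains)
```

In a bin where the corrected noise power is zero, that is, where the near-end noise is masked at the far end, the gain is 1 and the input passes through unchanged, which is the behaviour the invention relies on to avoid distorting the voice signal.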

FIG. 4 shows the noise reduction device of the invention applied to a loudspeaker telephone apparatus. The invention reduces the noise picked up at the near end on the far-end side, taking into account the masking effect at the far end. The same holds when the near end and the far end are interchanged, so using a noise reduction device according to the invention on the near-end side as well reduces the noise on both sides.
The processing of each block of the noise reduction device according to the invention can be performed by a DSP (Digital Signal Processor). It can also be carried out by having a computer execute a program. In that case, the program is recorded on a CD-ROM, flexible disk, magnetic disk, or other recording medium and loaded into the program memory of the computer. The program may also be downloaded into the program memory over a communication link.

FIG. 1: Diagram explaining the embodiment.
FIG. 2: Flow chart explaining how the corrected noise power spectrum is derived according to the invention.
FIG. 3: Diagram explaining a loudspeaker telephone apparatus.
FIG. 4: Diagram explaining a loudspeaker telephone apparatus to which the noise reduction device of the invention is applied.
FIG. 5: Diagram explaining a conventional example.

Explanation of symbols

10: noise reduction device
11: noise power spectrum estimation unit
12: frequency domain conversion unit
13: noise power spectrum estimation unit
14: masking threshold estimation unit
15: corrected noise power spectrum calculation unit
16: loss calculation unit
31: near-end microphone
32: near-end loudspeaker
33: far-end microphone
34: far-end loudspeaker
35: transmission path
36: talker
37: listener
50: noise reduction device
51: frequency domain conversion unit
52: input signal power spectrum calculation unit
53: noise power spectrum calculation unit
54: loss calculation unit
55: loss insertion unit
56: time domain conversion unit

Claims

A noise reduction processing method that receives a voice-noise mixed signal, in which a target voice signal and an unwanted noise signal are mixed, and outputs a noise-reduced signal in which the noise signal has been reduced,
the method being characterized by estimating a masking threshold formed by ambient noise in the environment where the noise-reduced signal is reproduced, and reducing only the mixed noise quantity that exceeds the estimated masking threshold.

The noise reduction processing method according to claim 1, characterized by:
converting the input voice-noise mixed signal into a frequency-domain signal;
calculating the input signal power spectrum of the voice-noise mixed signal from that frequency-domain signal;
calculating the noise power spectrum of the voice-noise mixed signal from the same frequency-domain signal;
separately, converting a reproduction-environment sound pickup signal, picked up in the environment where the noise-reduced signal is reproduced, into a frequency-domain signal;
calculating the noise power spectrum of the reproduction-environment sound pickup signal from its frequency-domain signal;
calculating a masking threshold in the frequency domain from the noise power spectrum of the reproduction-environment sound pickup signal;
comparing the masking threshold with the noise power spectrum of the voice-noise mixed signal and calculating a corrected noise power spectrum, namely the portion of that noise power spectrum exceeding the masking threshold;
estimating, from the input signal power spectrum and the corrected noise power spectrum of the voice-noise mixed signal, the proportion of corrected noise in the voice-noise mixed signal, and calculating a loss value from that proportion;
inserting the loss value into the frequency-domain voice-noise mixed signal to output a noise-reduced signal; and
converting the noise-reduced signal into a time-domain voice signal.
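
The sequence of steps in this claim can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the patented implementation: the masking model (a fixed offset below the ambient noise power), the 6 dB default, the power-subtraction style loss, and all function and parameter names are hypothetical. Both noise power spectra are supplied by the caller, and windowing and overlap-add are omitted for brevity.

```python
import numpy as np

def power_spectrum(frame):
    """Frequency-domain conversion followed by per-bin power."""
    return np.abs(np.fft.rfft(frame)) ** 2

def masking_threshold(ambient_noise_power, offset_db=6.0):
    """ASSUMPTION: placeholder masking model, a fixed offset below the
    ambient noise power. A psychoacoustic model would go here."""
    return ambient_noise_power * 10.0 ** (-offset_db / 10.0)

def reduce_noise_frame(mixed_frame, mixed_noise_power, ambient_noise_power,
                       offset_db=6.0, eps=1e-12):
    """Process one frame of the voice-noise mixed signal.

    mixed_noise_power and ambient_noise_power are per-bin noise power
    estimates supplied by the caller; how they are obtained is out of scope.
    """
    # Frequency-domain conversion and input signal power spectrum.
    spectrum = np.fft.rfft(mixed_frame)
    input_power = np.abs(spectrum) ** 2
    # Masking threshold from the reproduction-environment noise.
    threshold = masking_threshold(ambient_noise_power, offset_db)
    # Corrected noise power: only the part exceeding the threshold.
    corrected_noise = np.maximum(mixed_noise_power - threshold, 0.0)
    # Proportion of corrected noise in the input, limited to [0, 1].
    noise_ratio = np.clip(corrected_noise / (input_power + eps), 0.0, 1.0)
    # ASSUMPTION: loss expressed as an amplitude gain (power subtraction).
    gain = np.sqrt(1.0 - noise_ratio)
    # Loss insertion and conversion back to the time domain.
    return np.fft.irfft(gain * spectrum, n=len(mixed_frame))
```

With a loud reproduction environment the threshold exceeds the mixed noise in every bin, the gain stays at 1, and the frame passes through unchanged; with a silent environment the threshold is zero and the full estimated noise power is subtracted.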

A noise reduction processing apparatus that receives a voice-noise mixed signal, in which a target voice signal and an unwanted noise signal are mixed, and outputs a noise-reduced signal in which the noise signal is reduced,
the apparatus being characterized by estimating a masking threshold formed by ambient noise in the environment where the noise-reduced signal is reproduced, and reducing only the amount of mixed noise that exceeds the masking threshold.

The noise reduction processing apparatus according to claim 3, comprising:
a first frequency domain conversion unit that converts the input voice-noise mixed signal into a frequency-domain signal;
an input signal power spectrum calculation unit that calculates the signal power spectrum of the voice-noise mixed signal from that frequency-domain signal;
a noise power spectrum estimation unit for the voice-noise mixed signal that estimates the noise power spectrum of the voice-noise mixed signal from the same frequency-domain signal;
a second frequency domain conversion unit that converts a reproduction-environment sound pickup signal, picked up in the environment where the noise-reduced signal is reproduced, into a frequency-domain signal;
a noise power spectrum estimation unit for the reproduction-environment sound pickup signal that estimates its noise power spectrum from its frequency-domain signal;
a masking threshold estimation unit that estimates a masking threshold in the frequency domain from the noise power spectrum of the reproduction-environment sound pickup signal;
a corrected noise power spectrum calculation unit that compares the masking threshold with the noise power spectrum of the voice-noise mixed signal and calculates the corrected noise power spectrum of the voice-noise mixed signal exceeding the masking threshold;
a loss calculation unit that estimates, from the input signal power spectrum and the corrected noise power spectrum of the voice-noise mixed signal, the proportion of corrected noise in the voice-noise mixed signal, and determines a loss value from that proportion;
a loss insertion unit that inserts the loss value into the frequency-domain voice-noise mixed signal and outputs a noise-reduced signal; and
a time domain conversion unit that converts the noise-reduced signal into a time-domain signal.
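
The claim does not say how the two noise power spectrum estimators work. One common approach, assumed here purely for illustration and not taken from the patent, is to smooth the per-frame power spectrum exponentially over frames judged to contain only noise. The class name `NoisePowerEstimator`, the `smoothing` parameter, and the `is_noise_only` flag are hypothetical.

```python
import numpy as np

class NoisePowerEstimator:
    """Hypothetical estimator: exponential averaging of frame power spectra.

    The estimate is updated only for frames flagged as noise-only; deciding
    that flag (for example with a voice activity detector) is left to the caller.
    """

    def __init__(self, num_bins, smoothing=0.9):
        self.smoothing = smoothing
        self.noise_power = np.zeros(num_bins)
        self.initialized = False

    def update(self, frame_power, is_noise_only=True):
        frame_power = np.asarray(frame_power, dtype=float)
        if not is_noise_only:
            # Speech present: keep the previous estimate.
            return self.noise_power
        if not self.initialized:
            # First noise-only frame seeds the estimate directly.
            self.noise_power = frame_power.copy()
            self.initialized = True
        else:
            a = self.smoothing
            self.noise_power = a * self.noise_power + (1.0 - a) * frame_power
        return self.noise_power
```

One instance could serve the voice-noise mixed signal and a second instance the reproduction-environment sound pickup signal, each fed with the power spectrum of its own frequency-domain frames.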

A noise reduction processing program that causes a computer to execute the steps of:
converting an input voice-noise mixed signal into a frequency-domain signal;
calculating the input signal power spectrum of the voice-noise mixed signal from that frequency-domain signal;
calculating the noise power spectrum of the voice-noise mixed signal from the same frequency-domain signal;
separately, converting a reproduction-environment sound pickup signal, picked up in the environment where the noise-reduced signal is reproduced, into a frequency-domain signal;
calculating the noise power spectrum of the reproduction-environment sound pickup signal from its frequency-domain signal;
calculating a masking threshold in the frequency domain from the noise power spectrum of the reproduction-environment sound pickup signal;
comparing the masking threshold with the noise power spectrum of the voice-noise mixed signal and calculating a corrected noise power spectrum, namely the portion of that noise power spectrum exceeding the masking threshold;
estimating, from the input signal power spectrum and the corrected noise power spectrum of the voice-noise mixed signal, the proportion of corrected noise in the voice-noise mixed signal, and calculating a loss value from that proportion;
inserting the loss value into the frequency-domain voice-noise mixed signal to output a noise-reduced signal; and
converting the noise-reduced signal into a time-domain voice signal.

A recording medium on which the noise reduction processing program according to claim 5 is recorded.