JP2000082999A

JP2000082999A - Noise reduction processing method/device and program storage medium

Info

Publication number: JP2000082999A
Application number: JP10252709A
Authority: JP
Inventors: Jiyunko Sasaki; 潤子佐々木; Yoichi Haneda; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-09-07
Filing date: 1998-09-07
Publication date: 2000-03-21
Anticipated expiration: 2018-09-07
Also published as: JP3459363B2

Abstract

PROBLEM TO BE SOLVED: To reduce processing delay regardless of the S/N ratio and to reduce audible degradation of sound quality. SOLUTION: An input signal is divided into a plurality of frequency bands (22), and the power is calculated for each band (24). The ratio of noise to the instantaneous target sound signal is estimated from each band input signal (201). The gain factor of each band is determined from the instantaneous S/N (27). Noise is reduced by applying the gain factors to the respective band input signals (28). The long-term average of the instantaneous S/N is calculated (52), an addition rate is determined from this long-term average (53), each band input signal is added to the noise-reduced signal at that addition rate (54), and the sum is converted into a full-band time-domain signal (29).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a noise reduction processing method, a device therefor, and a program storage medium. It is intended for voice/acoustic equipment such as audio conference and TV conference devices, and it outputs a noise-reduced signal from an input signal in which a target signal and unwanted signals such as noise are mixed.

[0002]

2. Description of the Related Art
Consider a hands-free communication system such as an audio or TV conference. If ambient noise other than the target speech mixes into the transmission signal picked up by the microphone and sent to the far end, speech intelligibility is impaired and call quality degrades significantly. There is therefore a strong demand for reducing ambient noise other than the target speech in the transmission signal.

[0003] A noise reduction method is a technique for outputting a noise-reduced signal from an input signal in which a target speech signal and unwanted signals such as ambient noise are mixed. FIG. 4 shows a sound pickup system, which is used below to describe a conventional noise reduction method and device. In this specification, signals are indexed by an integer n denoting discrete time, for example $X(n)$.

[0004] The following signals are defined:

- $S(n)$ is the target speech signal 13 uttered by a talker 12 located away from the microphone 11.
- $N(n)$ is unwanted ambient noise 14, such as air-conditioning noise.
- $X(n)$ is the input signal 15, obtained when speech 13 and noise 14 are picked up by the microphone 11 and fed to the noise reduction device 16.
- $Y(n)$ is the output signal 17 of the noise reduction device 16.

The input signal $X(n)$ contains the ambient noise $N(n)$ in addition to the target speech $S(n)$, that is,

$$X(n) = S(n) + N(n) \qquad (1)$$

A device that reduces the noise $N(n)$ in $X(n)$ and outputs $Y(n)$, a signal close to the target speech $S(n)$, is called a noise reduction device.

[0005] FIG. 5 shows the functional configuration of a conventionally used noise reduction scheme based on short-time spectral amplitude (STSA) estimation. Examples of such schemes are:

- spectral subtraction (S. F. Boll, IEEE Trans. ASSP, vol. 27, no. 2, pp. 113-120, Apr. 1979);
- Wiener filtering (J. S. Lim & A. V. Oppenheim, Proc. IEEE, vol. 67, no. 12, pp. 1586-1604, Dec. 1979);
- maximum likelihood envelope estimation (R. J. McAulay & M. L. Malpass, IEEE Trans. ASSP, vol. 28, no. 2, pp. 137-145, Apr. 1980);
- the minimum mean squared error (MMSE) method (Y. Ephraim & D. Malah, IEEE Trans. ASSP, vol. 32, no. 6, pp. 1109-1121, Dec. 1984).

The conventional noise reduction method is described below with reference to this figure. Elements identical to those in FIG. 4 carry the same reference symbols.

[0006] First, the microphone 11 picks up the input signal 15, in which the target signal is mixed with unwanted noise. The A/D converter 21 digitizes it and transfers it to the frequency band division unit 22 and the noise discrimination unit 23. The frequency band division unit 22 divides the signal into M frequency bands, for example by a discrete Fourier transform. The band-divided signal is generally complex-valued, although it may be real depending on the division method. The discussion below assumes complex values; the same argument holds for real values. Let the signal in the k-th band be

$$X_k(n) = X_{k,r}(n) + jX_{k,i}(n) \qquad (2)$$

where $X_{k,r}(n)$ and $X_{k,i}(n)$ are the real and imaginary parts of $X_k(n)$.

$X_k(n)$ is transferred to three units: the input signal power calculation unit 24, the input signal phase calculation unit 25, and the noise power calculation unit 26. The input signal power calculation unit 24 computes the power level of each band,

$$P_{X,k}(n) = X_{k,r}(n)^2 + X_{k,i}(n)^2 \qquad (3)$$

and the input signal phase calculation unit 25 computes the phase of each band,

$$\Phi_k(n) = \tan^{-1}\left[X_{k,i}(n) / X_{k,r}(n)\right] \qquad (4)$$

$P_{X,k}(n)$ is then transferred to the instantaneous S/N ratio estimation unit 201 and the gain factor insertion unit 28. $\Phi_k(n)$ is transferred to the time domain conversion unit 29.

Meanwhile, the noise discrimination unit 23 works on $X(n)$ received from the A/D converter 21. It first computes the power level

$$P_X(n) = \sum_{k=0}^{L-1} \{X(n-k)\}^2 \qquad (5)$$

where $L$ is the integration time. It then tests the following condition against a predetermined threshold $P_{\mathrm{th}}$:

$$P_X(n) < P_{\mathrm{th}} \qquad (6)$$

If the condition holds, the input is judged to be noise. The noise power calculation unit 26 acts only when the noise discrimination unit 23 has judged $X(n)$ to be noise. It then computes the noise power of each band,

$$P_{N,k}(n) = X_{k,r}(n)^2 + X_{k,i}(n)^2 \qquad (7)$$

and transfers its time average $\mathrm{Pav}_{N,k}(n)$ to the instantaneous S/N ratio estimation unit 201. The time average is computed, for example, as

$$\mathrm{Pav}_{N,k}(n) = \frac{1}{A}\sum_m \gamma_m P_{N,k}(n-m) \qquad (8)$$

where $\gamma_m$ is an exponential weighting coefficient such as

$$\gamma_m = \gamma^m \quad (\gamma < 1) \qquad (9)$$

and $A$ is a normalization constant chosen so that

$$\frac{1}{A}\sum_m \gamma_m = 1 \qquad (10)$$
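The per-band quantities of equations (3) to (6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation:

- `dft_bands` is a hypothetical naive-DFT helper standing in for the frequency band division unit 22.
- `math.atan2` is used instead of a bare arctangent so that the phase lands in the correct quadrant.
- `is_noise` is the simple energy-threshold test of equations (5) and (6).

```python
import cmath
import math


def dft_bands(frame):
    """Split one frame into M = len(frame) complex band values with a naive DFT."""
    m = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / m) for t in range(m))
        for k in range(m)
    ]


def band_power_and_phase(x_k):
    """Eqs. (3)-(4): power X_r^2 + X_i^2 and phase of one complex band value."""
    power = x_k.real ** 2 + x_k.imag ** 2
    # atan2 keeps the correct quadrant, unlike tan^-1(X_i / X_r) alone
    phase = math.atan2(x_k.imag, x_k.real)
    return power, phase


def is_noise(samples, n, integration_len, threshold):
    """Eqs. (5)-(6): time-domain energy of the last L samples compared with P_th."""
    if n < integration_len - 1:
        raise ValueError("need at least L samples up to index n")
    p_x = sum(samples[n - k] ** 2 for k in range(integration_len))
    return p_x < threshold
```

For example, `band_power_and_phase(complex(0.0, 2.0))` returns a power of 4.0 and a phase of π/2.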

[0007] The instantaneous S/N ratio estimation unit 201 estimates the S/N ratio for each band, that is, the ratio of the target speech signal to the noise signal. It uses two inputs: $P_{X,k}(n)$ from the input signal power calculation unit 24 and the noise power $\mathrm{Pav}_{N,k}(n)$ from the noise power calculation unit 26. FIG. 6 is a flowchart of the processing in unit 201, and the processing is described in detail with reference to this figure.

[0008] First, in step 31, $\mathrm{SNR}'_k(n)$ is calculated as

$$\mathrm{SNR}'_k(n) = P_{X,k}(n) / P_{N,k}(n) \qquad (11)$$

Next, in step 32, $\mathrm{SNR}'_k(n)$ is averaged with the previous estimate to obtain $\mathrm{SNR}_k(n)$. The previous estimate is the noise-reduced power $P_{Y,k}(n-1)$ given by equation (13):

$$\mathrm{SNR}_k(n) = (1-\beta)\,P[\mathrm{SNR}'_k(n) - 1] + \beta\,[P_{Y,k}(n-1) / P_{N,k}(n-1)] \qquad (12)$$

where $P[*]$ returns $*$ if $*$ is positive and 0 if $*$ is negative. $\mathrm{SNR}_k(n)$, and if necessary $\mathrm{SNR}'_k(n)$, are transferred to the gain factor calculation unit 27.
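Equations (11) and (12) have the form of the well-known decision-directed SNR estimate. The following minimal sketch works for one band at one time step with scalar powers. The smoothing weight `beta` is a free parameter whose value the source does not give.

```python
def instantaneous_snr(p_x, p_n, p_y_prev, p_n_prev, beta):
    """Eqs. (11)-(12) for one band k at time n.

    p_x      -- input power P_{X,k}(n)
    p_n      -- noise power P_{N,k}(n)
    p_y_prev -- noise-reduced power P_{Y,k}(n-1), i.e. eq. (13) at the previous step
    p_n_prev -- noise power P_{N,k}(n-1)
    beta     -- smoothing weight in [0, 1); its value is an assumption
    """
    snr_post = p_x / p_n                      # SNR'_k(n), eq. (11)
    rectified = max(snr_post - 1.0, 0.0)      # P[*]: positive part, otherwise 0
    snr = (1.0 - beta) * rectified + beta * (p_y_prev / p_n_prev)  # eq. (12)
    return snr_post, snr
```

In the decision-directed literature, `beta` is commonly chosen close to 1 (for example 0.98). This smooths the estimate at the cost of slower tracking.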

[0009] The gain factor calculation unit 27 calculates the gain factor $G(\mathrm{SNR}_k(n))$ for each frequency, as defined by each noise reduction scheme. It uses $\mathrm{SNR}_k(n)$ from the instantaneous S/N ratio estimation unit 201 and, for some methods, $\mathrm{SNR}'_k(n)$ as well. The result is transferred to the gain factor insertion unit 28. FIG. 7 shows the gain factor of each method. The top three methods determine the gain factor from $\mathrm{SNR}_k(n)$ alone, whereas the MMSE method at the bottom uses $\mathrm{SNR}'_k(n)$ in addition to $\mathrm{SNR}_k(n)$.
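FIG. 7 is not reproduced here. The gain curves below are therefore the textbook amplitude-gain forms of three of the named methods, written in terms of $\xi = \mathrm{SNR}_k(n)$ with the a posteriori SNR approximated by $1 + \xi$. Treat them as an assumption rather than a transcription of FIG. 7. The MMSE gain is omitted because it also needs $\mathrm{SNR}'_k(n)$ and Bessel functions.

```python
import math


def wiener_gain(snr):
    """Wiener filter amplitude gain xi / (1 + xi), with xi = SNR_k(n) >= 0."""
    return snr / (1.0 + snr)


def power_subtraction_gain(snr):
    """Power spectral subtraction amplitude gain sqrt(xi / (1 + xi))."""
    return math.sqrt(snr / (1.0 + snr))


def ml_envelope_gain(snr):
    """Maximum likelihood envelope amplitude gain 1/2 + 1/2 * sqrt(xi / (1 + xi))."""
    return 0.5 + 0.5 * math.sqrt(snr / (1.0 + snr))
```

All three gains rise monotonically from their low-SNR value toward 1 as $\xi$ grows. The source applies $G$ to the band power in equation (13) but to the complex band signal in equation (22). Which scaling matches a given curve depends on FIG. 7.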

[0010] The gain factor insertion unit 28 reduces noise in each band using the gain factor from the gain factor calculation unit 27. It takes the band power $P_{X,k}(n)$ from the input signal power calculation unit 24, computes

$$P_{Y,k}(n) = G(\mathrm{SNR}_k(n)) \times P_{X,k}(n) \qquad (13)$$

and outputs the noise-reduced band output $P_{Y,k}(n)$. $P_{Y,k}(n)$ is transferred to the time domain conversion unit 29. There it is combined with $\Phi_k(n)$ from the input signal phase calculation unit 25 to give

$$Y_k(n) = Y_{k,r}(n) + jY_{k,i}(n), \quad Y_{k,r}(n) = \sqrt{P_{Y,k}(n)}\,\cos[\Phi_k(n)], \quad Y_{k,i}(n) = \sqrt{P_{Y,k}(n)}\,\sin[\Phi_k(n)] \qquad (14)$$

The band signals are combined into a full-band signal and converted into a time-domain signal, for example by an inverse discrete Fourier transform. The D/A converter 30 converts the result into an analog signal and outputs the noise-reduced signal 17, $Y(n)$.
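Equations (13) and (14) can be sketched as below, followed by the inverse DFT that the text names as one option. Keeping only the real part assumes the band values come from a real input and remain conjugate-symmetric after processing. The naive `inverse_dft` helper is illustrative only.

```python
import cmath
import math


def rebuild_band(p_x, phase, gain):
    """Eqs. (13)-(14): scale the band power, then recombine magnitude with the input phase."""
    p_y = gain * p_x                          # eq. (13)
    # sqrt(P_Y) * (cos(Phi) + j sin(Phi)), eq. (14)
    return cmath.rect(math.sqrt(p_y), phase)


def inverse_dft(bands):
    """Naive inverse DFT back to time-domain samples (real part kept)."""
    m = len(bands)
    return [
        (sum(bands[k] * cmath.exp(2j * math.pi * k * t / m) for k in range(m)) / m).real
        for t in range(m)
    ]
```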

[0011] These conventional schemes usually divide the signal into 256 to 1024 frequency bands. Under this condition, consider the S/N ratio defined by

$$S/N = 10\log_{10}(P_S / P_N) \qquad (15)$$

where $P_S$ is the average power of the target signal and $P_N$ is the average power of the noise signal. When this S/N ratio is about 15 dB or more, noise reduction improves speech quality. When it is about 10 dB or less, the noise is reduced but the result sounds worse: the speech signal is distorted and the residual noise fluctuates over time.

Dividing into 256 to 1024 frequency bands also introduces a large delay when used as-is in a conference system, which degrades conversational performance. Reducing the number of bands shortens the delay but increases the distortion caused by the nonlinear processing. Speech quality then deteriorates even at an S/N ratio of about 15 dB.
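Equation (15) and the delay argument can be made concrete with a little arithmetic. This sketch makes two assumptions that the source does not state:

- The sampling rate is 16 kHz.
- An M-band DFT uses an M-sample frame.

Frame length is therefore only a rough proxy for processing delay, which also depends on frame overlap and the filter bank design.

```python
import math


def snr_db(signal_power, noise_power):
    """Eq. (15): S/N = 10 log10(P_S / P_N) in dB."""
    return 10.0 * math.log10(signal_power / noise_power)


def frame_duration_ms(num_bands, sample_rate_hz):
    """Duration in ms of one frame of num_bands samples (one sample per band assumed)."""
    return 1000.0 * num_bands / sample_rate_hz
```

Under these assumptions, the frame durations are:

| Bands | Frame duration |
|---|---|
| 1024 | 64 ms |
| 256 | 16 ms |
| 32 | 2 ms |

This is why fewer bands are attractive for conferencing despite the larger distortion.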

[0012] In short, the conventional method cannot reduce noise effectively when the S/N ratio is poor or when a small delay is required. It therefore cannot be applied as-is in two cases. The first is sound pickup for audio and TV conference devices, where listening quality matters and real-time operation is required. The second is calls under high-noise conditions.

[0013]

[Problem to Be Solved by the Invention]
Conventional methods reduce noise in an input signal mixed with ambient noise by subtracting the noise amplitude from the speech signal. They have a problem when the S/N ratio is poor or a small delay is required. In that case the speech signal is distorted and the residual noise fluctuates over time, producing sound that is unpleasant to listen to. An object of the present invention is to provide a noise reduction processing method, a device therefor, and a program recording medium that achieve a small processing delay and little audible degradation of sound quality regardless of the S/N ratio.

[0014]

[Means for Solving the Problem]
According to the present invention, the signal picked up by the microphone, in which the target speech signal and ambient noise are mixed, is processed as follows:

1. The signal is divided into a plurality of bands, and the noise power is estimated for each band signal.
2. The estimated noise power is compared with the actual input signal power. This gives estimates of both the instantaneous and the long-term average ratio of the speech signal to the noise signal.
3. A noise-suppression gain factor is calculated from the instantaneous speech-to-noise ratio. Noise is suppressed by multiplying the input signal of each band by this factor.
4. An optimum amount of each band's input signal, determined from the long-term average speech-to-noise ratio, is added back. This masks the distortion caused by the suppression.

The result is effective noise reduction with little distortion.

[0015] The present invention measures the long-term average speech-to-noise ratio of each band signal and adaptively adds the optimum amount of each band signal. This prevents degradation of sound quality while maximizing the noise reduction effect.

[0016]

[Embodiments of the Invention]
FIG. 1 is a block diagram of an embodiment of the present invention, which is used below to describe its processing procedure. Elements identical to those in FIGS. 4 and 5 carry the same reference symbols. Four units are added: a noise power estimation unit 51, a long-term S/N ratio calculation unit 52, an optimum input signal addition rate determination unit 53, and an input signal addition unit 54. The noise power may be estimated by the conventional method described above. The following description, however, uses the method disclosed in Japanese Patent Application No. Hei 8-68548.

[0017] First, the microphone 11 picks up the input signal 15, in which the target signal is mixed with unwanted noise. The A/D converter 21 digitizes it and transfers it to the frequency band division unit 22, which divides it into frequency bands. Each band signal is transferred to four units: the input signal power calculation unit 24, the noise power estimation unit 51, the gain factor insertion unit 28, and the input signal addition unit 54. The processing flow is described below for the k-th band signal, denoted $X_k(n)$.

[0018] The input signal power calculation unit 24 calculates the power level of the received $X_k(n)$,

$$P_{X,k}(n) = X_{k,r}(n)^2 + X_{k,i}(n)^2 \qquad (16)$$

and transfers it to the instantaneous S/N ratio estimation unit 201. The noise power estimation unit 51 estimates the noise power $\mathrm{Pav}_{N,k}(n)$ from the received $X_k(n)$. FIG. 2 shows the processing flow of the noise power estimation unit, and the processing is described in detail with reference to this figure.

[0019] First, in step 61, the power level $P_{X,k}(n)$ of the received $X_k(n)$ is calculated by equation (16). Next, $P_{X,k}(n)$ is averaged over a predetermined time to obtain $\mathrm{Pav}_{X,k}(n)$:

$$\mathrm{Pav}_{X,k}(n) = \frac{1}{A}\sum_m \gamma_m P_{X,k}(n-m) \qquad (17)$$

where $\gamma_m$ is an exponential weighting coefficient such as

$$\gamma_m = \gamma^m \quad (\gamma < 1) \qquad (18)$$

and $A$ is a normalization constant chosen so that

$$\frac{1}{A}\sum_m \gamma_m = 1 \qquad (19)$$

The averaging time is, for example, 5 to 6 ms.
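Take the exponential weights of equations (18) and (19) summed over an unbounded history. Then $A = 1/(1-\gamma)$, and equation (17) reduces to a one-line recursion, which the sketch below implements. The same form applies to equations (8) and (23).

Two choices in the sketch are assumptions, not part of the source:

- `gamma_for_time_constant` maps an averaging time to $\gamma$ using a common convention.
- The recursion is seeded with the first value.

```python
import math


def gamma_for_time_constant(tau_s, update_rate_hz):
    """Hypothetical mapping from an averaging time tau to the weight gamma (< 1)."""
    return math.exp(-1.0 / (tau_s * update_rate_hz))


def exponential_average(values, gamma):
    """Recursive form of eq. (17): avg(n) = gamma * avg(n-1) + (1 - gamma) * P(n).

    With gamma_m = gamma**m and A = 1 / (1 - gamma) (eq. (19) for an infinite sum),
    (1/A) * sum_m gamma**m * P(n-m) unrolls to exactly this recursion.
    """
    averages = []
    avg = values[0]  # seed with the first value instead of an empty history
    for p in values:
        avg = gamma * avg + (1.0 - gamma) * p
        averages.append(avg)
    return averages
```

As an example, assume per-sample updates at 16 kHz. A 5 ms average then corresponds to $\gamma = e^{-1/80} \approx 0.988$.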

[0020] Next, in step 62, a histogram of the level distribution of $\mathrm{Pav}_{X,k}(n)$ is built. The count $h_k(\mathrm{int}\,\mathrm{Pav}_{X,k}(n))$ of the power interval to which $\mathrm{Pav}_{X,k}(n)$ belongs is incremented by one:

$$h_k(\mathrm{int}\,\mathrm{Pav}_{X,k}(n)) = h_k(\mathrm{int}\,\mathrm{Pav}_{X,k}(n)) + 1 \qquad (20)$$

where $\mathrm{int}(*)$ truncates the fractional part to an integer. Then, in step 63, the peak intervals of the histogram $h_k(i)$ are detected and stored. That is, the values $i$ satisfying

$$h_k(i') \le h_k(i) \qquad (21)$$

for the neighboring values $i'$ are found. The smallest such $i$ is taken as the noise power $\mathrm{Pav}_{N,k}(n)$, which is transferred to the instantaneous S/N ratio estimation unit 201.
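Steps 62 and 63 can be sketched as below. Following the source, the sketch truncates $\mathrm{Pav}_{X,k}(n)$ with int(·) to pick a histogram bin. Two details are assumptions of this sketch:

- The input values are non-negative.
- Only bins that actually hold counts are considered as peaks. Empty bins beside other empty bins would otherwise trivially satisfy equation (21).

```python
from collections import Counter


def histogram_noise_level(smoothed_powers):
    """Eqs. (20)-(21): histogram of int(Pav_{X,k}) values; the lowest local peak is the noise level.

    smoothed_powers: sequence of Pav_{X,k}(n) values for one band.
    """
    hist = Counter(int(p) for p in smoothed_powers)   # eq. (20): h_k(int Pav) += 1
    peaks = []
    for i, count in hist.items():
        left = hist.get(i - 1, 0)
        right = hist.get(i + 1, 0)
        # eq. (21): h_k(i') <= h_k(i) for the neighbouring bins i' = i - 1 and i + 1
        if left <= count and right <= count:
            peaks.append(i)
    return min(peaks)  # the tallest bin is always a peak, so this is never empty
```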

[0021] The instantaneous S/N ratio estimation unit 201 estimates $\mathrm{SNR}'_k(n)$ and $\mathrm{SNR}_k(n)$ by the method shown in FIG. 6. It uses $P_{X,k}(n)$ from the input signal power calculation unit 24 and the noise power $\mathrm{Pav}_{N,k}(n)$ from the noise power estimation unit 51. $\mathrm{SNR}'_k(n)$ is transferred to the gain factor calculation unit 27. $\mathrm{SNR}_k(n)$ is transferred to both the gain factor calculation unit 27 and the long-term S/N ratio calculation unit 52.

[0022] The gain factor calculation unit 27 determines the gain factor $G(\mathrm{SNR}_k(n))$ from $\mathrm{SNR}'_k(n)$ and $\mathrm{SNR}_k(n)$, both received from the instantaneous S/N ratio estimation unit 201. Typical calculation methods are those listed in FIG. 7:

- Wiener filtering;
- maximum likelihood envelope estimation;
- spectrum subtraction;
- the minimum mean squared error (MMSE) method.

The gain factor $G(\mathrm{SNR}_k(n))$ is transferred to the gain factor insertion unit 28. The instantaneous S/N ratio estimation unit 201 and the gain factor calculation unit 27 may also use methods other than those described here.

[0023] The gain factor insertion unit 28 reduces noise using the gain factor from the gain factor calculation unit 27. It takes the band signal $X_k(n)$ from the frequency band division unit 22 and calculates

$$Y'_k(n) = G(\mathrm{SNR}_k(n)) \times X_k(n) \qquad (22)$$

The noise-reduced signal $Y'_k(n)$ is transferred to the input signal addition unit 54.

[0024] The long-term S/N ratio calculation unit 52 averages the received $\mathrm{SNR}_k(n)$ over a predetermined time to obtain $\mathrm{SNRav}_k(n)$:

$$\mathrm{SNRav}_k(n) = \frac{1}{A}\sum_m \gamma_m \mathrm{SNR}_k(n-m) \qquad (23)$$

where $\gamma_m$ is an exponential weighting coefficient such as

$$\gamma_m = \gamma^m \quad (\gamma < 1) \qquad (24)$$

and $A$ is a normalization constant chosen so that

$$\frac{1}{A}\sum_m \gamma_m = 1 \qquad (25)$$

The averaging time is, for example, 1 to 3 s. $\mathrm{SNRav}_k(n)$ is transferred to the optimum input signal addition rate determination unit 53.

[0025] The optimum input signal addition rate determination unit 53 determines the optimum input signal addition rate $\alpha$ from $\mathrm{SNRav}_k(n)$, received from the long-term S/N ratio calculation unit 52. For example, with 32 frequency bands, $\alpha$ is determined as

$$\alpha = \begin{cases} 0.05 & \text{if } 15 \le 10\log_{10}[\mathrm{SNRav}_k(n)] \\ 0.5 & \text{if } 10\log_{10}[\mathrm{SNRav}_k(n)] < 15 \end{cases} \qquad (26)$$

The optimum input signal addition rate $\alpha$ is transferred to the input signal addition unit 54.
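Equation (26) is a simple threshold on the long-term S/N in dB. The sketch below uses the example values given for 32 bands. One behaviour is an assumption of this sketch: when $\mathrm{SNRav}_k(n) \le 0$, where the logarithm is undefined, it returns the poor-S/N rate.

```python
import math


def input_addition_rate(snr_av, threshold_db=15.0, alpha_high=0.05, alpha_low=0.5):
    """Eq. (26) for 32 bands: choose alpha from the long-term S/N SNRav_k(n)."""
    if snr_av <= 0.0:
        return alpha_low  # log10 undefined: treat as poor S/N (assumption)
    snr_av_db = 10.0 * math.log10(snr_av)
    return alpha_high if snr_av_db >= threshold_db else alpha_low
```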

[0026] The input signal addition unit 54 outputs the band output signal $Y_k(n)$ from three inputs:

- the band signal $X_k(n)$, from the frequency band division unit 22;
- the noise-reduced signal $Y'_k(n)$, from the gain factor insertion unit 28;
- the optimum input signal addition rate $\alpha$, from the optimum input signal addition rate determination unit 53.

The output is

$$Y_k(n) = \alpha \times X_k(n) + (1-\alpha) \times Y'_k(n) \qquad (27)$$

$Y_k(n)$ is transferred to the time domain conversion unit 29 and converted to the time domain. The D/A converter 30 converts the result into an analog signal and outputs the noise-reduced signal 17, $Y(n)$.
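Equations (22) and (27) together blend the original and the noise-reduced band signal. A minimal sketch for one band value, where the function name is hypothetical:

```python
def mix_band(x_k, gain, alpha):
    """Eqs. (22) and (27) for one band value.

    x_k   -- band signal X_k(n) (complex or real)
    gain  -- G(SNR_k(n)) from the gain factor calculation unit
    alpha -- input signal addition rate from eq. (26)
    """
    y_reduced = gain * x_k                          # Y'_k(n), eq. (22)
    return alpha * x_k + (1.0 - alpha) * y_reduced  # Y_k(n), eq. (27)
```

Substituting (22) into (27) gives $Y_k(n) = [\alpha + (1-\alpha)G(\mathrm{SNR}_k(n))]\,X_k(n)$, so the effective gain never falls below $\alpha$. With $\alpha = 0.5$ a band is attenuated by at most about 6 dB. This floor is how the added original signal limits audible processing distortion.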

[0027] The effectiveness of the present invention was evaluated with an opinion test simulating an actual TV conference system. The test room and procedure were as follows:

- Room: a variable reverberation room with a volume of 87 m³, a reverberation time of 300 ms, and a background noise level of 30 dB(A) or less.
- Subjects: 20.
- Procedure: the prepared stimuli were played from a loudspeaker, and subjects rated their quality as received speech in a hands-free conference.
- Numbers of frequency bands: 256, which is typical, and 32, which gives a low processing delay.

[0028] FIGS. 3(a) and 3(b) compare MOS values (5-point scale) for 32 and 256 frequency bands, respectively. The signals had S/N ratios of 5 to 25 dB and were processed with a fixed input signal addition rate (fixed values from 0 to 0.5). An original sound addition rate of 1 corresponds to the signal before noise reduction. A rate of 0 corresponds to the conventional method: noise reduction with no original sound added. FIG. 3(c) shows the relationship between S/N ratio and MOS value for reference speech data, consisting of the original speech and the original speech with noise added to give S/N ratios of −5 to 50 dB.

[0029] FIGS. 3(a) and 3(b) show that the input signal addition rate that maximizes the MOS value depends on the S/N ratio of the input signal. The boundary lies at an S/N ratio of 10 to 15 dB: the optimum rate is about 0.05 when the S/N ratio is good and 0.5 or more when it is poor. A plausible explanation is as follows. When the S/N ratio is good, the processing distortion is small, so a small amount of original sound suffices for masking. When the S/N ratio is poor, the processing distortion is large, so more original sound is needed.

At the optimum input signal addition rate, the MOS improvement over the conventional method is about 0.2 when the S/N ratio is good and about 0.6 to 1.0 when it is poor. Comparison with FIG. 3(c) suggests that this corresponds to an S/N ratio improvement of about 5 to 10 dB.

[0030] These results indicate that the present invention, which adapts the input signal addition rate to the S/N ratio of the input signal instead of fixing it, is effective. The conventional method has two weaknesses. When the S/N ratio is poor, its MOS value falls below that of the unprocessed signal. When the number of frequency bands is reduced, its MOS value is lower than with a large number of bands.

The present invention adaptively adds the input signal at the optimum addition rate for the input S/N ratio. It obtains a subjective S/N improvement of about 5 to 10 dB even when the S/N ratio is poor, and it achieves a similar MOS value regardless of the number of frequency bands.

[0031] Each unit in FIG. 1 can also be implemented by a computer that reads and executes a program.

[0032]

[Effects of the Invention]
The present invention divides a signal containing a mixture of target speech and noise into frequency bands. For each band it estimates the ratio of the noise signal to the instantaneous target signal, and it determines a gain factor for each band from this estimate to reduce noise. When the S/N ratio is poor or the number of frequency bands is small, the target speech becomes distorted. To reduce this distortion, the invention adds the original signal to the noise-reduced speech according to the long-term speech-to-noise ratio. The result is noise-reduced speech with little distortion.

[0033] Noise reduction intended for listening, performed by this method and apparatus, makes it possible to obtain a target signal that is easy to hear even with low processing delay or at low S/N. As a result, in loudspeaker communication systems such as voice conferencing and TV conferencing, even when ambient noise other than the target speech is mixed into the transmit signal picked up by the microphone and sent to the far end, the noise reduction performed by this method and apparatus preserves speech clarity and improves communication quality.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing procedure in the noise power estimation unit 51 in FIG. 1.

FIG. 3 shows the effects of the present invention: (a) shows subjective evaluation results of the noise reduction effect of the present invention with 32 divisions, (b) shows the same results with 256 divisions, and (c) shows subjective evaluation results for the reference signals.

FIG. 4 is a diagram illustrating the concept of a noise reduction device.

FIG. 5 is a block diagram showing the functional configuration of a conventional noise reduction device.

FIG. 6 is a flowchart showing the processing procedure in the instantaneous S/N ratio estimation unit 201 in FIG. 5.

FIG. 7 is a diagram showing representative gain factors.

Claims

1. A noise reduction processing method for reducing a noise signal from an input signal in which a target speech signal and a noise signal are mixed, the method comprising: a frequency band division step of dividing the input signal into a plurality of frequency bands; an input signal power calculation step of calculating the power, for each frequency band, of the input signal divided into the frequency bands by the frequency band division step; an instantaneous target-speech-signal-to-noise ratio estimation step of estimating, using the input signal divided into the frequency bands, the ratio of noise to the instantaneous target speech signal in each frequency band; a gain factor calculation step of determining a gain factor for each frequency band based on the estimated instantaneous target-speech-signal-to-noise ratio; a noise reduction step of reducing noise in the input signal divided into the frequency bands based on the gain factor; a step of calculating a long-term average of the estimated instantaneous target-speech-signal-to-noise ratio; a step of determining, for each frequency band and based on the long-term average, an addition ratio between the band-divided input signal and the signal whose noise has been reduced by the noise reduction step; a step of obtaining a band output signal by adding the band-divided input signal and the noise-reduced signal according to the determined addition ratio; and a step of synthesizing a signal over the entire frequency band from the band output signals.

2. The noise reduction processing method according to claim 1, wherein the addition ratio is determined according to the long-term average value and the number of frequency bands.

3. The noise reduction processing method according to claim 2, wherein, when the number of frequency bands is 256, the addition ratio is set to 0.5 when the long-term average is worse than 10 dB and to 0.05 when the long-term average is better than 10 dB.

4. The noise reduction processing method according to claim 2, wherein, when the number of frequency bands is 32, the addition ratio is set to 0.5 when the long-term average is worse than 15 dB and to 0.05 when the long-term average is better than 15 dB.

5. A noise reduction processing apparatus for reducing a noise signal from an input signal in which a target speech signal and a noise signal are mixed, the apparatus comprising: frequency band division means for dividing the input signal into a plurality of frequency bands; input signal power calculation means for calculating the power, for each frequency band, of the input signal divided into the frequency bands by the frequency band division means; instantaneous target-speech-signal-to-noise ratio estimation means for estimating, using the input signal divided into the frequency bands, the ratio of noise to the instantaneous target speech signal in each frequency band; gain factor calculation means for determining a gain factor for each frequency band based on the instantaneous target-speech-signal-to-noise ratio; noise reduction means for reducing noise in the input signal divided into the frequency bands based on the gain factor; long-term average calculation means for calculating a long-term average of the instantaneous target-speech-signal-to-noise ratio; addition ratio determination means for determining, for each frequency band and based on the long-term average, an addition ratio between the band-divided input signal and the signal whose noise has been reduced by the noise reduction means; band output signal synthesis means for obtaining a band output signal by adding the band-divided input signal and the noise-reduced signal according to the addition ratio; and full-band signal synthesis means for synthesizing a signal over the entire frequency band from the band output signals.

6. The noise reduction processing device according to claim 5, further comprising: means for determining the optimum addition ratio according to the long-term average value and the number of frequency bands.

7. The noise reduction processing apparatus according to claim 6, wherein, when the number of frequency bands is 256, the addition ratio is selected to be 0.5 when the long-term average is worse than 10 dB and 0.05 when the long-term average is better than 10 dB.

8. The noise reduction processing apparatus according to claim 7, wherein, when the number of frequency bands is 32, the addition ratio is selected to be 0.5 when the long-term average is worse than 15 dB and 0.05 when the long-term average is better than 15 dB.

9. A computer-readable storage medium storing a program for executing noise reduction processing that reduces a noise signal from an input signal in which a target speech signal and a noise signal are mixed, the processing comprising: frequency band division processing for dividing the input signal into a plurality of frequency bands; input signal power calculation processing for calculating the power, for each frequency band, of the input signal divided into the frequency bands by the frequency band division processing; processing for estimating, using the input signal divided into the frequency bands, the ratio of noise to the instantaneous target speech signal in each frequency band; gain factor calculation processing for determining a gain factor for each frequency band based on the instantaneous target-speech-signal-to-noise ratio; noise reduction processing for reducing noise in the input signal divided into the frequency bands based on the gain factor; processing for calculating a long-term average of the instantaneous target-speech-signal-to-noise ratio; processing for determining, for each frequency band and based on the long-term average, an addition ratio between the band-divided input signal and the signal whose noise has been reduced by the noise reduction processing; processing for obtaining a band output signal by adding the band-divided input signal and the noise-reduced signal according to the addition ratio; and processing for synthesizing a signal over the entire frequency band from the band output signals.
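The addition-ratio rules of claims 3, 4, 7, and 8 can be written as a small lookup. The thresholds (10 dB for 256 bands, 15 dB for 32 bands) and the ratios 0.5 and 0.05 come from the claims. The claims do not say what happens exactly at the threshold or for other band counts, so treating a value at the threshold as "better" and rejecting other band counts are assumptions of this sketch, as is the function name `addition_ratio`.

```python
# Thresholds on the long-term average S/N (dB), keyed by number of bands,
# as stated in claims 3/7 (256 bands) and 4/8 (32 bands).
THRESHOLD_DB = {256: 10.0, 32: 15.0}

RATIO_POOR_SNR = 0.5    # long-term average worse than the threshold
RATIO_GOOD_SNR = 0.05   # long-term average better than the threshold


def addition_ratio(long_term_snr_db, num_bands):
    """Select the input-signal addition ratio for one frequency band.

    Assumptions: a value exactly at the threshold counts as "better",
    and band counts other than 32 or 256 are rejected because the
    claims do not specify a rule for them.
    """
    try:
        threshold = THRESHOLD_DB[num_bands]
    except KeyError:
        raise ValueError(f"no rule given for {num_bands} bands") from None
    if long_term_snr_db < threshold:
        return RATIO_POOR_SNR
    return RATIO_GOOD_SNR


for bands, snr_db in [(256, 5.0), (256, 12.0), (32, 12.0), (32, 20.0)]:
    print(bands, snr_db, addition_ratio(snr_db, bands))
```

With 32 bands the threshold is higher, so the larger ratio (more of the original signal added back) applies over a wider range of long-term S/N. This is consistent with the description's point that distortion of the target speech grows when the number of band divisions is small.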