JP3309895B2

JP3309895B2 - Noise reduction method

Info

Publication number: JP3309895B2
Application number: JP06854896A
Authority: JP
Inventors: 潤子佐々木; 陽一羽田; 昭二牧野; 豊金田; 順治小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-03-25
Filing date: 1996-03-25
Publication date: 2002-07-29
Anticipated expiration: 2016-03-25
Also published as: JPH09258792A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a noise reduction method for speech and audio equipment such as audio conference devices and TV conference devices, which outputs a noise-reduced signal from an input signal in which a target signal is mixed with unwanted signals such as noise.

[0002]

[Prior Art] In hands-free communication systems such as audio conferences and TV conferences, if ambient noise other than the target speech is mixed into the transmit signal that is picked up by a microphone and sent to the far end, the clarity of the speech is impaired and call quality degrades markedly. For this reason, there is strong demand for reducing ambient noise other than the target speech in the transmit signal.

[0003] A noise reduction method is a technique that outputs a noise-reduced signal from an input signal in which a target speech signal is mixed with unwanted signals such as ambient noise.

[0004] FIG. 6 shows a sound pickup system, which is used here to explain the conventional noise reduction method and apparatus. Reference numeral 11 denotes a microphone; 12, a talker located at a distance from the microphone 11; 13, the target speech signal uttered by the talker 12; 14, unwanted ambient noise such as air-conditioning noise; 15, the input signal to the noise reduction device as received by the microphone 11; 16, the noise reduction device; and 17, the output signal of the noise reduction device 16. In this specification, the time representation of a signal uses an integer n denoting discrete time, for example X(n).

[0005] Let the speech signal 13 be S(n), the ambient noise 14 be N(n), the input signal 15 to the noise reduction device 16 be X(n), and the output signal 17 of the noise reduction device 16 be Y(n). The input signal X(n) to the noise reduction device 16 then contains the ambient noise N(n) in addition to the target speech signal S(n). That is, it is expressed as

[0006]

(Equation 1)

A device that reduces the ambient noise N(n) in the input signal X(n) and extracts, as the output signal Y(n), a signal close to the target speech signal S(n) is called the noise reduction device 16.
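Equation 1 is not reproduced in this text. Assuming the usual additive mixing model that the preceding sentences imply (an assumption, not a quotation of the original formula), the relation and the goal of the device could be written as:

```latex
X(n) = S(n) + N(n), \qquad Y(n) \approx S(n)
```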

[0007] FIG. 7 is a block diagram of a method called spectral subtraction, which has conventionally been used in the field of speech recognition; it is used here to explain the conventional noise reduction method. In FIG. 7, elements identical to those in FIG. 6 carry the same reference numerals. Reference numeral 21 denotes an A/D converter; 22, a frequency band divider; 23, a noise discriminator; 24, an input signal power calculator; 25, an input signal phase calculator; 26, a noise power calculator; 27, a subtractor; 28, a time domain converter; and 29, a D/A converter.

[0008] First, the input signal 15 received by the microphone 11, in which the target signal is mixed with unwanted noise, is digitized by the A/D converter 21 and passed to the frequency band divider 22 and the noise discriminator 23. The frequency band divider 22 divides the signal into a plurality of frequency bands, for example by means of a discrete Fourier transform. The band-divided signals are generally complex, although they may be real depending on the division method. The discussion here assumes complex values, but the same discussion applies to real values. Denoting the signal of the k-th frequency band as

[0009]

(Equation 2)

X_k(n) is passed to the input signal power calculator 24, the input signal phase calculator 25, and the noise power calculator 26. The input signal power calculator 24 calculates the power level of the input signal,

[0010]

(Equation 3)

and the input signal phase calculator 25 calculates the phase,

[0011]

(Equation 4)

P_{X,k}(n) is then passed to the subtractor 27, and Φ_k(n) to the time domain converter 28. Meanwhile, the noise discriminator 23 first calculates the power level of X(n) received from the A/D converter 21,

[0012]

(Equation 5)

Next, a test against, for example, a predetermined threshold P_th,

[0013]

(Equation 6)

is performed, and if the condition is satisfied the input is judged to be noise. Only when the noise discriminator 23 has judged the input signal to be noise does the noise power calculator 26 calculate the noise power level as

[0014]

(Equation 7)

and pass its time average Pav_{N,k}(n) to the subtractor 27. The time average is calculated, for example, as

[0015]

(Equation 8)

where γ_m is an exponential weighting coefficient (γ < 1) expressed, for example, as

[0016]

(Equation 9)

and A is a normalization constant given by

[0017]

(Equation 10)

The subtractor 27 uses P_{X,k}(n) and Pav_{N,k}(n), received from the input signal power calculator 24 and the noise power calculator 26, to calculate

[0018]

(Equation 11)

P_{Y,k}(n) is passed to the time domain converter 28 and, using Φ_k(n) sent from the input signal phase calculator 25, is converted into

[0019]

(Equation 12)

and synthesized into a full-band signal. The D/A converter 29 converts the result into an analog signal and outputs Y(n), the noise-reduced output signal 17.
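Equations 2 to 12 are not reproduced in this text. As a rough illustration of the per-band processing described in paragraphs [0008] to [0019], the following Python sketch assumes that the power level is the squared magnitude of the band value, that the subtraction is P_{Y,k}(n) = P_{X,k}(n) - Pav_{N,k}(n), and that a negative result is floored at zero. The function name `spectral_subtract` and the flooring are hypothetical choices made for the example, not details taken from the patent.

```python
import cmath
import math


def spectral_subtract(band_values, noise_power_avg):
    """Subtract an averaged noise power from each band and reattach the input phase.

    band_values     : complex band signals X_k(n) for one time index n
    noise_power_avg : averaged noise powers Pav_N,k(n), one per band
    """
    out = []
    for x_k, pav_n in zip(band_values, noise_power_avg):
        p_x = abs(x_k) ** 2               # assumed form of the input power P_X,k(n)
        phi = cmath.phase(x_k)            # input phase Phi_k(n)
        p_y = max(p_x - pav_n, 0.0)       # subtraction; the floor at 0 is an assumption
        out.append(cmath.rect(math.sqrt(p_y), phi))  # back to a complex band value
    return out


# One band with power 25 and noise power 9 leaves power 16 (magnitude 4).
result = spectral_subtract([3 + 4j, 1 + 0j], [9.0, 4.0])
```

In the second band the noise estimate exceeds the input power, so the floored output is zero. This kind of hard nonlinearity is what the following paragraphs identify as the source of distortion.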

[0020] With this method, where P_S is the power of the target sound and P_N is the power of the unwanted sound contained in the input signal 15, in which the target speech signal is mixed with unwanted noise, noise can be reduced effectively when the S/N ratio defined by

[0021]

(Equation 13)

is about 25 dB or more. When the S/N ratio is about 20 dB or less, however, the noise is reduced but the speech signal is distorted as a result. This is because spectral subtraction applies a nonlinear operation, subtraction, to the band-divided signals.
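Equation 13 is not reproduced in this text. Assuming the conventional decibel definition S/N = 10 log10(P_S / P_N), the ratio can be computed as in this minimal sketch; the function name `snr_db` is hypothetical.

```python
import math


def snr_db(p_signal, p_noise):
    """S/N ratio in dB, assuming the conventional 10*log10 power-ratio definition."""
    return 10.0 * math.log10(p_signal / p_noise)


ratio = snr_db(100.0, 1.0)   # target power 100 times the noise power
```

Under this definition, the 25 dB figure quoted above corresponds to a target power roughly 316 times the noise power.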

[0022] On the other hand, suppose that a value lower than the measured noise level is subtracted in order to reduce speech distortion; for example, suppose that the operation

[0023]

(Equation 14)

is applied to Pav_{N,k}(n) estimated by the noise power calculator 26 before it is passed to the subtractor 27 and subtracted. In that case the noise cannot be fully removed, and the residual noise in the output signal 17, Y(n), of the noise reduction device 16 varies over time and produces a perceptually unpleasant sound. In other words, spectral subtraction has the problem that the processed output signal is perceptually unsatisfactory regardless of the amount subtracted. The method therefore cannot be applied as it stands to sound pickup in which the signal is meant to be listened to and sound quality matters, as in audio conference and TV conference devices.

[0024]

[Problems to Be Solved by the Invention] The conventional method of reducing noise in an input signal contaminated by ambient noise, which subtracts the noise amplitude from the speech signal, has the following problem: a large subtraction amount increases speech distortion, while a small subtraction amount leaves residual noise that varies over time and produces a perceptually unpleasant sound.

[0025] An object of the present invention is to provide a noise reduction method that causes little perceptual degradation of sound quality.

[0026]

[Means for Solving the Problems] The noise reduction method according to the present invention divides the signal received by a microphone, in which the target speech signal is mixed with ambient noise and the like, into a plurality of bands; estimates the noise power for each band signal; compares the estimated noise power with the power of the signal actually input to estimate the ratio between the speech signal and the noise signal; and suppresses the noise by applying a loss to the input signal on the basis of that ratio.

[0027] The loss value used is first determined from the estimated ratio between the speech signal and the noise signal, and is then averaged across bands and averaged with past loss values.

[0028] The noise power is estimated using a histogram of the power level distribution of the input signal. The power level of a speech signal fluctuates widely and its distribution is spread out, whereas the power level of stationary noise is nearly constant, so a histogram of the power level distribution shows a strong peak at the average noise level. The present invention uses this property: it detects the peak section of the histogram and thereby estimates the noise power.

[0029]

[Operation] The present invention has the following features. First, because processing is performed per band, a wrong loss-insertion decision in some bands has little overall effect. Second, in contrast to conventional spectral subtraction, which uses the nonlinear operation of subtraction, the invention applies the linear operation of loss insertion, so the signal is distorted less. Third, at the onsets and offsets of speech the speech power fluctuates sharply in a short time, so the ratio between the speech signal and the noise signal also changes greatly and the loss value before averaging changes abruptly; using that loss value directly would make the processed signal sound discontinuous. The present invention removes this discontinuity by averaging the loss value over bands and over time. Finally, using a histogram for the noise power estimation, which is needed to calculate the noise contribution ratio, makes it possible to estimate the noise power even in intervals where speech and noise are both present.

[0030]

[Embodiment] FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, and is used to explain the processing procedure of the invention. Elements identical to those in FIGS. 6 and 7 carry the same reference numerals. In FIG. 1, reference numeral 31 denotes a noise power estimator; 32, a loss value calculator; and 33, a loss inserter.

[0031] The operation is as follows. First, the input signal 15 received by the microphone 11, in which the target signal is mixed with unwanted noise, is digitized by the A/D converter 21 and passed to the frequency band divider 22, which divides it into frequency bands. Each band signal is passed to the input signal power calculator 24, the noise power estimator 31, and the loss inserter 33. In what follows, the k-th band signal is denoted X_k(n), and the processing flow for X_k(n) is described.

[0032] The input signal power calculator 24 calculates the power level of the received band signal X_k(n),

[0033]

(Equation 15)

The power level P_{X,k}(n) is then averaged over a predetermined time and passed to the loss value calculator 32 as Pav_{X,k}(n). The time average is calculated, for example, as

[0034]

(Equation 16)

where γ_m is an exponential weighting coefficient (γ < 1) expressed, for example, as

[0035]

(Equation 17)

and A is a normalization constant given by

[0036]

(Equation 18)

If the averaging time m is too short, the speech/noise decision fluctuates more over time; if it is too long, the ability to track temporal changes deteriorates. The averaging time is, for example, 0.5 to 1 msec.
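Equations 16 to 18 are not reproduced in this text. The sketch below shows one plausible reading of the exponentially weighted time average: it assumes the weights are γ_m = γ^m and the normalization constant is A = 1 / Σ γ_m, so that the weights sum to one. Both forms, the function name `weighted_average`, and the default γ = 0.9 are assumptions made for illustration.

```python
def weighted_average(power_history, gamma=0.9):
    """Exponentially weighted average of recent power levels.

    power_history[0] is the newest value P(n); power_history[m] is P(n - m).
    """
    weights = [gamma ** m for m in range(len(power_history))]  # assumed gamma_m
    norm = 1.0 / sum(weights)                                  # assumed constant A
    return norm * sum(w * p for w, p in zip(weights, power_history))


steady = weighted_average([2.0] * 5)            # a constant input is unchanged
recent = weighted_average([1.0, 0.0], gamma=0.5)  # newest sample weighs more
```

The length of `power_history` plays the role of the averaging time m: a longer history smooths more but reacts more slowly, which is the trade-off the paragraph above describes.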

[0037] The noise power estimator 31 estimates the noise power Pav_{N,k}(n) from the received band signal X_k(n). FIG. 2 is a flowchart of the processing in the noise power estimator 31, and the detailed description below refers to it. S21 to S23 in FIG. 2 denote the individual steps.

[0038] First, in step S21, the power level P_{X,k}(n) of the received X_k(n) is calculated using equation (12). The power level P_{X,k}(n) is then averaged over a predetermined time to give Pav'_{X,k}(n). The averaging time m' of Pav'_{X,k}(n) is preferably longer than the averaging time m of Pav_{X,k}(n) in the input signal power calculator 24, but if it is too long the estimate no longer tracks changes in the noise. It is, for example, several msec. Next, in step S22, a histogram of the level distribution of Pav'_{X,k}(n) is taken. That is, the count of the power section to which Pav'_{X,k}(n) belongs is incremented by 1:

[0039]

(Equation 19)

Here int(*) denotes truncation of the fractional part to give an integer. Then, in step S23, the peak sections of the histogram h_k(i) are detected and stored. That is, the values of i that satisfy, with respect to the neighboring values,

[0040]

(Equation 20)

are found. The smallest of these i is taken as the noise power Pav_{N,k}(n), which is passed to the loss value calculator 32.
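Equations 19 and 20 are not reproduced in this text. Steps S22 and S23 can be sketched as follows, assuming that a level is mapped to a section index by int(level / bin_width) and that a peak section is one whose count is strictly greater than both neighbors. The names `update_histogram`, `lowest_peak_section`, and `bin_width`, the strict inequalities, and the handling of out-of-range levels are all hypothetical.

```python
def update_histogram(hist, level, bin_width=1.0):
    """Step S22: add 1 to the count of the section that `level` falls in."""
    i = int(level / bin_width)      # int() drops the fractional part
    if 0 <= i < len(hist):          # out-of-range levels are ignored (assumption)
        hist[i] += 1
    return hist


def lowest_peak_section(hist):
    """Step S23: smallest index i with hist[i-1] < hist[i] > hist[i+1]."""
    for i in range(1, len(hist) - 1):
        if hist[i - 1] < hist[i] > hist[i + 1]:
            return i
    return None                     # no interior peak has formed yet


hist = [0] * 8
for level in [2.2, 2.7, 2.9, 6.1, 6.5, 1.5, 3.3, 5.0, 7.2]:
    update_histogram(hist, level)
noise_section = lowest_peak_section(hist)
```

In this toy data the levels cluster around two sections; the lower cluster is taken as the noise floor, while the higher one would correspond to speech. How a section index is converted back to a power value (for example `noise_section * bin_width`) is not stated in the text and is left open here.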

[0041] The loss value calculator 32 calculates the loss value from Pav_{X,k}(n), calculated by the input signal power calculator 24, and the noise power Pav_{N,k}(n), estimated by the noise power estimator 31. FIG. 3 is a flowchart of the processing in the loss value calculator 32, and the detailed description below refers to it.

[0042] First, in step S31, the threshold P'_{N,k}(n) is set to the noise power Pav_{N,k}(n) plus a predetermined margin P_th. That is,

[0043]

(Equation 21)

If the margin P_th is too small the effect is small, and if it is too large the output sounds discontinuous. For example, P_th is set to 5 × Pav_{N,k}(n). Next, in step S32, Pav_{X,k}(n) is compared with the threshold P'_{N,k}(n): if Pav_{X,k}(n) is smaller than the threshold, the loss value is set to less than 1; if it is equal to or larger than the threshold, the loss value is set to 1.

[0044]

(Equation 22)

Then, in step S33, the loss value L_k(n) determined in step S32 is averaged with the loss values of neighboring bands, for example L_{k-2}(n), L_{k-1}(n), ..., L_{k+2}(n), to reduce the differences between bands. The loss is also averaged over time, for example over about 30 msec, using the past loss values L_k(n-1), L_k(n-2), .... If this averaging time is short, the loss value fluctuates over time and the output sounds discontinuous. The averaged value is taken as the new L_k(n) and passed to the loss inserter 33.
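Steps S31 to S33 can be sketched as below. Equations 21 and 22 are not reproduced in this text, so the sketch assumes a threshold of Pav_{N,k}(n) + P_th with P_th = 5 × Pav_{N,k}(n) (the example value given above), a fixed loss of 0.3 below the threshold (the text only says "less than 1"), and plain unweighted means for the band and time averaging. All function and parameter names are hypothetical.

```python
def loss_values(pav_x, pav_n, margin_factor=5.0, loss_below=0.3):
    """Steps S31-S32: per-band loss from averaged input power and noise power."""
    losses = []
    for px, pn in zip(pav_x, pav_n):
        threshold = pn + margin_factor * pn       # P'_N,k = Pav_N,k + P_th
        losses.append(loss_below if px < threshold else 1.0)
    return losses


def smooth_across_bands(losses, radius=2):
    """Step S33 (bands): mean over bands k-radius .. k+radius, clipped at the edges."""
    out = []
    for k in range(len(losses)):
        lo, hi = max(0, k - radius), min(len(losses), k + radius + 1)
        out.append(sum(losses[lo:hi]) / (hi - lo))
    return out


def smooth_over_time(current, history):
    """Step S33 (time): mean of the current loss vector and past loss vectors."""
    frames = [current] + list(history)
    return [sum(f[k] for f in frames) / len(frames) for k in range(len(current))]


raw = loss_values([10.0, 100.0], [2.0, 2.0])               # threshold is 12 in both bands
banded = smooth_across_bands([1.0, 1.0, 0.0, 1.0, 1.0], radius=1)
timed = smooth_over_time([1.0, 0.0], [[0.0, 0.0]])
```

The number of past frames kept in `history` would be chosen so that the window spans roughly the 30 msec mentioned above; that mapping depends on the frame rate, which the text does not give.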

[0045] The loss inserter 33 performs loss control using the loss value L_k(n) calculated by the loss value calculator 32. That is, it applies

[0046]

(Equation 23)

to the band signal X_k(n) received from the frequency band divider 22 and outputs the noise-reduced band output Y_k(n). The band output Y_k(n) is passed to the time domain converter 28 and converted to the time domain. The D/A converter 29 converts the result into an analog signal and outputs Y(n), the noise-reduced output signal 17.
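Equation 23 is not reproduced in this text. A natural reading of "loss insertion" is a per-band multiplication by a real gain, Y_k(n) = L_k(n) · X_k(n); the sketch below assumes exactly that, and the function name `insert_loss` is hypothetical.

```python
def insert_loss(band_values, losses):
    """Scale each complex band signal X_k(n) by its real loss value L_k(n)."""
    return [loss * x for loss, x in zip(losses, band_values)]


bands_out = insert_loss([2 + 2j, 1 + 0j], [0.5, 1.0])
```

Because the gain is real and non-negative, the phase of each band is left untouched and only its magnitude is scaled, which is consistent with the description of loss insertion as a linear operation.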

[0047] In the description above, the units in FIG. 1 are presented as hardware, but they can also be implemented in software as a computer program. In the claims they are therefore expressed as means rather than units.

[0048] Next, the noise reduction method of the present invention is described. It is basically the same as the operating procedure of FIG. 1 and is summarized in the flowchart of FIG. 4, in which S41 to S47 denote the individual steps.

[0049] When an input signal X(n) containing both speech and noise is input, step S41 divides the mixed speech/noise signal into a plurality of bands. For each band signal X_k(n) of the divided frequency bands, step S42 calculates the average power level Pav_{X,k}(n) of the input signal. In parallel, step S43 estimates the noise power Pav_{N,k}(n) of each band signal X_k(n). Step S44 uses the average power level Pav_{X,k}(n) and the noise power Pav_{N,k}(n) to obtain, for each band, the proportion of noise in the band signal X_k(n). Step S45 then determines a loss value for each band on the basis of this proportion. Step S46 inserts the loss value determined for each band into the corresponding band signal, giving band output signals in which the noise in each band is reduced. Finally, step S47 converts the noise-reduced band output signals to the time domain to form a full-band signal, yielding a noise-reduced analog signal.
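The flow S41 to S47 can be sketched end to end for a single frame as follows. This is a simplified illustration under several assumptions: a plain DFT stands in for the band division, the per-band noise power is passed in rather than estimated from a histogram, no time or band averaging is applied, and the threshold and loss constants (`margin_factor`, `loss_below`) are hypothetical values.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform (band division, step S41)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(bands):
    """Naive inverse DFT returning the real part (time-domain conversion, step S47)."""
    n = len(bands)
    return [sum(bands[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def reduce_noise_frame(frame, noise_power, margin_factor=5.0, loss_below=0.3):
    bands = dft(frame)                                   # S41: split into bands
    power = [abs(x) ** 2 for x in bands]                 # S42: input power (unaveraged here)
    out = []
    for x, p, pn in zip(bands, power, noise_power):      # noise_power stands in for S43
        threshold = pn + margin_factor * pn              # S44: compare with noise level
        loss = loss_below if p < threshold else 1.0      # S45: choose the loss value
        out.append(loss * x)                             # S46: insert the loss
    return idft(out)                                     # S47: back to the time domain


frame = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
untouched = reduce_noise_frame(frame, [0.0] * 8)    # zero noise estimate: no loss anywhere
attenuated = reduce_noise_frame(frame, [100.0] * 8)  # every band below threshold: 0.3 gain
```

A practical implementation would process overlapping windowed frames and use an FFT; those details are outside what the text specifies.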

[0050] The effectiveness of the present invention was verified by computer simulation on female speech (sampled at 16 kHz) recorded under the conditions of an actual TV conference system. FIG. 5 shows spectrograms of the noisy speech and of the data processed by the spectral subtraction described as prior art and by the noise reduction of the present invention. For both spectral subtraction and the present invention, the signal was divided into 32 band signals of 250 Hz width before processing.
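The figures quoted above are mutually consistent if the 32 bands are uniform and together cover 0 Hz up to the Nyquist frequency of the 16 kHz sampling rate. That layout is an assumption (the text does not state the band edges); under it, the band plan can be checked as follows.

```python
sample_rate_hz = 16_000
band_width_hz = 250

nyquist_hz = sample_rate_hz // 2                 # highest representable frequency
num_bands = nyquist_hz // band_width_hz          # uniform bands from 0 Hz to Nyquist
band_edges = [(k * band_width_hz, (k + 1) * band_width_hz) for k in range(num_bands)]
```

With these values `num_bands` comes out as 32, and the lowest two bands span 0 to 500 Hz.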

[0051] FIG. 5(a) shows the noisy speech. The black band below 500 Hz and the gray area spread across the whole plot are ambient noise consisting mainly of air-conditioning noise. The speech can be seen to be drowned out by the noise.

[0052] FIG. 5(b) shows the result of spectral subtraction. The noise is removed overall and the contrast with the speech is clear. However, noise that was not fully removed appears as faint vertical stripes. These frequency components vary over time, which causes a squeaky, grating sound. In the speech portions, low-amplitude parts have disappeared because of over-subtraction, and degradation is evident.

[0053] FIG. 5(c) shows the result of loss control according to the present invention. The contrast between speech and noise is less sharp than with spectral subtraction, but the noise is fainter overall and the speech is emphasized. By raising intelligibility without greatly altering the original sound, a signal that is easy to listen to is obtained.

[0054] The calculated S/N ratio was 18 dB before and 30 dB after the loss control processing, confirming that loss control provides a noise reduction of 12 dB.

[0055]

[Effects of the Invention] The present invention is a noise reduction method and apparatus intended for listening, and has the following features. First, because processing is performed per band, a wrong loss-insertion decision in some bands has little overall effect. Second, because the linear operation of loss insertion is applied, the signal is distorted little. Third, because the loss value is averaged before it is applied, the sense of discontinuity is removed even at speech onsets and offsets, where the ratio between the speech signal and the noise signal changes greatly and the loss value changes abruptly with it. Finally, using a histogram to estimate the noise power makes the estimate usable even in intervals where speech and noise are both present.

[0056] Noise reduction intended for listening, performed by the method of the present invention, makes it possible to obtain a target signal that is easy to hear. As a result, in hands-free communication systems such as audio conferences and TV conferences, even when ambient noise other than the target speech is mixed into the transmit signal that is picked up by a microphone and sent to the far end, the noise reduction of this method preserves speech clarity and improves communication quality.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a flowchart of the processing performed by the noise power estimator in the embodiment of FIG. 1.

FIG. 3 is a flowchart of the processing performed by the loss value calculator in the embodiment of FIG. 1.

FIG. 4 is a flowchart showing the noise reduction method according to the present invention.

FIG. 5 is a set of spectrograms for explaining the effect of the present invention.

FIG. 6 is a diagram for explaining the principle of a conventional noise reduction device.

FIG. 7 is a block diagram showing an example of a conventional noise reduction device.

[Description of Reference Numerals] 11: microphone; 15: input signal; 16: noise reduction device; 17: output signal; 21: A/D converter; 22: frequency band divider; 28: time domain converter; 29: D/A converter; 31: noise power estimator; 32: loss value calculator; 33: loss inserter

Continuation of front page: (72) Inventor: Yutaka Kaneda, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. (72) Inventor: Junji Kojima, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. (56) References: JP-A-3-266899 (JP, A); JP-A-59-67732 (JP, A). (58) Field searched (Int. Cl.7, DB name): G10L 21/02

Claims

(57) [Claims]

1. A noise reduction method for outputting a signal in which the noise signal has been removed from a mixed speech/noise signal, in which a target speech signal is mixed with an unwanted noise signal, the method comprising: a frequency band dividing step of dividing the mixed speech/noise signal into a plurality of bands; an input signal power calculating step of calculating the input signal power of each band for each band signal of the divided frequency bands; a noise power estimating step of estimating the noise power of each band using each band signal of the divided frequency bands; a predicting step of comparing the input signal power of each band with the noise power of each band to predict, for each band, the proportion of noise in the band signal; a loss value calculating step of determining a loss value for each band on the basis of the predicted proportion; a loss inserting step of inserting the loss value determined for each band into each band signal of the divided frequency bands and outputting band output signals in which the noise of each band is reduced; and a time domain converting step of converting the band output signals to the time domain and synthesizing a full-band signal; wherein the noise power estimating step takes a histogram of the input signal power level distribution for each band signal, detects the peak sections of the histogram, and uses the smallest value among them as the noise power.

2. The noise reduction method according to claim 1, wherein the loss value calculating step adds a predetermined value to the estimated noise power of each band to form a threshold, compares the threshold determined for each band with the input signal power of that band, applies a predetermined loss value to bands whose input signal power is smaller than the threshold, and inserts no loss in bands whose input signal power is larger than the threshold.

3. The noise reduction method according to claim 1, wherein the loss value calculating step averages the loss value determined for each band with the loss values of adjacent bands to newly determine the loss value of each band.