JP4542790B2

JP4542790B2 - Noise suppressor and voice communication apparatus provided with noise suppressor

Info

Publication number: JP4542790B2
Application number: JP2004009332A
Authority: JP
Inventors: 岳彦井阪
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2004-01-16
Filing date: 2004-01-16
Publication date: 2010-09-15
Anticipated expiration: 2024-01-16
Also published as: JP2005202222A

Abstract

PROBLEM TO BE SOLVED: To realize high-speed, highly accurate noise suppression processing by making it possible to estimate a noise suppression coefficient with high accuracy using a small amount of computation.

SOLUTION: A noise suppression coefficient calculation unit 30 approximates the occurrence probability of each of the real part and the imaginary part of the speech spectrum by a Laplace distribution, and approximates the occurrence probability of each of the real part and the imaginary part of the noise spectrum by a Gaussian distribution. A speech spectrum estimation formula derived using MAP estimation and Bayes' theorem is prepared in advance, and the noise suppression coefficient is calculated according to this estimation formula.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a noise suppressor provided to suppress a noise component contained in an input speech signal, and to a voice communication apparatus equipped with the noise suppressor, such as a fixed telephone, a mobile phone, an IP telephone, or a video conference apparatus.

Voice communication apparatuses such as mobile phones use speech coding schemes such as CELP (Code Excited Linear Prediction). When this type of apparatus is used in an environment with loud background noise, the background noise is picked up and encoded together with the speech, and the clarity of the speech decreases as a result. For this reason, various techniques (noise suppressors) have been studied that remove or suppress the background noise so that the signal to be encoded is closer to speech alone.

For example, as a first prior art, a method is known in which the signal-to-noise ratio of an input signal is obtained for each frequency band, a noise suppression coefficient is determined based on the signal-to-noise ratio, and the input signal is multiplied by this coefficient in each frequency band to suppress the noise (see, for example, Non-Patent Document 1). This method yields a speech signal in which noise is suppressed. However, because the noise suppression coefficient is calculated using modified Bessel functions, the amount of computation becomes extremely large.

As a second prior art, there is a method in which the amplitude of the speech frequency spectrum is approximated by a gamma distribution, an estimation formula for the noise suppression coefficient is derived using MAP estimation and Bayes' theorem, and the noise suppression coefficient is calculated using this estimation formula (see, for example, Non-Patent Document 2). Because the estimation formula is simple, this method requires less computation than the first prior art. However, the derivation is not mathematically complete, so the parameters must be determined by trial and error.

As a third prior art, there is a method that reduces the computation needed for the noise suppression coefficient by transforming the equations in the same way as the second prior art (see, for example, Non-Patent Document 3). As with the second prior art, the estimation formula for the noise suppression coefficient becomes simple, so the amount of computation is reduced. However, because the speech frequency spectrum is approximated by a Gaussian distribution, the estimation accuracy of the noise suppression coefficient is low.

As a fourth prior art, there is a method that improves the signal-to-noise ratio estimation of the first prior art (see, for example, Patent Document 1). In this method, the improved estimation accuracy of the signal-to-noise ratio also raises the estimation accuracy of the noise suppression coefficient. However, because modified Bessel functions are still used to calculate the noise suppression coefficient, the amount of computation remains large.
Patent Document 1: JP 2003-140700 A
Non-Patent Document 1: Y. Ephraim et al., "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", ASSP-32 (6), pp. 1109-1121, 1984
Non-Patent Document 2: T. Lotter et al., "Noise reduction by maximum a posteriori spectral amplitude estimation with supergaussian speech modeling", IWAENC 2003, pp. 83-86
Non-Patent Document 3: P. Wolfe et al., "Simple alternatives to the Ephraim and Malah suppression rule for speech enhancement", Proc. IEEE Workshop on SSP, pp. 496-499, Aug. 2001

As described above, the techniques described in the prior art documents either require a large amount of computation to calculate the noise suppression coefficient or estimate the noise suppression coefficient with low accuracy.
The present invention has been made in view of the above circumstances. Its object is to provide a noise suppressor, and a voice communication apparatus equipped with the noise suppressor, in which the noise suppression coefficient can be estimated with high accuracy using a small amount of computation, thereby enabling high-speed and highly accurate noise suppression processing.

To achieve the above object, the present invention provides a noise suppressor that divides a frequency spectrum obtained from an input signal into a plurality of bands, estimates a noise spectrum for each of the divided bands, calculates a signal-to-noise ratio from the noise spectrum and the frequency spectrum, sets a noise suppression coefficient based on the calculated signal-to-noise ratio, weights the frequency spectrum according to the set noise suppression coefficient, and then converts the result into a time-domain signal. In this noise suppressor, a speech spectrum estimation formula is derived by approximating, with a statistical distribution model, the occurrence probability of each of the real part and the imaginary part of the speech spectrum contained in the frequency spectrum. An arithmetic expression is prepared by partially differentiating this estimation formula and setting the result to zero, with |cosφ| + |sinφ| approximated as a constant, where φ is the phase spectrum. The noise suppression coefficient is calculated and set according to this arithmetic expression.

Therefore, according to the present invention, the speech frequency spectrum is approximated by a statistical distribution model, an estimation formula for the noise suppression coefficient is derived using MAP estimation and Bayes' theorem, and the noise suppression coefficient is calculated using this estimation formula. The estimation formula is therefore relatively simple, which reduces the amount of computation. In addition, because the occurrence probability is approximated by a statistical distribution model separately for the real part and the imaginary part of the speech spectrum, the approximation accuracy is improved. Furthermore, because the derivation is mathematically complete, fewer parameters need to be adjusted, which raises the estimation accuracy of the noise suppression coefficient.

When deriving the speech spectrum estimation formula, it is preferable to approximate the occurrence probability of each of the real part and the imaginary part of the speech spectrum by a first statistical distribution model, and to approximate the occurrence probability of each of the real part and the imaginary part of the noise spectrum by a second statistical distribution model. In this way, the estimation formula is derived taking the noise spectrum into account as well as the speech spectrum, which further improves its accuracy.
Specifically, a Laplace distribution or a gamma distribution may be used as the first statistical distribution model for approximating the speech spectrum, and a Gaussian distribution may be used as the second statistical distribution model for approximating the noise spectrum.

Furthermore, it is preferable to derive and set the arithmetic expression for the noise suppression coefficient separately for each of the divided bands, and to calculate the noise suppression coefficient for each band using these expressions. The noise suppression coefficient is then calculated by an arithmetic expression set independently for each frequency band. Compared with using a single expression common to all frequency bands, this takes into account the fact that the distribution of the spectral probability density function (SPDF) of speech differs from band to band, and so allows a more accurate approximation.

In short, in the present invention, the means for setting the noise suppression coefficient derives a speech spectrum estimation formula by approximating, with a statistical distribution model, the occurrence probability of each of the real part and the imaginary part of the speech spectrum contained in the frequency spectrum. The noise suppression coefficient is calculated according to an arithmetic expression obtained by partially differentiating this estimation formula and setting the result to zero, with |cosφ| + |sinφ| approximated as a constant, where φ is the phase spectrum.
Therefore, according to the present invention, it is possible to provide a noise suppressor, and a voice communication apparatus equipped with the noise suppressor, in which the noise suppression coefficient can be estimated with high accuracy using a small amount of computation, thereby enabling high-speed and highly accurate noise suppression processing.

FIG. 1 is a block diagram showing the configuration of a mobile phone that is a first embodiment of a voice communication apparatus provided with a noise suppressor according to the present invention.
A radio signal transmitted from a base station (not shown) is received by the antenna 1 and then input to the receiving circuit (RX) 3 via the antenna duplexer (DUP) 2. The receiving circuit 3 mixes the received radio signal with the local oscillation signal output from the frequency synthesizer (SYN) 4 and down-converts it to an intermediate frequency signal. The down-converted received intermediate frequency signal is then quadrature-demodulated, and the resulting received baseband signal is supplied to the CDMA (Code Division Multiple Access) signal processing unit 6. The frequency of the local oscillation signal generated by the frequency synthesizer 4 is specified by the control signal SYC from the control unit 18.

The CDMA signal processing unit 6 includes a RAKE receiver. In the RAKE receiver, each of the plurality of paths contained in the received baseband signal is despread with a spreading code. The despread signals of the respective paths are then phase-aligned and combined. As a result, the received encoded data is reproduced and input to the speech coding/decoding unit (hereinafter referred to as the speech codec (SP-COD)) 7.

The speech codec 7 decodes the received encoded data output from the CDMA signal processing unit 6 into speech. The PCM codec 8 PCM-decodes the digital reception signal output from the speech codec 7 and outputs an analog reception signal. This analog reception signal is amplified by the reception amplifier 9 and then output as sound from the speaker 10.

Meanwhile, the speaker's transmission speech signal input to the microphone 11 is amplified to an appropriate level by the transmission amplifier 12 and then PCM-encoded by the PCM codec 8 to become a digital transmission signal. This digital transmission signal is input to the speech codec 7 via a noise suppressor (NS) 20 described later. The speech codec 7 encodes the digital transmission signal output from the PCM codec 8 according to a predetermined speech coding scheme, and supplies the resulting transmission encoded data to the CDMA signal processing unit 6.

The CDMA signal processing unit 6 applies spread spectrum processing to the transmission encoded data output from the speech codec 7, using the spreading code assigned to the transmission channel, and supplies the spread signal to the transmission circuit (TX) 5. The transmission circuit 5 modulates the spread signal using a digital modulation scheme such as QPSK (Quadrature Phase Shift Keying). The transmission signal generated by this modulation is mixed with the local oscillation signal generated by the frequency synthesizer 4 and frequency-converted into a radio signal. The transmission circuit 5 also amplifies the radio signal with a power amplifier so that it reaches the transmission power level instructed by the control unit 18. The amplified radio signal is supplied to the antenna 1 via the antenna duplexer 2 and transmitted from the antenna 1 to a base station (not shown).

The control unit 18 is built around, for example, a microcomputer, and governs all control relating to the communication operation of the mobile phone. The storage unit 13 uses a RAM and a flash memory; the flash memory stores the phone book, sent and received e-mails, various control information, and the like. In addition to dial keys, the input unit 14 is provided with function keys such as a send key, an end key, a power key, a volume control key, and a mode designation key. The display unit 15 is provided with an LCD and an LED. The LCD is used to display information indicating the operating state of the apparatus, such as the remaining battery level and the received field strength, as well as sent and received e-mails, the phone book, the telephone number of the terminal used by the other party, the call history, and the like. The LED is used to signal incoming calls and to indicate the charging state of the battery 16. The power supply circuit (POW) generates a predetermined operating power supply voltage Vcc from the output of the battery 17 and supplies it to each circuit unit.

The noise suppressor 20 is implemented by, for example, a DSP (Digital Signal Processor) and has the following functions. FIG. 2 is a functional block diagram showing its configuration.
The noise suppressor 20 includes a fast Fourier transform unit (FFT) 21, a spectrum amplitude suppression unit 22, an inverse fast Fourier transform unit (IFFT) 23, a first band dividing unit 24, a voice detection unit 25, a noise level estimation unit 26, an a posteriori SNR estimation unit 27, a second band dividing unit 28, an a priori SNR estimation unit 29, and a noise suppression coefficient calculation unit 30.

The fast Fourier transform unit (FFT) 21 converts the input signal from a time-domain signal to a frequency-domain signal. The input digital transmission signal x(t) is divided into frames of a predetermined length, for example 128 samples, and a fast Fourier transform is performed on each frame to obtain the amplitude spectrum X(k) (k = 0 to N−1, where N is the frame length).

Prior to the fast Fourier transform, pre-emphasis may be applied to the input digital transmission signal x(t) in order to flatten the spectral envelope. The frame length and the shift width of the fast Fourier transform need not be the same. For example, when the frame length is 128 and the shift width is 80, the 80 samples of the input digital transmission signal x(t) may be stored in the front part of the frame, the remaining 48 samples set to 0 (zero), and a window with a sinusoidal characteristic then applied to eliminate discontinuities at the frame boundaries. More specific methods of pre-emphasis and windowing are described in detail in TIA/EIA IS-127 EVRC, 1997-01, a coding standard published by the US TIA.
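The framing step described above can be sketched as follows. This is a minimal illustration, not the procedure of the cited standard: the half-sine window shape, the function names, and the use of a naive DFT in place of an FFT are all assumptions made for readability.

```python
import cmath
import math

FRAME_LEN = 128  # frame length N given as an example in the text
SHIFT = 80       # shift width given as an example in the text


def make_frame(new_samples):
    """Put SHIFT new samples at the front of the frame and zero the rest."""
    if len(new_samples) != SHIFT:
        raise ValueError("expected exactly SHIFT samples")
    return list(new_samples) + [0.0] * (FRAME_LEN - SHIFT)


def sine_window(n_points):
    """Hypothetical half-sine window (assumption; the standard defines its own)."""
    return [math.sin(math.pi * (n + 0.5) / n_points) for n in range(n_points)]


def dft(frame):
    """Naive O(N^2) DFT standing in for the FFT unit 21."""
    n_points = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n_points)
            for t in range(n_points))
        for k in range(n_points)
    ]


def analyze(new_samples):
    """Return (amplitude spectrum X(k), phase spectrum P(k)) for one frame."""
    frame = make_frame(new_samples)
    window = sine_window(FRAME_LEN)
    windowed = [s * w for s, w in zip(frame, window)]
    spectrum = dft(windowed)
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase
```

A production implementation would replace `dft` with a real FFT routine; the naive form is used here only to keep the sketch dependency-free.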

The first band dividing unit 24 divides the amplitude spectrum X(k) into, for example, 16 frequency bands from low to high frequencies, and averages within each band to obtain a band power Xd(i) representing each frequency band (i = 0 to Ni, where Ni is the number of frequency bands, for example 16).
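As a rough sketch of this band division, the function below averages the amplitude spectrum over equal-width groups of bins. Equal-width bands and the name `band_powers` are assumptions; the text does not say how the band edges are chosen.

```python
def band_powers(amplitude, num_bands=16):
    """Average the amplitude spectrum over num_bands equal-width bands.

    Equal-width band edges are an assumption made for illustration.
    """
    num_bins = len(amplitude)
    if num_bins < num_bands:
        raise ValueError("need at least one bin per band")
    powers = []
    for band in range(num_bands):
        lo = band * num_bins // num_bands          # first bin of this band
        hi = (band + 1) * num_bins // num_bands    # one past the last bin
        powers.append(sum(amplitude[lo:hi]) / (hi - lo))
    return powers
```

For example, 64 bins split into 16 bands gives four bins per band, and each band power is the mean of those four amplitudes.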

The voice detection unit 25 compares the band power Xd(i) obtained by the first band dividing unit 24 with a threshold θ(i), and detects speech if the band power Xd(i) is greater than the threshold θ(i). The threshold θ(i) may be a fixed value, or it may be updated adaptively by taking a weighted average of the past threshold θ(i) and the band power Xd(i).
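The threshold comparison and the adaptive update can be sketched as below. The smoothing weight, the function names, and the rule that a frame counts as speech when any band exceeds its threshold are assumptions; the text only states the per-band comparison and the weighted-average update.

```python
def detect_voice(band_power, thresholds):
    """Per-band decision: True where Xd(i) exceeds the threshold theta(i)."""
    return [p > th for p, th in zip(band_power, thresholds)]


def is_voice_frame(band_power, thresholds):
    """Hypothetical frame-level rule: speech if any band exceeds its threshold."""
    return any(detect_voice(band_power, thresholds))


def update_thresholds(thresholds, band_power, weight=0.9):
    """Weighted average of the past threshold and the current band power.

    The weight value is an illustrative assumption.
    """
    return [weight * th + (1.0 - weight) * p
            for th, p in zip(thresholds, band_power)]
```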

The noise level estimation unit 26 estimates the noise level N(i) for each band from the band power Xd(i) of frames that the voice detection unit 25 did not detect as speech. The current band power Xd(i) may be used directly as the noise level N(i), or the noise level N(i) may be obtained adaptively by taking a weighted average of the past noise level and the band power Xd(i).
The a posteriori SNR estimation unit 27 obtains the ratio of the band power Xd(i) obtained by the first band dividing unit 24 to the noise level N(i) obtained by the noise level estimation unit 26, that is, SNR(i).
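A minimal sketch of the noise tracking and the a posteriori SNR follows. The smoothing weight, the small floor that guards against division by zero, and the function names are assumptions added for illustration.

```python
def update_noise_level(noise, band_power, voice_detected, weight=0.9):
    """Update N(i) only on frames that were not detected as speech.

    Uses a weighted average of the past noise level and Xd(i); the weight
    is an illustrative assumption.
    """
    if voice_detected:
        return list(noise)  # keep the previous estimate during speech
    return [weight * n + (1.0 - weight) * p
            for n, p in zip(noise, band_power)]


def posterior_snr(band_power, noise, floor=1e-12):
    """SNR(i) = Xd(i) / N(i), with a floor (assumption) to avoid dividing by 0."""
    return [p / max(n, floor) for p, n in zip(band_power, noise)]
```

The same ratio applied to the band power Yd(i) of the noise-suppressed spectrum would give SNR'(i).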

The second band dividing unit 28 divides the amplitude spectrum Y(k), obtained after noise suppression by the spectrum amplitude suppression unit 22 described later, into, for example, 16 bands from low to high frequencies, and averages within each band to obtain a band power Yd(i) representing each band (i = 0 to Ni, where Ni is the number of bands, for example 16).
The a priori SNR estimation unit 29 obtains SNR'(i), the ratio of the band power Yd(i) obtained by the second band dividing unit 28 to the current noise level N(i).

The noise suppression coefficient calculation unit 30 obtains the noise suppression coefficient W(i) from the SNR(i) obtained by the a posteriori SNR estimation unit 27 and the SNR'(i) obtained by the a priori SNR estimation unit 29, according to an arithmetic expression prepared in advance. The derivation of this arithmetic expression for the noise suppression coefficient W(i) is described in detail later.

The spectrum amplitude suppression unit 22 multiplies the amplitude spectrum X(k) obtained by the fast Fourier transform unit (FFT) 21 by the noise suppression coefficient W(i) obtained by the noise suppression coefficient calculation unit 30, thereby obtaining the noise-suppressed amplitude spectrum Y(k). All components of the amplitude spectrum X(k) within the same band are multiplied by the same noise suppression coefficient W(i).
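The per-band weighting can be sketched as follows, assuming equal-width bands; the band layout and the name `apply_band_gains` are illustrative assumptions.

```python
def apply_band_gains(amplitude, gains):
    """Multiply every bin X(k) in band i by the same coefficient W(i).

    Bands are assumed to be equal-width groups of consecutive bins.
    """
    num_bins = len(amplitude)
    num_bands = len(gains)
    if num_bands == 0 or num_bins < num_bands:
        raise ValueError("need at least one bin per band")
    suppressed = []
    for band in range(num_bands):
        lo = band * num_bins // num_bands
        hi = (band + 1) * num_bins // num_bands
        suppressed.extend(x * gains[band] for x in amplitude[lo:hi])
    return suppressed
```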

The inverse fast Fourier transform unit (IFFT) 23 converts the amplitude spectrum Y(k) noise-suppressed by the spectrum amplitude suppression unit 22, together with the phase spectrum P(k) obtained by the fast Fourier transform unit (FFT) 21, into a time-domain signal y(t). The resulting time-domain digital transmission signal y(t) is supplied to the speech codec (SP-COD) 7.
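Recombining the suppressed amplitude with the original phase can be sketched as below. A naive inverse DFT stands in for the IFFT, and the function names are assumptions made for this illustration.

```python
import cmath
import math


def dft(samples):
    """Naive forward DFT, included only so the round trip can be shown."""
    n_points = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n_points)
            for t in range(n_points))
        for k in range(n_points)
    ]


def synthesize(amplitude, phase):
    """Rebuild y(t) from amplitude Y(k) and phase P(k) with an inverse DFT."""
    n_points = len(amplitude)
    # Combine each amplitude with its phase into a complex spectrum value.
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n_points)
            for k in range(n_points)).real / n_points
        for t in range(n_points)
    ]
```

With unmodified amplitudes, `synthesize` returns the input frame; scaling every amplitude by the same factor scales the output by that factor, because the transform is linear.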

If the input digital transmission signal x(t) was pre-emphasized prior to the fast Fourier transform (FFT), it is de-emphasized to restore it. If the frame length and the shift width of the fast Fourier transform (FFT) are not the same, the frame boundaries are overlapped to eliminate discontinuities. More specific methods of de-emphasis and overlap are described in detail in TIA/EIA IS-127 EVRC, 1997-01, mentioned above.
The arithmetic expression for the noise suppression coefficient W(i) used in the noise suppression coefficient calculation unit 30 is derived as follows.
First, the arithmetic expression for the noise suppression coefficient W(i) is shown below.

Next, the derivation of the arithmetic expression for the noise suppression coefficient W(i) is described. In the first embodiment of the present invention, it is assumed that the real part and the imaginary part of the spectral probability density function (SPDF) of speech can each be approximated by a statistical distribution model. If a Laplace distribution is used as the statistical distribution model, the spectral probability density function (SPDF) of speech is expressed as follows.

Here, φ represents the phase spectrum and σX represents the variance of X. To simplify the notation, the frequency band index i is omitted here and in what follows. The estimate X^ (read "X-hat") of the speech amplitude spectrum X is expressed as follows using MAP estimation and Bayes' theorem.
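For reference, the generic MAP criterion combined with Bayes' theorem can be written as below. This is the textbook form, given as a sketch in the symbols used in this description (R being the noisy amplitude spectrum); it is not necessarily the patent's own notation or equation.

```latex
\hat{X} = \arg\max_{X} \, p(X \mid R)
        = \arg\max_{X} \, \frac{p(R \mid X)\, p(X)}{p(R)}
        = \arg\max_{X} \, p(R \mid X)\, p(X)
```

The last step holds because the denominator p(R) does not depend on X.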

Here, p(X) is the probability of X, and R is the amplitude spectrum of the noisy speech. Since p(R) does not depend on X, it suffices to find the X that maximizes p(R|X)p(X). According to Non-Patent Document 2 mentioned above, the conditional probability p(R|X) and the prior probability p(X) are expressed as follows.

Here, σN represents the variance of the amplitude spectrum of the noise. Substituting Equations (2) and (3) into Equation (6) and approximating |cosφ| + |sinφ| = a (where a is a constant) yields Equation (7).

Taking the product of Equation (5) and Equation (7), differentiating its logarithm with respect to X, and setting the result to zero gives the optimal solution for X^. Finally, Equation (1) is derived as the formula for calculating the noise suppression coefficient W(i).
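The patent's own equations are not reproduced in this text, so the following is only a sketch of what a closed-form rule of this kind could look like; it should not be read as Equation (1) itself. It rests on several assumptions that are not stated in the source: σX² and σN² denote the speech and noise variances; the Laplace prior with |cosφ| + |sinφ| ≈ a gives p(X) ∝ X·exp(−2aX/σX); the likelihood is taken in its large-argument approximation p(R|X) ∝ X^(−1/2)·exp(−(R − X)²/σN²); SNR(i) is read as R²/σN² and SNR'(i) as σX²/σN². Setting the derivative of the log of the product to zero then gives the quadratic X² − (R − aσN²/σX)·X − σN²/4 = 0, and the positive root divided by R is the gain computed by the hypothetical `map_gain` below. The default a = 4/π is the mean of |cosφ| + |sinφ| over a uniformly distributed phase, again an illustrative choice.

```python
import math


def map_gain(post_snr, prior_snr, a=4.0 / math.pi):
    """Gain W = X^/R from the quadratic described in the lead-in (a sketch).

    post_snr  : assumed to be R^2 / sigma_N^2 (stands in for SNR(i))
    prior_snr : assumed to be sigma_X^2 / sigma_N^2 (stands in for SNR'(i))
    a         : constant approximating |cos(phi)| + |sin(phi)|
    """
    if post_snr <= 0.0 or prior_snr <= 0.0:
        raise ValueError("SNR values must be positive")
    u = 0.5 - a / (2.0 * math.sqrt(prior_snr * post_snr))
    # Positive root of the quadratic, normalized by R.
    return u + math.sqrt(u * u + 1.0 / (4.0 * post_snr))
```

The rule needs only one square root per band plus a few multiplications and divisions, which is consistent with the low computational cost the description attributes to this kind of closed-form expression. The gain increases with the a priori SNR, so bands dominated by noise are attenuated more strongly.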

With the above configuration, when the digital transmission signal x(t) output from the PCM codec 8 is input to the noise suppressor 20, it is first converted frame by frame into a frequency-domain signal by the fast Fourier transform unit (FFT) 21, giving the amplitude spectrum X(k). The amplitude spectrum X(k) is divided into, for example, 16 frequency bands by the first band dividing unit 24 and averaged within each band to obtain the band power Xd(i). The voice detection unit 25 then compares the band power Xd(i) with the preset threshold θ(i) to detect speech frames.

The noise level estimation unit 26 estimates the noise level N(i) for each band from the band power Xd(i) of frames that the voice detection unit 25 did not detect as speech frames. The a posteriori SNR estimation unit 27 then calculates SNR(i) from this estimated noise level N(i) and the band power Xd(i) obtained by the first band dividing unit 24.

Meanwhile, the amplitude spectrum Y(k) obtained after noise suppression in the spectrum amplitude suppression unit 22 is, like the amplitude spectrum X(k) before suppression, divided into 16 bands and averaged within each band to give the band power Yd(i). The a priori SNR estimation unit 29 then calculates SNR'(i) from this band power Yd(i) and the current noise level N(i).

The amplitude suppression coefficient calculation unit 30 then calculates the noise suppression coefficient W(i) from the SNR(i) obtained by the a posteriori SNR estimation unit 27 and the SNR′(i) obtained by the a priori SNR estimation unit 29, in accordance with the estimation formula for the noise suppression coefficient W(i) derived as described earlier.

As shown in expression (1), the estimation formula for the noise suppression coefficient W(i) is a relatively simple arithmetic expression. It is derived by approximating the occurrence probability of each of the real and imaginary parts of the speech spectrum with a Laplace distribution, approximating the occurrence probability of each of the real and imaginary parts of the noise spectrum with a Gaussian distribution, and applying MAP estimation and Bayes' theorem. The noise suppression coefficient W(i) can therefore be calculated with a relatively small amount of computation. Moreover, because the Laplace and Gaussian distributions approximate the speech and noise spectra closely, and because the derivation is carried through mathematically in full, few parameters need to be tuned, so W(i) can be calculated with high accuracy.
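As a simplified illustration of this style of derivation, consider a single real component x = s + n, where s follows a Laplace distribution with parameter λ and n is Gaussian with variance σ². This scalar case is not expression (1), which works with the amplitude and phase of the spectrum, and the parameterization of λ used here is an assumption. It only shows how Bayes' theorem, a Laplace prior and a Gaussian likelihood lead to a closed-form MAP estimate once the derivative of the log-posterior is set to zero.

```latex
\begin{aligned}
\hat{s} &= \arg\max_{s}\, p(s \mid x) = \arg\max_{s}\, p(x \mid s)\, p(s), \\
p(x \mid s) &= \frac{1}{\sqrt{2\pi\sigma^{2}}}
               \exp\!\left(-\frac{(x - s)^{2}}{2\sigma^{2}}\right),
\qquad
p(s) = \frac{\lambda}{2}\exp\!\left(-\lambda\,|s|\right), \\
\frac{\partial}{\partial s}\ln\bigl[p(x \mid s)\,p(s)\bigr]
  &= \frac{x - s}{\sigma^{2}} - \lambda\,\operatorname{sgn}(s) = 0
  \quad (s \neq 0), \\
\hat{s} &= \operatorname{sgn}(x)\,\max\!\bigl(|x| - \lambda\sigma^{2},\; 0\bigr).
\end{aligned}
```

In this scalar case the only parameter besides the noise variance is λ, which is consistent with the remark above that few parameters need to be tuned.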

The noise suppression coefficient W(i) calculated by the amplitude suppression coefficient calculation unit 30 is supplied to the spectrum amplitude suppression unit 22. The spectrum amplitude suppression unit 22 multiplies the amplitude spectrum X(k) obtained by the fast Fourier transform unit (FFT) 21 by the noise suppression coefficient W(i) band by band, which yields the noise-suppressed amplitude spectrum Y(k). This amplitude spectrum Y(k) is input to the inverse fast Fourier transform unit (IFFT) 23 together with the phase spectrum P(k) obtained by the fast Fourier transform unit (FFT) 21. The inverse fast Fourier transform unit (IFFT) 23 converts the noise-suppressed amplitude spectrum Y(k) and the phase spectrum P(k) into a time-domain signal y(t), and the resulting digital transmission signal y(t) is passed to the speech encoding/decoding processing unit 7 for speech encoding.
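The suppression and resynthesis step can be sketched as below. The sketch uses a naive DFT and inverse DFT in place of the FFT and IFFT units, assumes equal-width bands, and applies the same band gain to mirrored negative-frequency bins so that the output stays real. The function name and the band mapping are assumptions, not details taken from the text.

```python
import cmath
import math


def suppress_and_resynthesize(frame, band_gains):
    """Scale |X(k)| by the band gain W(i), keep the phase P(k), and invert."""
    n = len(frame)
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
                for k in range(n)]
    half = n // 2
    width = half // len(band_gains)  # assumption: equal-width bands
    suppressed = []
    for k, xk in enumerate(spectrum):
        f = k if k <= half else n - k  # mirror negative-frequency bins
        band = min(f // width, len(band_gains) - 1)
        amplitude, phase = abs(xk), cmath.phase(xk)  # X(k) and P(k)
        suppressed.append(band_gains[band] * amplitude * cmath.exp(1j * phase))
    # Inverse DFT back to the time-domain signal y(t).
    return [sum(y * cmath.exp(2j * math.pi * k * t / n)
                for k, y in enumerate(suppressed)).real / n
            for t in range(n)]


# Usage: a sinusoid at bin 3 of a 32-sample frame, four bands of four bins.
frame = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
halved = suppress_and_resynthesize(frame, [0.5, 0.5, 0.5, 0.5])
removed = suppress_and_resynthesize(frame, [0.0, 1.0, 1.0, 1.0])
```

Because only the amplitude is scaled and the original phase is reused, a uniform gain simply scales the waveform, while zeroing the band that contains the sinusoid removes it.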

The speech encoding/decoding processing unit 7 therefore performs speech encoding on the digital transmission signal y(t) in which the noise component has been suppressed. The resulting coded transmission data is modulated and then transmitted to the other party's mobile phone. As a result, the other party hears clear speech with little noise, which improves call quality.

As described above, in the first embodiment the noise suppression coefficient W(i) is calculated using an arithmetic expression derived by approximating the occurrence probability of each of the real and imaginary parts of the speech spectrum with a Laplace distribution, approximating the occurrence probability of each of the real and imaginary parts of the noise spectrum with a Gaussian distribution, and applying MAP estimation and Bayes' theorem. The noise suppression coefficient W(i) can therefore be calculated accurately with a relatively small amount of computation, which makes it possible to provide a noise suppressor capable of fast and accurate noise suppression and a voice communication apparatus equipped with that noise suppressor.

FIG. 3 shows the improvement in segmental SNR obtained when speech data with superimposed noise is processed by the noise suppressor 20 according to the first embodiment of the present invention, together with the improvement obtained with the techniques described in the second prior art document (Non-Patent Document 2) and the third prior art document (Non-Patent Document 3). In each case the improvement obtained with the technique described in the first prior art document (Non-Patent Document 1) is taken as the baseline.
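For reference, one common way to compute a segmental SNR is sketched below. The text does not state which definition was used for FIG. 3, so the frame length of 160 samples (20 ms at 8 kHz), the clamping range of -10 dB to 35 dB, and the function name are all assumptions.

```python
import math


def segmental_snr(clean, processed, frame_len=160, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean reference and a processed signal."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        y = processed[start:start + frame_len]
        signal = sum(a * a for a in s)
        error = sum((a - b) ** 2 for a, b in zip(s, y))
        if signal == 0.0:
            continue  # skip silent reference frames
        snr_db = hi if error == 0.0 else 10.0 * math.log10(signal / error)
        values.append(min(max(snr_db, lo), hi))  # clamp outliers (assumed range)
    return sum(values) / len(values) if values else float("nan")


# Usage: a constant signal reproduced with a 10 % amplitude error.
score = segmental_snr([1.0] * 320, [0.9] * 320)
```

An improvement figure of the kind plotted in FIG. 3 would then be the difference between the score after processing and the score of the unprocessed noisy input, compared against the same difference for the baseline method.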

The speech data consists of the Japanese portion of NTT Advanced Technology, “Multi-lingual Speech database for telephonometry 1994”, downsampled to 8 kHz, in which four male and four female speakers (eight in total) each utter four sentences. As noise data, three types described in the Japan Electronic Industry Development Association (JEIDA) “JEIDA Noise Database” (1990) were used: Babble (crowd), Car (inside an automobile), and Street (sidewalk). Each noise was superimposed on the speech data on a computer so that the SNR after superposition was 9 dB or 18 dB.
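Superimposing noise at a target SNR amounts to scaling the noise before adding it. The sketch below assumes that the SNR is defined from the average power over the whole utterance; the text does not say how the 9 dB and 18 dB levels were measured, and the function name is hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db, then add."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Usage: synthetic stand-ins for a speech and a noise recording.
speech = [1.0, -1.0] * 50
noise = [0.5, 0.5, -0.5, -0.5] * 25
mixed = mix_at_snr(speech, noise, 9.0)
```

Subtracting the speech from the mixture recovers the scaled noise, which gives a direct way to check that the achieved SNR matches the target.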

FIG. 3 shows that the noise suppressor 20 according to the first embodiment of the present invention gives the greatest SNR improvement and thus a high noise suppression effect. The effect is especially marked when the SNR is low, that is, when the noise level is high.

(Second Embodiment)
In the second embodiment of the present invention, an arithmetic expression for the noise suppression coefficient is derived and set for each divided frequency band, and the noise suppression coefficient is calculated for each band using the corresponding expression.

FIG. 4 is a functional block diagram showing the configuration of the noise suppression coefficient calculation unit 30′, which is the main part of the noise suppressor according to the second embodiment of the present invention. As shown in the figure, the noise suppression coefficient calculation unit 30′ includes noise suppression coefficient calculation units 31 to 3n, whose number (for example, 16) corresponds to the number of bands set in the band dividing unit 24. Each of the units 31 to 3n holds an arithmetic expression for the noise suppression coefficient W(i) that has been derived and set independently for its band. That is, each of the units 31 to 3n is provided with an arithmetic expression for W(i) in which λ in the proviso of expression (1) is set independently for each frequency band.

Accordingly, in the noise suppression coefficient calculation unit 30′, the units 31 to 3n calculate the noise suppression coefficients W(i) (i = 1 to n) independently for each frequency band. The spectrum amplitude suppression unit 22 then suppresses, band by band, the noise component contained in the amplitude spectrum X(k) in accordance with the coefficients W(i) (i = 1 to n) calculated for the respective bands.

λ is set independently for each frequency band because the distribution of the spectral probability density function (SPDF) of speech differs from band to band. FIGS. 5 and 6 show examples of the SPDF of speech in the low band and the high band, respectively. Here, the 0 Hz to 250 Hz and 250 Hz to 500 Hz bands were chosen as the low band, and the 2000 Hz to 2250 Hz and 2250 Hz to 2500 Hz bands as the high band.
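The band limits quoted above are consistent with 16 equal-width bands of 250 Hz covering 0 Hz to 4000 Hz, which is what an 8 kHz sampling rate would give. That reading is an inference from the numbers, not something the text states, and the helper below is a hypothetical illustration of it.

```python
def band_edges(sample_rate_hz=8000, num_bands=16):
    """Return (low, high) edges in Hz of equal-width bands up to the Nyquist frequency."""
    width = (sample_rate_hz / 2) / num_bands
    return [(i * width, (i + 1) * width) for i in range(num_bands)]


# Usage: the bands used for FIG. 5 and FIG. 6 under this assumption.
edges = band_edges()
low_bands = edges[0:2]    # 0-250 Hz and 250-500 Hz
high_bands = edges[8:10]  # 2000-2250 Hz and 2250-2500 Hz
```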

As is clear from FIGS. 5 and 6, the low-band SPDF is spread sparsely, whereas the high-band SPDF is concentrated at small values. For SPDFs whose distributions differ in this way, approximating each band with its own optimal λ gives a more accurate fit than approximating all bands with the same λ.
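One way to obtain a separate λ for each band is a maximum-likelihood fit of a Laplace density to that band's spectral samples. The sketch below assumes the parameterization p(x) = (λ/2)·exp(−λ|x|), for which the maximum-likelihood estimate is the reciprocal of the mean absolute value. The text does not say how λ is chosen or parameterized in expression (1), so this is an illustration only, with hypothetical function names.

```python
def fit_laplace_lambda(samples):
    """ML estimate of lambda for p(x) = (lambda / 2) * exp(-lambda * |x|)."""
    mean_abs = sum(abs(x) for x in samples) / len(samples)
    return 1.0 / mean_abs


def fit_per_band(samples_by_band):
    """Fit one lambda per band from that band's real or imaginary spectral parts."""
    return [fit_laplace_lambda(band) for band in samples_by_band]


# Usage: a widely spread band and a band concentrated near zero.
lambdas = fit_per_band([[2.0, -2.0, 2.0, -2.0], [0.25, -0.25, 0.25, -0.25]])
```

A band whose values are concentrated near zero yields a larger λ than a band whose values are widely spread, which is the kind of band-to-band difference that motivates the per-band setting.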

(Other Embodiments)
In each of the above embodiments the speech spectrum is approximated using a Laplace distribution, but it may instead be approximated using a gamma distribution. Also, although the above embodiments describe the case where the noise suppressor 20 is implemented by a DSP, it may be implemented by having a microprocessor execute a noise suppression processing program. In addition, the functional configuration of the noise suppressor, the processing procedure and content of each function, and the type and configuration of the voice communication apparatus can be modified in various ways without departing from the gist of the present invention.

In short, the present invention is not limited to the above embodiments as they stand, and at the implementation stage its constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in an embodiment, and constituent elements of different embodiments may be combined as appropriate.

FIG. 1 is a block diagram showing the configuration of a CDMA mobile phone that is a first embodiment of a voice communication apparatus provided with a noise suppressor according to the present invention.
FIG. 2 is a block diagram showing the functional configuration of the noise suppressor provided in the voice communication apparatus shown in FIG. 1.
FIG. 3 is a diagram comparing, for each usage environment, the noise suppression effect of the noise suppressor according to the present invention with that of the noise suppressors described in the prior art documents.
FIG. 4 is a block diagram showing the configuration of the noise suppression coefficient calculation unit of the noise suppressor according to the second embodiment of the present invention.
FIG. 5 is a diagram showing an example of the spectral probability density function (SPDF) of speech in the low band.
FIG. 6 is a diagram showing an example of the spectral probability density function (SPDF) of speech in the high band.

Explanation of symbols

1 … antenna, 2 … antenna duplexer (DUP), 3 … receiving circuit (RX), 4 … frequency synthesizer (SYN), 5 … transmitting circuit (TX), 6 … CDMA signal processing unit, 7 … speech encoding/decoding processing unit (SP-COD), 8 … PCM code processing unit (PCM codec), 9 … receiver amplifier, 10 … speaker, 11 … microphone, 12 … transmitter amplifier, 13 … storage unit, 14 … input unit, 15 … display unit, 16 … power supply circuit (POW), 17 … battery, 18 … control unit, 20 … noise suppressor (NS), 21 … fast Fourier transform unit (FFT), 22 … spectrum amplitude suppression unit, 23 … inverse fast Fourier transform unit (IFFT), 24 … band dividing unit, 25 … voice detection unit, 26 … noise level estimation unit, 27 … a posteriori SNR estimation unit, 28 … band dividing unit, 29 … a priori SNR estimation unit, 30, 31 to 3n … noise suppression coefficient calculation units.

Claims

A noise suppressor comprising:
means for obtaining a frequency spectrum from an input signal;
means for dividing the obtained frequency spectrum into a plurality of bands;
means for estimating a noise spectrum based on the frequency spectrum for each of the divided bands;
means for calculating a signal-to-noise ratio from the frequency spectrum and the estimated noise spectrum for each of the divided bands;
means for setting a noise suppression coefficient based on the calculated signal-to-noise ratio;
means for weighting the obtained frequency spectrum according to the set noise suppression coefficient; and
means for converting the weighted frequency spectrum into a time-domain signal,
wherein the means for setting the noise suppression coefficient calculates the noise suppression coefficient according to an arithmetic expression obtained by partially differentiating a speech spectrum estimation formula and setting the result to zero, the estimation formula being derived by approximating the occurrence probability of each of the real and imaginary parts of the speech spectrum included in the frequency spectrum with a statistical distribution model, and by approximating |cos φ| + |sin φ| as a constant, where φ is the phase spectrum.

The noise suppressor according to claim 1, wherein the means for setting the noise suppression coefficient calculates the noise suppression coefficient according to an arithmetic expression obtained by differentiating a speech spectrum estimation formula and setting the result to zero, the estimation formula being derived by approximating the occurrence probability of each of the real and imaginary parts of the speech spectrum included in the frequency spectrum with a first statistical distribution model and approximating the occurrence probability of each of the real and imaginary parts of the estimated noise spectrum with a second statistical distribution model, and by approximating |cos φ| + |sin φ| as a constant, where φ is the phase spectrum.

The noise suppressor according to claim 1, wherein the means for setting the noise suppression coefficient holds the arithmetic expression derived for each of the divided bands and calculates the noise suppression coefficient for each of the divided bands according to the corresponding arithmetic expression.

The noise suppressor according to claim 1 or 3, wherein the means for setting the noise suppression coefficient uses a speech spectrum estimation formula derived by using a Laplace distribution or a gamma distribution as the statistical distribution model of the speech spectrum and by using MAP estimation (maximum a posteriori estimation) and Bayes' theorem.

The noise suppressor according to claim 2 or 3, wherein the means for setting the noise suppression coefficient uses a speech spectrum estimation formula derived by using a Laplace distribution or a gamma distribution as the first statistical distribution model, using a Gaussian distribution as the second statistical distribution model, and using MAP estimation (maximum a posteriori estimation) and Bayes' theorem.

A voice communication apparatus comprising:
an audio input unit that receives speech and outputs a digital audio signal corresponding to the input speech;
a noise suppressor that suppresses a noise component included in the digital audio signal output from the audio input unit and outputs a digital audio signal in which the noise component is suppressed; and
a transmission unit that converts the digital audio signal output from the noise suppressor into a transmission signal and transmits the transmission signal to a transmission path,
wherein the noise suppressor comprises:
means for obtaining a frequency spectrum from the digital audio signal output from the audio input unit;
means for dividing the obtained frequency spectrum into a plurality of bands;
means for estimating a noise spectrum based on the frequency spectrum for each of the divided bands;
means for calculating a signal-to-noise ratio from the frequency spectrum and the estimated noise spectrum for each of the divided bands;
means for setting a noise suppression coefficient based on the calculated signal-to-noise ratio;
means for weighting the obtained frequency spectrum according to the set noise suppression coefficient; and
means for converting the weighted frequency spectrum into a time-domain signal, and
wherein the means for setting the noise suppression coefficient calculates the noise suppression coefficient according to an arithmetic expression obtained by partially differentiating a speech spectrum estimation formula and setting the result to zero, the estimation formula being derived by approximating the occurrence probability of each of the real and imaginary parts of the speech spectrum included in the frequency spectrum with a statistical distribution model, and by approximating |cos φ| + |sin φ| as a constant, where φ is the phase spectrum.

A noise suppressor comprising:
means for obtaining a frequency spectrum from an input signal;
means for dividing the obtained frequency spectrum into a plurality of bands;
means for estimating past and current noise spectra based on the frequency spectrum for each of the divided bands;
means for calculating, for each of the divided bands, an a posteriori signal-to-noise ratio SNR(i) (i being the index of the divided band) from the frequency spectrum before noise suppression and the estimated past noise spectrum, and an a priori signal-to-noise ratio SNR′(i) from the frequency spectrum after noise suppression and the current noise spectrum;
means for setting a noise suppression coefficient W(i) based on the calculated a posteriori signal-to-noise ratio SNR(i) and a priori signal-to-noise ratio SNR′(i);
means for weighting the obtained frequency spectrum according to the set noise suppression coefficient W(i); and
means for converting the weighted frequency spectrum into a time-domain signal,
wherein the means for setting the noise suppression coefficient W(i) calculates the noise suppression coefficient W(i) using the following arithmetic expression, in which λ is a set value for the approximation:

[arithmetic expression not reproduced in this text]

A voice communication apparatus comprising:
an audio input unit that receives speech and outputs a digital audio signal corresponding to the input speech;
a noise suppressor that suppresses a noise component included in the digital audio signal output from the audio input unit and outputs a digital audio signal in which the noise component is suppressed; and
a transmission unit that converts the digital audio signal output from the noise suppressor into a transmission signal and transmits the transmission signal to a transmission path,
wherein the noise suppressor comprises:
means for obtaining a frequency spectrum from the digital audio signal output from the audio input unit;
means for dividing the obtained frequency spectrum into a plurality of bands;
means for estimating past and current noise spectra based on the frequency spectrum for each of the divided bands;
means for calculating, for each of the divided bands, an a posteriori signal-to-noise ratio SNR(i) (i being the index of the divided band) from the frequency spectrum before noise suppression and the estimated past noise spectrum, and an a priori signal-to-noise ratio SNR′(i) from the frequency spectrum after noise suppression and the current noise spectrum;
means for setting a noise suppression coefficient W(i) based on the calculated a posteriori signal-to-noise ratio SNR(i) and a priori signal-to-noise ratio SNR′(i);
means for weighting the obtained frequency spectrum according to the set noise suppression coefficient W(i); and
means for converting the weighted frequency spectrum into a time-domain signal, and
wherein the means for setting the noise suppression coefficient W(i) calculates the noise suppression coefficient W(i) using the following arithmetic expression, in which λ is a set value for the approximation:

[arithmetic expression not reproduced in this text]