JP4968355B2

JP4968355B2 - Method and apparatus for noise suppression

Info

Publication number: JP4968355B2
Application number: JP2010068541A
Authority: JP
Inventors: 正徳加藤; 昭彦杉山
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2010-03-24
Filing date: 2010-03-24
Publication date: 2012-07-04
Anticipated expiration: 2025-05-31
Also published as: JP2010140063A

Abstract

PROBLEM TO BE SOLVED: To provide a noise suppression method and device that obtain enhanced speech of excellent subjective quality without generating noise in the output signal, and that achieve high enhanced-speech quality by using a suppression coefficient suited to every kind of background noise.

SOLUTION: The device includes: a silent-section coefficient calculation unit that calculates a coefficient for silent sections from the enhanced speech power spectrum and the estimated noise power spectrum; a coefficient storage unit that stores a coefficient for voiced sections; and a post-suppression coefficient calculation unit that calculates a post-suppression coefficient from the silent-section coefficient and the voiced-section coefficient.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a noise suppression method and apparatus for suppressing noise superimposed on a desired speech signal.

A noise suppressor is a technique for suppressing noise superimposed on a desired speech signal. It estimates the power spectrum of the noise component from the input signal converted to the frequency domain and subtracts this estimated power spectrum from the input signal, thereby suppressing the noise mixed into the desired speech signal. Because the power spectrum of the noise component is estimated continuously, the technique can also be applied to non-stationary noise. One example of a noise suppressor is the scheme described in Patent Document 1. FIG. 36 shows the configuration of the noise suppressor described in Patent Document 1.

A degraded speech signal (a signal in which the desired speech signal and noise are mixed) is supplied to the input terminal 11 as a sequence of sample values. The degraded speech signal samples are supplied to the frame division unit 1 and divided into frames of K/2 samples each, where K is an even number. The framed samples are supplied to the windowing unit 2 and multiplied by a window function w(t). For the input signal y_n(t) (t = 0, 1, ..., K/2−1) of the n-th frame, the signal ȳ_n(t) windowed by w(t) is given by the following equation.

$$\bar{y}_n(t) = w(t)\, y_n(t)$$

It is also common practice to window two consecutive frames with partial overlap. Assuming an overlap of 50% of the frame length, the windowed signal ȳ_n(t) spans K samples (t = 0, 1, ..., K−1), each half being obtained for t = 0, 1, ..., K/2−1, and is the output of the windowing unit 2. A symmetric window function is used for real-valued signals. The window function is also designed so that, when the suppression coefficient is set to 1, the input and output signals coincide apart from computational error. This means that w(t) + w(t+K/2) = 1.

The description below takes as its example the case where two consecutive frames are windowed with 50% overlap. As w(t), for example, the Hanning window given by the following equation can be used.
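As a minimal sketch, assuming the common raised-cosine form of the Hanning window (the exact expression is not reproduced in this text), the overlap condition w(t) + w(t+K/2) = 1 can be checked numerically. The function name `hanning_window` and the frame length K = 8 are illustrative assumptions, not part of the patent.

```python
import math

def hanning_window(K):
    """Return w(t) for t = 0..K-1 using a raised-cosine (Hanning) shape.

    Assumed form: w(t) = 0.5 + 0.5 * cos(pi * (t - K/2) / (K/2)).
    """
    half = K // 2
    return [0.5 + 0.5 * math.cos(math.pi * (t - half) / half) for t in range(K)]

K = 8  # illustrative frame length (must be even)
w = hanning_window(K)

# Overlap condition for 50% overlap: w(t) + w(t + K/2) should equal 1.
overlap_sums = [w[t] + w[t + K // 2] for t in range(K // 2)]
```

With this window, two half-overlapped frames sum back to the original samples when no suppression is applied, which is the design requirement stated above.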

The windowed output ȳ_n(t) is supplied to the Fourier transform unit 3 and converted into the degraded speech spectrum Y_n(k). The degraded speech spectrum Y_n(k) is separated into phase and amplitude: the degraded speech phase spectrum arg Y_n(k) is supplied to the inverse Fourier transform unit 9, and the degraded speech amplitude spectrum |Y_n(k)| is supplied to the multiplexed multiplication units 13 and 16.

The multiplexed multiplication unit 13 computes the degraded speech power spectrum from the supplied degraded speech amplitude spectrum |Y_n(k)| and passes it to the estimated noise calculation unit 5, the frequency-specific SNR (signal-to-noise ratio) calculation unit 6, and the weighted degraded speech calculation unit 14. The weighted degraded speech calculation unit 14 computes a weighted degraded speech power spectrum from the degraded speech power spectrum supplied by the multiplexed multiplication unit 13 and passes it to the estimated noise calculation unit 5. The estimated noise calculation unit 5 estimates the noise power spectrum from the degraded speech power spectrum, the weighted degraded speech power spectrum, and the count value supplied by the counter 4, and passes the result to the frequency-specific SNR calculation unit 6 as the estimated noise power spectrum. The frequency-specific SNR calculation unit 6 computes the SNR for each frequency from the degraded speech power spectrum and the estimated noise power spectrum, and supplies it as the a posteriori SNR to the estimated a priori SNR calculation unit 7 and the noise suppression coefficient generation unit 8.

The estimated a priori SNR calculation unit 7 estimates the a priori SNR from the supplied a posteriori SNR and the corrected suppression coefficient supplied by the suppression coefficient correction unit 15, and passes it to the noise suppression coefficient generation unit 8 as the estimated a priori SNR. The noise suppression coefficient generation unit 8 generates a noise suppression coefficient from the a posteriori SNR, the estimated a priori SNR, and the speech absence probability supplied by the speech absence probability storage unit 21, and passes it to the suppression coefficient correction unit 15 as the suppression coefficient. The suppression coefficient correction unit 15 corrects the suppression coefficient using the estimated a priori SNR and the suppression coefficient, and supplies the result to the multiplexed multiplication unit 16 as the corrected suppression coefficient Ḡ_n(k). The multiplexed multiplication unit 16 weights the degraded speech amplitude spectrum |Y_n(k)| supplied by the Fourier transform unit 3 with the corrected suppression coefficient Ḡ_n(k) supplied by the suppression coefficient correction unit 15 to obtain the enhanced speech amplitude spectrum |X̄_n(k)|, and passes it to the inverse Fourier transform unit 9. |X̄_n(k)| is given by equation (4).

$$|\bar{X}_n(k)| = \bar{G}_n(k)\,|Y_n(k)| \qquad (4)$$

The inverse Fourier transform unit 9 multiplies the enhanced speech amplitude spectrum |X̄_n(k)| supplied by the multiplexed multiplication unit 16 by the degraded speech phase spectrum arg Y_n(k) supplied by the Fourier transform unit 3 to obtain the enhanced speech spectrum X̄_n(k). That is, the enhanced amplitude is recombined with the phase of the degraded speech.
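The weighting by the corrected suppression coefficient and the recombination with the degraded phase can be sketched as follows. This is an illustration under the assumption that "multiplying by the phase spectrum" means multiplying by the unit-magnitude phase factor exp(j·arg Y_n(k)); the function name `enhance_spectrum` and the toy values are hypothetical.

```python
import cmath

def enhance_spectrum(Y, G):
    """Scale each bin's magnitude by its gain and keep the degraded phase."""
    X = []
    for y_k, g_k in zip(Y, G):
        magnitude = g_k * abs(y_k)      # enhanced amplitude: gain times |Y_n(k)|
        phase = cmath.phase(y_k)        # arg Y_n(k), reused unchanged
        X.append(cmath.rect(magnitude, phase))
    return X

Y = [3 + 4j, -2j, 1 + 0j]   # toy degraded spectrum (illustrative values)
G = [0.5, 1.0, 0.0]         # toy corrected suppression coefficients
X = enhance_spectrum(Y, G)
```

Because only the amplitude is modified, a gain of 1 leaves the bin untouched and a gain of 0 removes it entirely.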

The resulting enhanced speech spectrum X̄_n(k) is inverse Fourier transformed into a time-domain sample sequence x̄_n(t) (t = 0, 1, ..., K−1), in which one frame consists of K samples, and passed to the frame synthesis unit 10. The frame synthesis unit 10 takes K/2 samples from each of two adjacent frames of x̄_n(t) and superimposes (adds) them to obtain the enhanced speech x̂_n(t). The resulting enhanced speech x̂_n(t) (t = 0, 1, ..., K−1) is passed to the output terminal 12 as the output of the frame synthesis unit 10.
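The framing, windowing, and frame synthesis steps can be sketched end to end. The sketch below assumes that each K-sample block is the previous K/2 input samples followed by the current K/2 samples, that the window is the raised-cosine form, and that synthesis adds the second half of the previous block to the first half of the current one; these details, and all names, are assumptions for illustration. No spectral processing is applied, which corresponds to a suppression coefficient of 1.

```python
import math

def hanning_window(K):
    """Raised-cosine window satisfying w(t) + w(t + K/2) = 1 (assumed form)."""
    half = K // 2
    return [0.5 + 0.5 * math.cos(math.pi * (t - half) / half) for t in range(K)]

def analysis_synthesis(signal, K):
    """Window overlapping K-sample blocks and overlap-add them with hop K/2."""
    half = K // 2
    w = hanning_window(K)
    prev_input = [0.0] * half   # previous K/2 input samples
    prev_tail = [0.0] * half    # second half of the previous windowed block
    out = []
    for start in range(0, len(signal) - half + 1, half):
        cur = signal[start:start + half]
        block = prev_input + cur                        # K samples
        windowed = [w[t] * block[t] for t in range(K)]
        # Overlap-add: tail of the previous block plus head of this block.
        out.extend(prev_tail[t] + windowed[t] for t in range(half))
        prev_tail = windowed[half:]
        prev_input = cur
    return out

K = 8
x = [float(i % 5) for i in range(32)]   # toy input signal
y = analysis_synthesis(x, K)
```

Under these assumptions the output equals the input delayed by K/2 samples, which is the property the window design condition is meant to guarantee.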

FIG. 37 is a block diagram showing the configuration of the multiplexed multiplication unit 13 included in FIG. 36. The multiplexed multiplication unit 13 has multipliers 1301_0 to 1301_{K−1}, separation units 1302 and 1303, and a multiplexing unit 1304. The degraded speech amplitude spectrum, supplied in multiplexed form from the Fourier transform unit 3 of FIG. 36, is separated into K frequency-specific samples by the separation units 1302 and 1303 and supplied to the multipliers 1301_0 to 1301_{K−1}, respectively. Each of the multipliers 1301_0 to 1301_{K−1} squares its input signal and passes the result to the multiplexing unit 1304. The multiplexing unit 1304 multiplexes the input signals and outputs them as the degraded speech power spectrum.

FIG. 38 is a block diagram showing the configuration of the weighted degraded speech calculation unit 14. The weighted degraded speech calculation unit 14 has an estimated noise storage unit 1401, a frequency-specific SNR calculation unit 1402, a multiplexed nonlinear processing unit 1405, and a multiplexed multiplication unit 1404. The estimated noise storage unit 1401 stores the estimated noise power spectrum supplied by the estimated noise calculation unit 5 of FIG. 36 and outputs the estimated noise power spectrum stored one frame earlier to the frequency-specific SNR calculation unit 1402. The frequency-specific SNR calculation unit 1402 obtains the SNR for each frequency from the estimated noise power spectrum supplied by the estimated noise storage unit 1401 and the degraded speech power spectrum supplied by the multiplexed multiplication unit 13 of FIG. 36, and outputs it to the multiplexed nonlinear processing unit 1405. The multiplexed nonlinear processing unit 1405 computes a weight coefficient vector from the SNR supplied by the frequency-specific SNR calculation unit 1402 and outputs the weight coefficient vector to the multiplexed multiplication unit 1404.

The multiplexed multiplication unit 1404 computes, for each frequency, the product of the degraded speech power spectrum supplied by the multiplexed multiplication unit 13 of FIG. 36 and the weight coefficient vector supplied by the multiplexed nonlinear processing unit 1405, and outputs the weighted degraded speech power spectrum to the estimated noise calculation unit 5 of FIG. 36. The configuration of the multiplexed multiplication unit 1404 is the same as that of the multiplexed multiplication unit 13 already described with reference to FIG. 37, so a detailed description is omitted.

FIG. 39 is a block diagram showing the configuration of the frequency-specific SNR calculation unit 1402 included in FIG. 38. The frequency-specific SNR calculation unit 1402 has division units 1421_0 to 1421_{K−1}, separation units 1422 and 1423, and a multiplexing unit 1424. The degraded speech power spectrum supplied by the multiplexed multiplication unit 13 of FIG. 36 is passed to the separation unit 1422. The estimated noise power spectrum supplied by the estimated noise storage unit 1401 of FIG. 38 is passed to the separation unit 1423. The separation unit 1422 separates the degraded speech power spectrum, and the separation unit 1423 separates the estimated noise power spectrum, into K samples corresponding to the frequency components, and these are supplied to the division units 1421_0 to 1421_{K−1}, respectively. In accordance with equation (7), the division units 1421_0 to 1421_{K−1} divide the supplied degraded speech power spectrum by the estimated noise power spectrum to obtain the frequency-specific SNR γ̂_n(k), and pass it to the multiplexing unit 1424.

$$\hat{\gamma}_n(k) = \frac{|Y_n(k)|^2}{\lambda_{n-1}(k)} \qquad (7)$$

Here, λ_{n−1}(k) is the estimated noise power spectrum stored one frame earlier. The multiplexing unit 1424 multiplexes the K frequency-specific SNRs passed to it and passes them to the multiplexed nonlinear processing unit 1405 of FIG. 38.

Next, the configuration and operation of the multiplexed nonlinear processing unit 1405 of FIG. 38 are described in detail with reference to FIG. 40. FIG. 40 is a block diagram showing the configuration of the multiplexed nonlinear processing unit 1405 included in the weighted degraded speech calculation unit 14. The multiplexed nonlinear processing unit 1405 has a separation unit 1495, nonlinear processing units 1485_0 to 1485_{K−1}, and a multiplexing unit 1475. The separation unit 1495 separates the SNR supplied by the frequency-specific SNR calculation unit 1402 of FIG. 38 into frequency-specific SNRs and outputs them to the nonlinear processing units 1485_0 to 1485_{K−1}. Each of the nonlinear processing units 1485_0 to 1485_{K−1} has a nonlinear function that outputs a real value according to its input value.

FIG. 41 shows an example of the nonlinear function. With f_1 as the input value, the output value f_2 of the nonlinear function shown in FIG. 41 is determined by two parameters a and b, which are arbitrary real numbers.

The nonlinear processing units 1485_0 to 1485_{K−1} process the frequency-specific SNR supplied by the separation unit 1495 with the nonlinear function to obtain weight coefficients, and output them to the multiplexing unit 1475. That is, the nonlinear processing units 1485_0 to 1485_{K−1} output a weight coefficient between 1 and 0 according to the SNR: 1 when the SNR is small and 0 when it is large. The multiplexing unit 1475 multiplexes the weight coefficients output by the nonlinear processing units 1485_0 to 1485_{K−1} and outputs them to the multiplexed multiplication unit 1404 as the weight coefficient vector.

The weight coefficient multiplied by the degraded speech power spectrum in the multiplexed multiplication unit 1404 of FIG. 38 depends on the SNR: the larger the SNR, that is, the larger the speech component contained in the degraded speech, the smaller the weight coefficient. The degraded speech power spectrum is generally used to update the estimated noise. Weighting the degraded speech power spectrum used for this update according to the SNR reduces the influence of the speech component it contains, so the noise can be estimated more accurately. Although an example using a nonlinear function to compute the weight coefficient has been shown, other functions of the SNR, such as a linear function or a higher-order polynomial, can also be used.
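The SNR-dependent weighting can be sketched as follows. The shape of the nonlinear function is assumed here to be piecewise linear (1 at or below a, 0 at or above b, linear in between), which matches the described behavior but is not confirmed by the text; the names `snr_weight` and `weighted_power` and the default values of a and b are hypothetical.

```python
def snr_weight(snr, a, b):
    """Assumed piecewise-linear weight: 1 below a, 0 above b, linear in between."""
    if snr <= a:
        return 1.0
    if snr >= b:
        return 0.0
    return (b - snr) / (b - a)

def weighted_power(power, prev_noise, a=1.0, b=10.0):
    """Weight each bin's degraded power by a function of its SNR.

    prev_noise is the noise estimate from the previous frame and is
    assumed to be strictly positive in every bin.
    """
    weighted = []
    for p_k, lam_k in zip(power, prev_noise):
        snr_k = p_k / lam_k   # per-bin SNR against last frame's noise estimate
        weighted.append(snr_weight(snr_k, a, b) * p_k)
    return weighted

power = [0.5, 5.5, 40.0]        # toy |Y_n(k)|^2 values
prev_noise = [1.0, 1.0, 2.0]    # toy noise estimate from the previous frame
result = weighted_power(power, prev_noise)
```

Bins dominated by speech (high SNR) contribute little or nothing to the noise update, while noise-like bins pass through unchanged.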

FIG. 42 is a block diagram showing the configuration of the estimated noise calculation unit 5 included in FIG. 36. The estimated noise calculation unit 5 has separation units 501 and 502, a multiplexing unit 503, and frequency-specific estimated noise calculation units 504_0 to 504_{K−1}. The separation unit 501 separates the weighted degraded speech power spectrum supplied by the weighted degraded speech calculation unit 14 of FIG. 36 into frequency-specific weighted degraded speech power spectra and supplies them to the frequency-specific estimated noise calculation units 504_0 to 504_{K−1}, respectively. The separation unit 502 separates the degraded speech power spectrum supplied by the multiplexed multiplication unit 13 of FIG. 36 into frequency-specific degraded speech power spectra and outputs them to the frequency-specific estimated noise calculation units 504_0 to 504_{K−1}, respectively. The frequency-specific estimated noise calculation units 504_0 to 504_{K−1} compute frequency-specific estimated noise power spectra from the frequency-specific weighted degraded speech power spectrum supplied by the separation unit 501, the frequency-specific degraded speech power spectrum supplied by the separation unit 502, and the count value supplied by the counter 4 of FIG. 36, and output them to the multiplexing unit 503. The multiplexing unit 503 multiplexes the frequency-specific estimated noise power spectra supplied by the frequency-specific estimated noise calculation units 504_0 to 504_{K−1} and outputs the estimated noise power spectrum to the frequency-specific SNR calculation unit 6 and the weighted degraded speech calculation unit 14 of FIG. 36. The configuration and operation of the frequency-specific estimated noise calculation units 504_0 to 504_{K−1} are described in detail with reference to FIG. 43.

FIG. 43 is a block diagram showing the configuration of the frequency-specific estimated noise calculation units 504_0 to 504_{K−1} included in FIG. 42. Each of the frequency-specific estimated noise calculation units 504_0 to 504_{K−1} has an update determination unit 520, a register length storage unit 5041, an estimated noise storage unit 5042, a switch 5044, a shift register 5045, an adder 5046, a minimum value selection unit 5047, a division unit 5048, and a counter 5049. The switch 5044 is supplied with the frequency-specific weighted degraded speech power spectrum from the separation unit 501 of FIG. 42. When the switch 5044 closes the circuit, the frequency-specific weighted degraded speech power spectrum is passed to the shift register 5045. The shift register 5045 shifts the values stored in its internal registers to the adjacent registers in response to the control signal supplied by the update determination unit 520. The shift register length is equal to the value stored in the register length storage unit 5041, described later. All register outputs of the shift register 5045 are supplied to the adder 5046. The adder 5046 adds all the supplied register outputs and passes the sum to the division unit 5048.

Meanwhile, the update determination unit 520 is supplied with the count value, the frequency-specific degraded speech power spectrum, and the frequency-specific estimated noise power spectrum. The update determination unit 520 always outputs "1" until the count value reaches a preset value. After the count value has reached that value, it outputs "1" when the input degraded speech signal is judged to be noise and "0" otherwise. The output is passed to the counter 5049, the switch 5044, and the shift register 5045. The switch 5044 closes the circuit when the signal supplied by the update determination unit is "1" and opens it when the signal is "0". The counter 5049 increments the count value when the signal supplied by the update determination unit is "1" and leaves it unchanged when the signal is "0". When the signal supplied by the update determination unit is "1", the shift register 5045 takes in one signal sample supplied through the switch 5044 and at the same time shifts the values stored in its internal registers to the adjacent registers. The minimum value selection unit 5047 is supplied with the output of the counter 5049 and the output of the register length storage unit 5041.

The minimum value selection unit 5047 selects the smaller of the supplied count value and register length and passes it to the division unit 5048. The division unit 5048 divides the sum of the frequency-specific degraded speech power spectra supplied by the adder 5046 by the smaller of the count value and the register length, and outputs the quotient as the frequency-specific estimated noise power spectrum λ_n(k). Letting B_n(k) (n = 0, 1, ..., N−1) be the sample values of the degraded speech power spectrum stored in the shift register 5045, λ_n(k) is given by the following equation.

$$\lambda_n(k) = \frac{1}{N}\sum_{n=0}^{N-1} B_n(k)$$

Here, N is the smaller of the count value and the register length. Because the count value starts at zero and increases monotonically, the division is performed by the count value at first and by the register length later. Dividing by the register length amounts to taking the average of the values stored in the shift register. At first, the shift register 5045 does not yet hold enough values, so the sum is divided by the number of registers that actually hold a value. That number equals the count value while the count value is smaller than the register length, and equals the register length once the count value exceeds it.
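The shift-register averaging for a single frequency bin can be sketched as follows, assuming the register is modeled as a fixed-length queue and the estimate is held when no update occurs. The class name `BinNoiseEstimator` and the toy inputs are hypothetical.

```python
from collections import deque

class BinNoiseEstimator:
    """Running average of the most recent accepted samples for one frequency bin."""

    def __init__(self, register_length):
        self.register = deque(maxlen=register_length)  # models the shift register
        self.register_length = register_length
        self.count = 0        # number of accepted samples so far
        self.estimate = 0.0

    def step(self, weighted_power, update):
        """Shift in a sample when update is true, then return the estimate."""
        if update:
            # The oldest value falls off the far end once the register is full.
            self.register.appendleft(weighted_power)
            self.count += 1
        n = min(self.count, self.register_length)  # registers that hold a value
        if n > 0:
            self.estimate = sum(self.register) / n
        return self.estimate

est = BinNoiseEstimator(register_length=3)
inputs = [(2.0, True), (4.0, True), (100.0, False), (6.0, True), (8.0, True)]
history = [est.step(p, u) for p, u in inputs]
```

In this run the third sample is rejected, so the estimate stays at the average of the first two, and from the fifth sample on the oldest value is discarded and the divisor stays at the register length.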

FIG. 44 is a block diagram showing the configuration of the update determination unit 520 included in FIG. 43. The update determination unit 520 has an OR calculation unit 5201, comparison units 5203 and 5205, threshold storage units 5204 and 5206, and a threshold calculation unit 5207. The count value supplied by the counter 4 of FIG. 36 is passed to the comparison unit 5203. The threshold output by the threshold storage unit 5204 is also passed to the comparison unit 5203. The comparison unit 5203 compares the supplied count value with the threshold and passes "1" to the OR calculation unit 5201 when the count value is smaller than the threshold and "0" when it is larger.

Meanwhile, the threshold calculation unit 5207 computes a value that depends on the frequency-specific estimated noise power spectrum supplied by the estimated noise storage unit 5042 of FIG. 43 and outputs it to the threshold storage unit 5206 as a threshold. The simplest way to compute the threshold is to take a constant multiple of the frequency-specific estimated noise power spectrum. The threshold can also be computed with a higher-order polynomial or a nonlinear function. The threshold storage unit 5206 stores the threshold output by the threshold calculation unit 5207 and outputs the threshold stored one frame earlier to the comparison unit 5205. The comparison unit 5205 compares the threshold supplied by the threshold storage unit 5206 with the frequency-specific degraded speech power spectrum supplied by the separation unit 502 of FIG. 42, and outputs "1" to the OR calculation unit 5201 if the frequency-specific degraded speech power spectrum is smaller than the threshold and "0" if it is larger.

In other words, whether the degraded speech signal is noise is judged from the magnitude of the estimated noise power spectrum. The OR calculation unit 5201 computes the logical OR of the output of the comparison unit 5203 and the output of the comparison unit 5205, and outputs the result to the switch 5044, the shift register 5045, and the counter 5049 of FIG. 43. Thus the update determination unit 520 outputs "1" not only in the initial state and in silent sections but also in voiced sections when the degraded speech power is small. That is, the estimated noise is updated. Because the threshold is computed for each frequency, the estimated noise can be updated for each frequency.
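The update decision for one frequency bin can be sketched as a logical OR of two comparisons. The sketch assumes the simplest threshold mentioned in the text, a constant multiple of the previous frame's noise estimate; the function name `should_update`, the parameter names, and the multiplier value 2.0 are hypothetical.

```python
def should_update(count, count_threshold, power, prev_noise, scale=2.0):
    """Return 1 when the noise estimate for this bin should be updated, else 0.

    count           -- frame counter value
    count_threshold -- preset length of the start-up period
    power           -- degraded speech power in this bin
    prev_noise      -- estimated noise power from the previous frame
    scale           -- assumed constant multiplier for the noise threshold
    """
    in_startup = count < count_threshold            # role of comparison unit 5203
    looks_like_noise = power < scale * prev_noise   # role of comparison unit 5205
    return 1 if (in_startup or looks_like_noise) else 0
```

For example, `should_update(0, 5, 100.0, 1.0)` yields 1 because the start-up period has not ended, while `should_update(10, 5, 3.0, 1.0)` yields 0 because the bin power exceeds the noise-based threshold.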

FIG. 45 is a block diagram showing the configuration of the estimated a priori SNR calculation unit 7 included in FIG. 36. The estimated a priori SNR calculation unit 7 has a multiplexed range-limiting unit 701, an a posteriori SNR storage unit 702, a suppression coefficient storage unit 703, multiplexed multiplication units 704 and 705, a weight storage unit 706, a multiplexed weighted addition unit 707, and an adder 708. The a posteriori SNR γ_n(k) (k = 0, 1, ..., K−1) supplied by the frequency-specific SNR calculation unit 6 of FIG. 36 is passed to the a posteriori SNR storage unit 702 and the adder 708. The a posteriori SNR storage unit 702 stores the a posteriori SNR γ_n(k) of the n-th frame and passes the a posteriori SNR γ_{n−1}(k) of the (n−1)-th frame to the multiplexed multiplication unit 705.

The corrected suppression coefficient Ḡ_n(k) (k = 0, 1, ..., K−1) supplied by the suppression coefficient correction unit 15 of FIG. 36 is passed to the suppression coefficient storage unit 703. The suppression coefficient storage unit 703 stores the corrected suppression coefficient Ḡ_n(k) of the n-th frame and passes the corrected suppression coefficient Ḡ_{n−1}(k) of the (n−1)-th frame to the multiplexed multiplication unit 704. The multiplexed multiplication unit 704 squares the supplied Ḡ_{n−1}(k) to obtain Ḡ²_{n−1}(k) and passes it to the multiplexed multiplication unit 705. The multiplexed multiplication unit 705 multiplies Ḡ²_{n−1}(k) by γ_{n−1}(k) for k = 0, 1, ..., K−1 to obtain Ḡ²_{n−1}(k) γ_{n−1}(k), and passes the result to the multiplexed weighted addition unit 707 as the past estimated SNR 922. The configuration of the multiplexed multiplication units 704 and 705 is the same as that of the multiplexed multiplication unit 13 already described with reference to FIG. 37, so a detailed description is omitted.

The other terminal of the adder 708 is supplied with −1, and the sum γ_n(k) − 1 is passed to the multiplexed range-limiting unit 701. The multiplexed range-limiting unit 701 applies the range-limiting operator P[·] to the sum γ_n(k) − 1 supplied by the adder 708 and passes the result, P[γ_n(k) − 1], to the multiplexed weighted addition unit 707 as the instantaneous estimated SNR 921. Here, P[x] is defined by equation (10).

$$P[x] = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (10)$$

The multiplexed weighted addition unit 707 is also supplied with the weight 923 from the weight storage unit 706. The multiplexed weighted addition unit 707 obtains the estimated a priori SNR 924 from the supplied instantaneous estimated SNR 921, past estimated SNR 922, and weight 923. Letting $\alpha$ be the weight 923 and $\hat{\xi}_n(k)$ the estimated a priori SNR, $\hat{\xi}_n(k)$ is calculated by equation (11):

$$\hat{\xi}_n(k) = \alpha\,\bar{G}^2_{n-1}(k)\,\gamma_{n-1}(k) + (1-\alpha)\,P[\gamma_n(k) - 1] \tag{11}$$

Here, $\bar{G}^2_{-1}(k)\,\gamma_{-1}(k) = 1$; that is, the past estimated SNR is initialized to 1 for the first frame ($n = 0$).
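The recursion of equations (10) and (11) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `range_limit` and `estimate_a_priori_snr`, the list-based representation of the $K$ frequency bins, and the value $\alpha = 0.5$ are all assumptions made for the example.

```python
def range_limit(x):
    """Range-limiting operator P[x] of equation (10): x if x > 0, else 0."""
    return x if x > 0.0 else 0.0


def estimate_a_priori_snr(gamma_n, past_snr, alpha):
    """Equation (11), evaluated for every frequency bin k.

    gamma_n  -- a posteriori SNR gamma_n(k) of the current frame
    past_snr -- G_bar_{n-1}(k)^2 * gamma_{n-1}(k) from the previous frame
    alpha    -- weight 923 (the value used below is hypothetical)
    """
    return [alpha * past + (1.0 - alpha) * range_limit(g - 1.0)
            for g, past in zip(gamma_n, past_snr)]


# Hypothetical input for K = 3 bins in the first frame (n = 0),
# where the past estimated SNR is initialized to 1.
gamma = [3.0, 0.5, 1.0]
past = [1.0, 1.0, 1.0]
xi_hat = estimate_a_priori_snr(gamma, past, alpha=0.5)
```

With these inputs the second and third bins contribute nothing from the instantaneous term, because $\gamma_n(k) - 1$ is not positive there and the range limiter clamps it to zero.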

FIG. 46 is a block diagram showing the configuration of the multiplexed range-limiting unit 701 included in FIG. 45. The multiplexed range-limiting unit 701 has a constant storage unit 7011, maximum value selection units $7012_0$ to $7012_{K-1}$, a separation unit 7013, and a multiplexing unit 7014. The separation unit 7013 is supplied with $\gamma_n(k) - 1$ from the adder 708 in FIG. 45. The separation unit 7013 separates the supplied $\gamma_n(k) - 1$ into $K$ frequency-specific components and supplies them to the maximum value selection units $7012_0$ to $7012_{K-1}$. The other input of each maximum value selection unit $7012_0$ to $7012_{K-1}$ is supplied with zero from the constant storage unit 7011. The maximum value selection units $7012_0$ to $7012_{K-1}$ compare $\gamma_n(k) - 1$ with zero and transmit the larger value to the multiplexing unit 7014. This maximum value selection corresponds to evaluating equation (10). The multiplexing unit 7014 multiplexes these values and outputs them.

FIG. 47 is a block diagram showing the configuration of the multiplexed weighted addition unit 707 included in FIG. 45. The multiplexed weighted addition unit 707 has weighted addition units $7071_0$ to $7071_{K-1}$, separation units 7072 and 7074, and a multiplexing unit 7075. The separation unit 7072 is supplied with $P[\gamma_n(k) - 1]$ as the instantaneous estimated SNR 921 from the multiplexed range-limiting unit 701 in FIG. 45. The separation unit 7072 separates $P[\gamma_n(k) - 1]$ into $K$ frequency-specific components and transmits them to the weighted addition units $7071_0$ to $7071_{K-1}$ as the frequency-specific instantaneous estimated SNRs $921_0$ to $921_{K-1}$. The separation unit 7074 is supplied with $\bar{G}^2_{n-1}(k)\,\gamma_{n-1}(k)$ as the past estimated SNR 922 from the multiplexed multiplication unit 705 in FIG. 45.

The separation unit 7074 separates $\bar{G}^2_{n-1}(k)\,\gamma_{n-1}(k)$ into $K$ frequency-specific components and transmits them to the weighted addition units $7071_0$ to $7071_{K-1}$ as the past frequency-specific estimated SNRs $922_0$ to $922_{K-1}$. The weighted addition units $7071_0$ to $7071_{K-1}$ are also supplied with the weight 923. The weighted addition units $7071_0$ to $7071_{K-1}$ perform the weighted addition expressed by equation (11) and transmit the frequency-specific estimated a priori SNRs $924_0$ to $924_{K-1}$ to the multiplexing unit 7075. The multiplexing unit 7075 multiplexes the frequency-specific estimated a priori SNRs $924_0$ to $924_{K-1}$ and outputs them as the estimated a priori SNR 924. The operation and configuration of the weighted addition units $7071_0$ to $7071_{K-1}$ are described next with reference to FIG. 48.

FIG. 48 is a block diagram showing the configuration of the weighted addition unit 7071 included in FIG. 47. The weighted addition unit 7071 has multipliers 7091 and 7093, a constant multiplier 7095, and adders 7092 and 7094. Its inputs are the frequency-specific instantaneous estimated SNR 921 from the separation unit 7072 in FIG. 47, the past frequency-specific estimated SNR 922 from the separation unit 7074 in FIG. 47, and the weight 923 from the weight storage unit 706 in FIG. 45. The weight 923, which has the value $\alpha$, is transmitted to the constant multiplier 7095 and the multiplier 7093. The constant multiplier 7095 multiplies its input by $-1$ and transmits the resulting $-\alpha$ to the adder 7094. The other input of the adder 7094 is supplied with 1, so the output of the adder 7094 is the sum $1 - \alpha$. This $1 - \alpha$ is supplied to the multiplier 7091, where it is multiplied by the other input, the frequency-specific instantaneous estimated SNR $P[\gamma_n(k) - 1]$, and the product $(1-\alpha)\,P[\gamma_n(k) - 1]$ is transmitted to the adder 7092. Meanwhile, the multiplier 7093 multiplies the $\alpha$ supplied as the weight 923 by the past estimated SNR 922 and transmits the product $\alpha\,\bar{G}^2_{n-1}(k)\,\gamma_{n-1}(k)$ to the adder 7092. The adder 7092 outputs the sum of $(1-\alpha)\,P[\gamma_n(k) - 1]$ and $\alpha\,\bar{G}^2_{n-1}(k)\,\gamma_{n-1}(k)$ as the frequency-specific estimated a priori SNR 924.

FIG. 49 is a block diagram showing the noise suppression coefficient generation unit 8 included in FIG. 36. The noise suppression coefficient generation unit 8 has an MMSE STSA gain function value calculation unit 811, a generalized likelihood ratio calculation unit 812, and a suppression coefficient calculation unit 814. The method of calculating the suppression coefficient is described below based on the formulas given in Patent Document 1.

Let $n$ be the frame number and $k$ the frequency number. Let $\gamma_n(k)$ be the frequency-specific a posteriori SNR supplied from the frequency-specific SNR calculation unit 6 in FIG. 36, $\hat{\xi}_n(k)$ the frequency-specific estimated a priori SNR supplied from the estimated a priori SNR calculation unit 7 in FIG. 36, and $q$ the speech absence probability supplied from the speech absence probability storage unit 21 in FIG. 36. Further, let $\eta_n(k) = \hat{\xi}_n(k) / (1 - q)$ and $v_n(k) = \eta_n(k)\,\gamma_n(k) / (1 + \eta_n(k))$. The MMSE STSA gain function value calculation unit 811 calculates the MMSE STSA gain function value for each frequency from the a posteriori SNR $\gamma_n(k)$, the estimated a priori SNR $\hat{\xi}_n(k)$, and the speech absence probability $q$, and outputs it to the suppression coefficient calculation unit 814. The MMSE STSA gain function value $G_n(k)$ for each frequency is given by the corresponding formula of Patent Document 1.
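The formula itself is not reproduced in this text. As a point of reference only, and assuming Patent Document 1 follows the widely used Ephraim-Malah MMSE STSA estimator (an assumption that this text does not confirm), the gain would take the following form in terms of the $\gamma_n(k)$ and $v_n(k)$ defined above:

```latex
\[
G_n(k) = \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_n(k)}}{\gamma_n(k)}\,
\exp\!\left(-\frac{v_n(k)}{2}\right)
\left[\bigl(1 + v_n(k)\bigr)\,I_0\!\left(\frac{v_n(k)}{2}\right)
+ v_n(k)\,I_1\!\left(\frac{v_n(k)}{2}\right)\right]
\]
```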

Here, $I_0(z)$ is the zeroth-order modified Bessel function and $I_1(z)$ is the first-order modified Bessel function. Modified Bessel functions are described in Non-Patent Document 1.

The generalized likelihood ratio calculation unit 812 calculates the generalized likelihood ratio for each frequency from the a posteriori SNR $\gamma_n(k)$ supplied from the frequency-specific SNR calculation unit 6 in FIG. 36, the estimated a priori SNR $\hat{\xi}_n(k)$ supplied from the estimated a priori SNR calculation unit 7 in FIG. 36, and the speech absence probability $q$ supplied from the speech absence probability storage unit 21 in FIG. 36, and outputs it to the suppression coefficient calculation unit 814. The generalized likelihood ratio $\Lambda_n(k)$ for each frequency is given by the corresponding formula of Patent Document 1.
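This formula is likewise not reproduced in this text. Under the same assumption as for the gain (the standard Ephraim-Malah formulation under signal presence uncertainty, which this text does not confirm), the generalized likelihood ratio would be written with the quantities $q$, $\eta_n(k)$, and $v_n(k)$ introduced earlier as:

```latex
\[
\Lambda_n(k) = \frac{1 - q}{q}\,\frac{\exp\bigl(v_n(k)\bigr)}{1 + \eta_n(k)}
\]
```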

The suppression coefficient calculation unit 814 calculates a suppression coefficient for each frequency from the MMSE STSA gain function value $G_n(k)$ supplied from the MMSE STSA gain function value calculation unit 811 and the generalized likelihood ratio $\Lambda_n(k)$ supplied from the generalized likelihood ratio calculation unit 812, and outputs it to the suppression coefficient correction unit 15 in FIG. 36. The suppression coefficient $\bar{G}_n(k)$ for each frequency is given by the corresponding formula of Patent Document 1. Instead of calculating the SNR for each frequency, it is also possible to obtain an SNR common to a band consisting of several frequencies and use that.
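A minimal sketch of how the three units 811, 812, and 814 could work together for a single frequency is given below. Every formula in it is an assumption: it uses the standard Ephraim-Malah gain and likelihood ratio, and it combines them as $\bar{G}_n(k) = \frac{\Lambda_n(k)}{\Lambda_n(k) + 1}\,G_n(k)$, none of which is confirmed by this text. The names `bessel_i` and `suppression_coefficient` are hypothetical, and the Bessel functions are evaluated by a truncated power series that is adequate only for moderate arguments.

```python
import math


def bessel_i(order, z, terms=60):
    """Modified Bessel function I_order(z) by truncated power series."""
    return sum((z / 2.0) ** (2 * m + order)
               / (math.factorial(m) * math.factorial(m + order))
               for m in range(terms))


def suppression_coefficient(gamma, xi_hat, q):
    """Return (G, Lambda, G_bar) for one frequency bin.

    gamma  -- a posteriori SNR gamma_n(k), must be > 0
    xi_hat -- estimated a priori SNR, must be >= 0
    q      -- speech absence probability, 0 < q < 1
    """
    eta = xi_hat / (1.0 - q)
    v = eta * gamma / (1.0 + eta)
    # Assumed MMSE STSA gain (Ephraim-Malah form).
    gain = (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) \
        * math.exp(-v / 2.0) \
        * ((1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0))
    # Assumed generalized likelihood ratio.
    lam = (1.0 - q) / q * math.exp(v) / (1.0 + eta)
    # Assumed combination: the gain weighted by Lambda / (Lambda + 1).
    g_bar = lam / (lam + 1.0) * gain
    return gain, lam, g_bar


# Hypothetical operating point for one bin.
gain, lam, g_bar = suppression_coefficient(gamma=2.0, xi_hat=1.0, q=0.2)
```

Because the weighting factor $\Lambda/(\Lambda + 1)$ lies strictly between 0 and 1, the resulting coefficient in this sketch is always smaller than the plain gain, which is the intended effect of accounting for speech absence.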

FIG. 50 is a block diagram showing the suppression coefficient correction unit 15 included in FIG. 36. The suppression coefficient correction unit 15 has frequency-specific suppression coefficient correction units $1501_0$ to $1501_{K-1}$, separation units 1502 and 1503, and a multiplexing unit 1504.

The separation unit 1502 separates the estimated a priori SNR supplied from the estimated a priori SNR calculation unit 7 in FIG. 36 into frequency-specific components and outputs them to the respective frequency-specific suppression coefficient correction units $1501_0$ to $1501_{K-1}$. The separation unit 1503 separates the suppression coefficient supplied from the suppression coefficient generation unit 8 in FIG. 36 into frequency-specific components and outputs them to the respective frequency-specific suppression coefficient correction units $1501_0$ to $1501_{K-1}$. The frequency-specific suppression coefficient correction units $1501_0$ to $1501_{K-1}$ calculate frequency-specific corrected suppression coefficients from the frequency-specific estimated a priori SNRs supplied from the separation unit 1502 and the frequency-specific suppression coefficients supplied from the separation unit 1503, and output them to the multiplexing unit 1504. The multiplexing unit 1504 multiplexes the frequency-specific corrected suppression coefficients supplied from the frequency-specific suppression coefficient correction units $1501_0$ to $1501_{K-1}$ and outputs the result as the corrected suppression coefficient to the multiplexed multiplication unit 1 and the estimated a priori SNR calculation unit 7 in FIG. 36.

Next, the configuration and operation of the frequency-specific suppression coefficient correction units $1501_0$ to $1501_{K-1}$ are described in detail with reference to FIG. 51.

FIG. 51 is a block diagram showing the configuration of the frequency-specific suppression coefficient correction units $1501_0$ to $1501_{K-1}$ included in the suppression coefficient correction unit 15. The frequency-specific suppression coefficient correction unit 1501 has a maximum value selection unit 1591, a suppression coefficient lower limit storage unit 1592, a threshold storage unit 1593, a comparison unit 1594, a switch 1595, a correction value storage unit 1596, and a multiplier 1597.

The comparison unit 1594 compares the threshold supplied from the threshold storage unit 1593 with the frequency-specific estimated a priori SNR supplied from the separation unit 1502 in FIG. 50, and supplies "0" to the switch 1595 if the frequency-specific estimated a priori SNR is larger than the threshold and "1" if it is smaller. The switch 1595 outputs the frequency-specific suppression coefficient supplied from the separation unit 1503 in FIG. 50 to the multiplier 1597 when the output of the comparison unit 1594 is "1" and to the maximum value selection unit 1591 when it is "0". In other words, the suppression coefficient is corrected when the frequency-specific estimated a priori SNR is smaller than the threshold. The multiplier 1597 calculates the product of the output of the switch 1595 and the output of the correction value storage unit 1596 and outputs it to the maximum value selection unit 1591.

Meanwhile, the suppression coefficient lower limit storage unit 1592 supplies the stored lower limit of the suppression coefficient to the maximum value selection unit 1591. The maximum value selection unit 1591 compares the frequency-specific suppression coefficient supplied from the separation unit 1503 in FIG. 50, or the product calculated by the multiplier 1597, with the suppression coefficient lower limit supplied from the suppression coefficient lower limit storage unit 1592, and outputs the larger value to the multiplexing unit 1504 in FIG. 50. The suppression coefficient therefore never falls below the lower limit stored in the suppression coefficient lower limit storage unit 1592.
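The correction performed by one frequency-specific unit 1501 can be sketched as follows. The function name, parameter names, and numeric values are hypothetical. The text does not say what happens when the estimated a priori SNR exactly equals the threshold, so leaving the coefficient uncorrected in that case is an assumption of this sketch.

```python
def correct_suppression_coefficient(g_bar, xi_hat, threshold,
                                    correction, lower_limit):
    """Correct one frequency-specific suppression coefficient.

    g_bar       -- frequency-specific suppression coefficient
    xi_hat      -- frequency-specific estimated a priori SNR
    threshold   -- value held by threshold storage unit 1593
    correction  -- value held by correction value storage unit 1596
    lower_limit -- value held by lower limit storage unit 1592
    """
    if xi_hat < threshold:              # comparison unit 1594 outputs "1"
        g_bar = g_bar * correction      # multiplier 1597
    # Maximum value selection unit 1591: floor at the lower limit.
    return max(g_bar, lower_limit)


# Hypothetical settings: threshold 1.0, correction value 0.5, lower limit 0.1.
low_snr = correct_suppression_coefficient(0.5, 0.1, 1.0, 0.5, 0.1)
high_snr = correct_suppression_coefficient(0.5, 2.0, 1.0, 0.5, 0.1)
floored = correct_suppression_coefficient(0.05, 2.0, 1.0, 0.5, 0.1)
```

The three calls cover the three paths through the unit: a low-SNR bin whose coefficient is scaled by the correction value, a high-SNR bin that passes through unchanged, and a coefficient that is raised to the lower limit.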

Patent Document 1: Japanese Patent Laid-Open No. 2002-20417

Non-Patent Document 1: Mathematical Dictionary, Iwanami Shoten, 1985, p. 374.G
Non-Patent Document 2: Proceedings of the IEEE, Vol. 67, No. 12, pp. 1586-1604, December 1979
Non-Patent Document 3: IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 27, No. 2, pp. 113-120, April 1979

In the related-art methods described so far, noise suppression always uses a suppression coefficient obtained by the same calculation method, without distinguishing speech sections from noise sections. As a result, speech distortion occurs in speech sections and suppression is insufficient in noise sections.

An object of the present invention is to provide a noise suppression method and apparatus that distinguish speech sections from noise sections and perform noise suppression using suppression coefficients obtained by a calculation method suited to each, thereby reducing speech distortion in speech sections and achieving sufficient suppression in noise sections.

The noise suppression method and apparatus of the present invention are characterized by using post-suppression, which performs further suppression after noise suppression, based on a silent-part coefficient and a voiced-part coefficient.

More specifically, the apparatus is characterized by comprising a silent-part coefficient calculation unit that calculates a silent-part coefficient based on the enhanced speech power spectrum and the estimated noise power spectrum, a voiced-part coefficient storage unit that stores a voiced-part coefficient, and a post-suppression coefficient calculation unit that calculates a post-suppression coefficient from the obtained silent-part coefficient and the voiced-part coefficient.

In the present invention, the suppression coefficient is corrected using the voiced-part coefficient and the silent-part coefficient, the latter being calculated from the enhanced speech power spectrum and the estimated noise power spectrum. Post-suppression can therefore weaken suppression in speech sections based on the voiced-part coefficient and strengthen suppression in noise sections based on the silent-part coefficient, which depends on the enhanced speech power spectrum and the estimated noise power spectrum. As a result, enhanced speech with little distortion in speech sections and little residual noise in noise sections can be obtained.

FIG. 1 is a block diagram showing a first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the enhanced speech spectrum correction unit included in the first embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the speech presence probability calculation unit included in FIG. 2.
FIG. 4 is a block diagram showing the configuration of the smoothing unit included in FIG. 3.
FIG. 5 is a block diagram showing the configuration of the post-suppression coefficient calculation unit included in FIG. 2.
FIG. 6 is a block diagram showing the configuration of the frequency-specific post-suppression coefficient calculation unit included in FIG. 5.
FIG. 7 is a block diagram showing the configuration of the silent-part coefficient calculation unit included in FIG. 6.
FIG. 8 is a diagram showing an example of the nonlinear function in the coefficient calculation unit included in FIG. 7.
FIG. 9 is a block diagram showing a second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the enhanced speech spectrum correction unit included in the second embodiment of the present invention.
FIG. 11 is a block diagram showing the configuration of the post-suppression coefficient calculation unit included in FIG. 10.
FIG. 12 is a block diagram showing the configuration of the frequency-specific post-suppression coefficient calculation unit included in FIG. 11.
FIG. 13 is a block diagram showing a third embodiment of the present invention.
FIG. 14 is a block diagram showing the configuration of the enhanced speech spectrum correction unit included in the third embodiment of the present invention.
FIG. 15 is a block diagram showing the configuration of the post-suppression coefficient calculation unit included in FIG. 14.
FIG. 16 is a block diagram showing the configuration of the frequency-specific post-suppression coefficient calculation unit included in FIG. 15.
FIG. 17 is a block diagram showing a fourth embodiment of the present invention.
FIG. 18 is a block diagram showing the configuration of the enhanced speech spectrum correction unit included in the fourth embodiment of the present invention.
FIG. 19 is a block diagram showing the configuration of the post-suppression coefficient calculation unit included in FIG. 18.
FIG. 20 is a block diagram showing the configuration of the frequency-specific post-suppression coefficient calculation unit included in FIG. 19.
FIG. 21 is a block diagram showing a fifth embodiment of the present invention.
FIG. 22 is a block diagram showing the configuration of the estimated a priori SNR calculation unit included in the fifth embodiment of the present invention.
FIG. 23 is a block diagram showing the configuration of the suppression coefficient correction unit included in the fifth embodiment of the present invention.
FIG. 24 is a block diagram showing the configuration of the frequency-specific suppression coefficient correction unit included in FIG. 23.
FIG. 25 is a block diagram showing a sixth embodiment of the present invention.
FIG. 26 is a block diagram showing the configuration of the estimated a priori SNR calculation unit included in the sixth embodiment of the present invention.
FIG. 27 is a block diagram showing the configuration of the suppression coefficient correction unit included in the sixth embodiment of the present invention.
FIG. 28 is a block diagram showing the configuration of the frequency-specific estimated suppression coefficient correction unit included in FIG. 27.
FIG. 29 is a block diagram showing a seventh embodiment of the present invention.
FIG. 30 is a block diagram showing the enhanced speech amplitude spectrum correction unit included in the seventh embodiment of the present invention.
FIG. 31 is a block diagram showing an eighth embodiment of the present invention.
FIG. 32 is a block diagram showing the enhanced speech amplitude spectrum correction unit included in the eighth embodiment of the present invention.
FIG. 33 is a block diagram showing the speech presence probability calculation unit included in the eighth embodiment of the present invention.
FIG. 34 is a block diagram showing a ninth embodiment of the present invention.
FIG. 35 is a block diagram showing the speech presence probability calculation unit included in the ninth embodiment of the present invention.
FIG. 36 is a block diagram showing the configuration of a related-art example.
FIG. 37 is a block diagram showing the configuration of the multiplexed multiplication unit included in the related-art example.
FIG. 38 is a block diagram showing the configuration of the weighted degraded speech calculation unit included in the related-art example.
FIG. 39 is a block diagram showing the configuration of the frequency-specific SNR calculation unit included in FIG. 38.
FIG. 40 is a block diagram showing the configuration of the multiplexed nonlinear processing unit included in FIG. 38.
FIG. 41 is a diagram showing an example of the nonlinear function in the nonlinear processing unit.
FIG. 42 is a block diagram showing the configuration of the estimated noise calculation unit included in the related-art example.
FIG. 43 is a block diagram showing the configuration of the frequency-specific estimated noise calculation unit included in FIG. 42.
FIG. 44 is a block diagram showing the configuration of the update determination unit included in FIG. 43.
FIG. 45 is a block diagram showing the configuration of the estimated a priori SNR calculation unit included in the related-art example.
FIG. 46 is a block diagram showing the configuration of the multiplexed range-limiting unit included in FIG. 45.
FIG. 47 is a block diagram showing the configuration of the multiplexed weighted addition unit included in FIG. 45.
FIG. 48 is a block diagram showing the configuration of the weighted addition unit included in FIG. 47.
FIG. 49 is a block diagram showing the configuration of the noise suppression coefficient generation unit included in the related-art example.
FIG. 50 is a block diagram showing the configuration of the suppression coefficient correction unit included in the related-art example.
FIG. 51 is a block diagram showing the configuration of the frequency-specific suppression coefficient correction unit included in FIG. 50.

FIG. 1 is a block diagram showing an embodiment of the present invention. FIG. 1 is identical to FIG. 36, the related-art example, except for the enhanced speech amplitude spectrum correction unit 18. The detailed operation is described below, focusing on this difference.

The enhanced speech amplitude spectrum correction unit 18 is supplied with the degraded speech amplitude spectrum from the Fourier transform unit 3, the estimated noise power spectrum from the estimated noise calculation unit 5, the enhanced speech amplitude spectrum from the multiplexed multiplication unit 16, and the corrected suppression coefficient from the suppression coefficient correction unit 15. Using the degraded speech amplitude spectrum, the estimated noise power spectrum, the enhanced speech amplitude spectrum, and the corrected suppression coefficient, the enhanced speech amplitude spectrum correction unit 18 corrects the enhanced speech amplitude spectrum and transmits it to the inverse Fourier transform unit 9. The configuration and operation of the enhanced speech amplitude spectrum correction unit 18 are described in detail with reference to FIG. 2.

FIG. 2 is a block diagram showing the configuration of the enhanced speech amplitude spectrum correction unit 18. The enhanced speech amplitude spectrum correction unit 18 has multiplexed multiplication units 170 and 173, a speech presence probability calculation unit 171, and a post-suppression coefficient calculation unit 182. The multiplexed multiplication unit 170 calculates the enhanced speech power spectrum from the enhanced speech amplitude spectrum supplied from the multiplexed multiplication unit 16 in FIG. 1 and transmits it to the speech presence probability calculation unit 171. The speech presence probability calculation unit 171 calculates the speech presence probability from the enhanced speech power spectrum supplied from the multiplexed multiplication unit 170 and the estimated noise power spectrum supplied from the estimated noise calculation unit 5 in FIG. 1, and transmits it to the post-suppression coefficient calculation unit 182. Both the enhanced speech power spectrum and the estimated noise power spectrum supplied to the speech presence probability calculation unit are calculated from the degraded speech amplitude spectrum. The speech presence probability can therefore be said to be calculated essentially from the degraded speech power spectrum.

The post-suppression coefficient calculation unit 182 calculates a post-suppression coefficient using the speech presence probability supplied from the speech presence probability calculation unit 171, the estimated noise supplied from the estimated noise calculation unit 5 in FIG. 1, and the corrected suppression coefficient supplied from the suppression coefficient correction unit 15 in FIG. 1, and transmits it to the multiplexed multiplication unit 173. The multiplexed multiplication unit 173 obtains a corrected enhanced speech amplitude spectrum by weighting the degraded speech amplitude spectrum supplied from the Fourier transform unit 3 in FIG. 1 with the post-suppression coefficient supplied from the post-suppression coefficient calculation unit 182, and transmits it to the inverse Fourier transform unit 9 in FIG. 1. The multiplexed multiplication units 170 and 173 have the same configuration as the multiplexed multiplication unit 13 described with reference to FIG. 37, so a detailed description is omitted.

The configuration and operation of the speech presence probability calculation unit 171 and the post-suppression coefficient calculation unit 182 are described in detail with reference to FIG. 3 and FIG. 5.

FIG. 3 is a block diagram showing the configuration of the speech presence probability calculation unit 171. The speech presence probability calculation unit 171 has separation units 1700 and 1708, average value calculation units 1701 and 1709, logarithm calculation units 1702 and 1710, multipliers 1703 and 1711, smoothing coefficient storage units 1704 and 1706, smoothing units 1705 and 1707, function value calculation units 1712 and 1713, an average index calculation unit 1714, an instantaneous index calculation unit 1715, and an addition unit 1716.

分離部１７００は、図２の多重乗算部１７０から供給される強調音声パワースペクトルを周波数別強調音声パワースペクトルに分離し、平均値計算部１７０１へ出力する。平均値計算部１７０１は、強調音声パワースペクトル｜Ｘ_ｎ（ｋ）｜^２バーのｋ＝０からＫ−１に対する総和をＫで除算し、計算結果を対数計算部１７０２へ伝達する。対数計算部１７０２は、平均値計算部１７０１から入力された平均値の対数を計算し、乗算器１７０３へ伝達する。乗算器１７０３は、供給された対数値を定数倍して、強調音声パワーＰＥ_ｎを求め、平滑化部１７０５、１７０７へ供給する。すなわち、第ｎフレームの強調音声パワーＰＥ_ｎは、

The separation unit 1700 separates the enhanced speech power spectrum supplied from the multiplex multiplication unit 170 in FIG. 2 into per-frequency enhanced speech power spectra and outputs them to the average value calculation unit 1701. The average value calculation unit 1701 divides the sum of the enhanced speech power spectrum $\overline{|X_n(k)|^2}$ over k = 0 to K−1 by K and transmits the result to the logarithm calculation unit 1702. The logarithm calculation unit 1702 calculates the logarithm of the average value supplied from the average value calculation unit 1701 and transmits it to the multiplier 1703. The multiplier 1703 multiplies the supplied logarithmic value by a constant to obtain the enhanced speech power PE_n and supplies it to the smoothing units 1705 and 1707. That is, with $c_{PE}$ denoting the constant applied by the multiplier 1703, the enhanced speech power PE_n of the n-th frame is given by

$$PE_n = c_{PE}\,\log\left(\frac{1}{K}\sum_{k=0}^{K-1}\overline{|X_n(k)|^2}\right)$$

で与えられる。

一方、分離部１７０８は、図１の雑音推定計算部５から供給された推定雑音パワースペクトルを周波数別推定雑音パワースペクトルに分離し、平均値計算部１７０９へ出力する。平均値計算部１７０９は、周波数別推定雑音パワースペクトルλ_ｎ（ｋ）のｋ＝０からＫ−１に対する総和をＫで除算し、計算結果を対数計算部１７１０へ伝達する。対数計算部１７１０は、平均値計算部１７０９から供給された平均値の対数を計算し、乗算器１７１１へ伝達する。乗算器１７１１は、供給された対数値を定数倍して、推定雑音パワーＰＮ_ｎを求め、関数値計算部１７１２、１７１３へ供給する。すなわち、第ｎフレームの推定雑音パワーＰＮ_ｎは、

Meanwhile, the separation unit 1708 separates the estimated noise power spectrum supplied from the estimated noise calculation unit 5 in FIG. 1 into per-frequency estimated noise power spectra and outputs them to the average value calculation unit 1709. The average value calculation unit 1709 divides the sum of the per-frequency estimated noise power spectrum $\lambda_n(k)$ over k = 0 to K−1 by K and transmits the result to the logarithm calculation unit 1710. The logarithm calculation unit 1710 calculates the logarithm of the average value supplied from the average value calculation unit 1709 and transmits it to the multiplier 1711. The multiplier 1711 multiplies the supplied logarithmic value by a constant to obtain the estimated noise power PN_n and supplies it to the function value calculation units 1712 and 1713. That is, with $c_{PN}$ denoting the constant applied by the multiplier 1711, the estimated noise power PN_n of the n-th frame is given by

$$PN_n = c_{PN}\,\log\left(\frac{1}{K}\sum_{k=0}^{K-1}\lambda_n(k)\right)$$

で与えられる。
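As a rough illustration of the two frame-level power computations described above, the following Python sketch averages a K-bin power spectrum and scales its logarithm. The function name `frame_log_power`, the base-10 logarithm, the scale factor 10 and the small floor `eps` are assumptions made for illustration; the description only states that the logarithm of the mean is multiplied by a constant.

```python
import math

def frame_log_power(power_spectrum, scale=10.0, eps=1e-12):
    """Log of the mean power over K bins, times a constant (assumed values)."""
    K = len(power_spectrum)
    mean_power = sum(power_spectrum) / K          # average value calculation (1701 / 1709)
    # eps is a hypothetical guard against log(0); it is not part of the description
    return scale * math.log10(mean_power + eps)   # logarithm, then constant multiplication

# PE_n from an enhanced speech power spectrum, PN_n from an estimated noise power spectrum
pe_n = frame_log_power([4.0, 1.0, 1.0, 2.0])
pn_n = frame_log_power([1.0, 1.0, 1.0, 1.0])
```

With this choice of constants the two values are on a decibel-like scale, which makes the later comparison between PE_n and PN_n independent of the absolute signal level.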

入力信号に音声がどの程度含まれているかを表す指標は、推定雑音パワーＰＮ_ｎと強調音声パワーＰＥ_ｎの相対関係をもとに計算される。強調音声パワーＰＥ_ｎが推定雑音パワーＰＮ_ｎよりも大きければ、指標は音声の存在確率が高いことを示す。一般的に、推定雑音パワーＰＮ_ｎと強調音声パワーＰＥ_ｎは非定常信号であるため、音声区間において推定雑音パワーＰＮ_ｎが強調音声パワーＰＥ_ｎよりも大きくなる場合が発生する。逆に、雑音区間でも強調音声パワーＰＥ_ｎが推定雑音パワーＰＮ_ｎよりも大きくなることがある。従って、それぞれのパワーを補正せずに指標計算に用いると、誤った音声存在確率が得られる可能性がある。このため、音声存在確率計算の精度を向上するには、推定雑音パワーＰＮ_ｎと強調音声パワーＰＥ_ｎを適切に補正することが望ましい。また、複数の補正方法を導入し、複数の指標をもとに音声存在確率を計算すれば、精度は更に向上する。 The index representing how much speech is contained in the input signal is calculated from the relative relationship between the estimated noise power PN_n and the enhanced speech power PE_n. If the enhanced speech power PE_n is larger than the estimated noise power PN_n, the index indicates a high speech presence probability. In general, because the estimated noise power PN_n and the enhanced speech power PE_n are non-stationary signals, the estimated noise power PN_n may exceed the enhanced speech power PE_n in a speech section. Conversely, the enhanced speech power PE_n may exceed the estimated noise power PN_n even in a noise section. Therefore, if the powers are used for index calculation without correction, an erroneous speech presence probability may be obtained. To improve the accuracy of the speech presence probability calculation, it is therefore desirable to correct the estimated noise power PN_n and the enhanced speech power PE_n appropriately. Accuracy improves further if several correction methods are introduced and the speech presence probability is calculated from several indices.

本実施例では、強調音声パワーＰＥ_ｎは平滑化部１７０５と１７０７において平滑化処理を用いて、推定雑音パワーＰＮ_ｎは関数値計算部１７１２と１７１３において適切な関数を用いて、指標計算に適した値に補正される。指標としては、分析区間長がそれぞれ異なる瞬時指標と平均指標の二種類が計算される。 In this embodiment, the enhanced speech power PE_n is corrected to a value suitable for index calculation by smoothing in the smoothing units 1705 and 1707, and the estimated noise power PN_n is corrected by applying appropriate functions in the function value calculation units 1712 and 1713. Two kinds of indices with different analysis interval lengths are calculated: an instantaneous index and an average index.

平滑化部１７０５は、平滑化係数記憶部１７０４から供給された平滑化係数を用いて、乗算器１７０３から供給された強調音声パワーＰＥ_ｎを時間方向に平滑化し、第一の平滑強調音声パワーを瞬時指標計算部１７１５へ供給する。平滑化部１７０７も同様に、平滑化係数記憶部１７０６から供給された平滑化係数を用いて、乗算器１７０３から供給された強調音声パワーＰＥ_ｎを時間方向に平滑化し、第二の平滑強調音声パワーを平均指標計算部１７１４へ供給する。基本的に、平滑化係数記憶部１７０４に記憶されている係数の方が、平滑化係数記憶部１７０６の係数よりも小さくなるように設定される。これは、平滑化係数の値が小さい程、平滑化部の時間方向平滑化効果が小さくなり、瞬時指標の計算に適しているためである。 The smoothing unit 1705 uses the smoothing coefficient supplied from the smoothing coefficient storage unit 1704 to smooth the enhanced speech power PE_n supplied from the multiplier 1703 in the time direction, and supplies the resulting first smoothed enhanced speech power to the instantaneous index calculation unit 1715. Similarly, the smoothing unit 1707 uses the smoothing coefficient supplied from the smoothing coefficient storage unit 1706 to smooth the enhanced speech power PE_n supplied from the multiplier 1703 in the time direction, and supplies the resulting second smoothed enhanced speech power to the average index calculation unit 1714. Basically, the coefficient stored in the smoothing coefficient storage unit 1704 is set smaller than the coefficient stored in the smoothing coefficient storage unit 1706. This is because a smaller smoothing coefficient weakens the time-direction smoothing effect of the smoothing unit, which suits the calculation of the instantaneous index.

関数値計算部１７１３は、乗算器１７１１から供給された推定雑音パワーＰＮ_ｎから第一の関数値を計算し、瞬時指標計算部１７１５へ供給する。関数値計算部１７１２も同様に、乗算器１７１１から供給された推定雑音パワーＰＮ_ｎから第二の関数値を計算し、平均指標計算部１７１４へ供給する。関数値の計算には、ダイナミックレンジの圧縮や拡大を行うために線形又は非線形関数や、分散を低減するために平滑化が用いられる。ダイナミックレンジの圧縮や拡大、分散の低減により、推定雑音パワーＰＮ_ｎの非定常性に起因する指標計算の精度劣化を低減できる。また、演算量を低減するために、関数値計算を省略し、推定雑音パワーＰＮ_ｎをそのまま指標計算に利用することも可能である。関数値計算部１７１２と１７１３では、例えば次のような関数が利用される。

The function value calculation unit 1713 calculates a first function value from the estimated noise power PN_n supplied from the multiplier 1711 and supplies it to the instantaneous index calculation unit 1715. Similarly, the function value calculation unit 1712 calculates a second function value from the estimated noise power PN_n supplied from the multiplier 1711 and supplies it to the average index calculation unit 1714. The function value is calculated using a linear or nonlinear function to compress or expand the dynamic range, or using smoothing to reduce the variance. Compressing or expanding the dynamic range and reducing the variance mitigate the loss of index accuracy caused by the non-stationarity of the estimated noise power PN_n. To reduce the amount of computation, the function value calculation may also be omitted and the estimated noise power PN_n used directly for index calculation. In the function value calculation units 1712 and 1713, for example, the following function is used.

但し、ＰＮ_ｎハットは関数値、ａ_ｆｃとｂ_ｆｃは実数である。 Here, PN_n hat is the function value, and a_fc and b_fc are real numbers.

瞬時指標計算部１７１５は、平滑化部１７０５から供給された第一の平滑強調音声パワーと、関数値計算部１７１３から供給された第一の関数値を用いて、瞬時指標を計算し、加算部１７１６へ供給する。平均指標計算部１７１４は、平滑化部１７０７から供給された第二の平滑強調音声パワーと、関数値計算部１７１２から供給された第二の関数値を用いて、平均指標を計算し、加算部１７１６へ供給する。指標の計算には、強調音声パワーＰＥ_ｎと推定雑音パワーＰＮ_ｎの比を計算し、その比に応じて数値を大きくする方法が利用される。具体例としては、次のような計算方法が挙げられる。

The instantaneous index calculation unit 1715 calculates an instantaneous index using the first smoothed enhanced speech power supplied from the smoothing unit 1705 and the first function value supplied from the function value calculation unit 1713, and supplies it to the addition unit 1716. The average index calculation unit 1714 calculates an average index using the second smoothed enhanced speech power supplied from the smoothing unit 1707 and the second function value supplied from the function value calculation unit 1712, and supplies it to the addition unit 1716. The index is calculated by taking the ratio of the enhanced speech power PE_n to the estimated noise power PN_n and increasing the index value according to that ratio. A specific example is the following calculation method.

但し、ＩＤＸ_ｎは指標、ＰＥ_ｎバーは平滑強調音声パワー、ＰＮ_ｎハットは関数値である。また、θ_ｉｄｘ、ａ_ｉｄｘとｂ_ｉｄｘは実数で、ａ_ｉｄｘはｂ_ｉｄｘ以上の値を有する。 Here, IDX_n is the index, PE_n bar is the smoothed enhanced speech power, and PN_n hat is the function value. θ_idx, a_idx, and b_idx are real numbers, and a_idx is greater than or equal to b_idx.

比を計算するときに分母に定数を加えると、分母の値が定数よりも小さくならないので、比を計算する際に発散を防止できる。この他にも、強調音声パワーＰＥ_ｎと推定雑音パワーＰＮ_ｎの差や、差を強調音声パワーＰＥ_ｎで正規化した値を用いて計算することもできるが、詳細は省略する。 If a constant is added to the denominator when calculating the ratio, the value of the denominator does not become smaller than the constant, so that divergence can be prevented when calculating the ratio. In addition to this, the difference between the enhanced speech power PE _n and the estimated noise power PN _n and a value obtained by normalizing the difference with the enhanced speech power PE _n can be calculated, but details are omitted.

加算部１７１６は、平均指標計算部１７１４及び瞬時指標計算部１７１５から供給された平均指標及び瞬時指標の和を計算し、音声存在確率として図２の後抑圧係数計算部１８２へ伝達する。音声存在確率の計算には、加算以外にも、重みつき加算や乗算を用いることが可能である。音声存在確率の精度を改善するために、分析区間が異なる３種類以上の指標を計算しても良い。また、１種類の指標だけを利用し、計算を簡略化することも可能である。 The addition unit 1716 calculates the sum of the average index and the instantaneous index supplied from the average index calculation unit 1714 and the instantaneous index calculation unit 1715, and transmits it as the speech presence probability to the post-suppression coefficient calculation unit 182 in FIG. 2. Besides simple addition, weighted addition or multiplication can be used to calculate the speech presence probability. To improve its accuracy, three or more kinds of indices with different analysis intervals may be calculated. It is also possible to simplify the calculation by using only one kind of index.
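The combination of an instantaneous index and an average index can be sketched as follows. The index formula itself is not reproduced in this text, so the two-level thresholding below (return `a_idx` when the power ratio exceeds `theta_idx`, otherwise `b_idx`) is only one assumed reading consistent with the stated constraint that a_idx is at least b_idx. All default values and the `floor` constant in the denominator are hypothetical, and the inputs are taken to be positive.

```python
def index(pe_smooth, pn_hat, theta_idx=1.5, a_idx=0.5, b_idx=0.0, floor=1e-3):
    """Assumed two-level index: larger value when the power ratio exceeds a threshold."""
    ratio = pe_smooth / (pn_hat + floor)   # constant in the denominator avoids divergence
    return a_idx if ratio > theta_idx else b_idx

def speech_presence(pe_fast, pe_slow, pn_hat_inst, pn_hat_avg):
    """Sum of the instantaneous and average indices, as in the addition unit 1716."""
    return index(pe_fast, pn_hat_inst) + index(pe_slow, pn_hat_avg)

# Hypothetical values: strong speech frame versus noise-only frame
p_speech = speech_presence(30.0, 28.0, 10.0, 10.0)
p_noise = speech_presence(10.0, 10.0, 10.0, 10.0)
```

With these assumed defaults the sum lies between 0 and 1, which lets it serve directly as a contribution ratio in a later weighting step.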

平滑化部１７０５の構成と動作の詳細な説明は、図４を用いて行う。 A detailed description of the configuration and operation of the smoothing unit 1705 will be given with reference to FIG.

図４は、図３の平滑化部１７０５の構成を示すブロック図である。平滑化部１７０５は、定数乗算器１７４１、乗算器１７４３、１７４４、加算器１７４２、１７４５、遅延器１７４６を有する。図３の乗算器１７０３から強調音声パワーＰＥ_ｎが、図３の平滑化係数記憶部１７０４から平滑化係数が、それぞれ入力として供給される。値δを有する平滑化係数は、定数乗算器１７４１と乗算器１７４４に伝達される。定数乗算器１７４１は、入力信号を−１倍して−δとし、これを加算器１７４２に伝達する。加算器１７４２のもう一方の入力としては１が供給されており、加算器１７４２の出力は両者の和である１−δとなる。１−δは乗算器１７４３に供給されて、もう一方の入力である強調音声パワーＰＥ_ｎと乗算され、積である（１−δ）ＰＥ_ｎが加算器１７４５に伝達される。 FIG. 4 is a block diagram showing a configuration of the smoothing unit 1705 of FIG. The smoothing unit 1705 includes a constant multiplier 1741, multipliers 1743 and 1744, adders 1742 and 1745, and a delay unit 1746. The emphasized speech power PE _n is supplied from the multiplier 1703 in FIG. 3 and the smoothing coefficient is supplied from the smoothing coefficient storage unit 1704 in FIG. The smoothing coefficient having the value δ is transmitted to the constant multiplier 1741 and the multiplier 1744. The constant multiplier 1741 multiplies the input signal by −1 to obtain −δ, and transmits this to the adder 1742. 1 is supplied as the other input of the adder 1742, and the output of the adder 1742 is 1−δ which is the sum of both. 1-δ is supplied to a multiplier 1743 and multiplied by the emphasized speech power PE _n which is the other input, and (1-δ) PE _n which is a product is transmitted to the adder 1745.

一方、乗算器１７４４では、平滑化係数として供給されたδと遅延器１７４６から供給された１フレーム前の平滑化強調音声パワーＰＥ_ｎ−１バーが乗算され、積であるδＰＥ_ｎ−１バーが加算器１７４５に伝達される。加算器１７４５は、（１−δ）ＰＥ_ｎとδＰＥ_ｎ−１バーの和を遅延器１７４６と図３の瞬時指標計算部１７１５に、平滑化強調音声パワーＰＥ_ｎバーとして、出力する。以上の計算は、式（１９）によって表すことができる。

Meanwhile, the multiplier 1744 multiplies δ, supplied as the smoothing coefficient, by the smoothed enhanced speech power $\overline{PE}_{n-1}$ of the previous frame supplied from the delay unit 1746, and transmits the product $\delta\,\overline{PE}_{n-1}$ to the adder 1745. The adder 1745 outputs the sum of $(1-\delta)\,PE_n$ and $\delta\,\overline{PE}_{n-1}$ to the delay unit 1746 and to the instantaneous index calculation unit 1715 in FIG. 3 as the smoothed enhanced speech power $\overline{PE}_n$. The above calculation can be expressed by equation (19).

$$\overline{PE}_n = (1-\delta)\,PE_n + \delta\,\overline{PE}_{n-1} \tag{19}$$

平滑化部１７０７の構成は、平滑化部１７０５と同じである。但し、平滑化部１７０７は、平滑化係数記憶部１７０６から供給される平滑化係数を用いて、平滑化強調音声パワーを計算する。また、平滑化部１７０５と１７０７では、式（１９）の他に、移動平均を利用することも可能である。 The configuration of the smoothing unit 1707 is the same as that of the smoothing unit 1705. However, the smoothing unit 1707 uses the smoothing coefficient supplied from the smoothing coefficient storage unit 1706 to calculate the smoothed enhanced speech power. Further, the smoothing units 1705 and 1707 can use a moving average in addition to the equation (19).
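The recursive smoothing described for the smoothing units 1705 and 1707 amounts to a first-order leaky integrator. The sketch below is a minimal illustration of that recursion; the function names, the zero initial state and the example values of δ are assumptions chosen for the demonstration.

```python
def smooth(current, previous_smoothed, delta):
    """One smoothing step: (1 - delta) * current + delta * previous smoothed value."""
    return (1.0 - delta) * current + delta * previous_smoothed

def smooth_sequence(values, delta, initial=0.0):
    """Apply the recursion frame by frame; `initial` plays the role of the delay unit's state."""
    out, state = [], initial
    for v in values:
        state = smooth(v, state, delta)
        out.append(state)
    return out

# A small delta tracks the input quickly (instantaneous index path),
# a large delta reacts slowly (average index path).
fast = smooth_sequence([0.0, 10.0, 10.0, 10.0], delta=0.2)
slow = smooth_sequence([0.0, 10.0, 10.0, 10.0], delta=0.9)
```

The comparison of `fast` and `slow` illustrates why the coefficient in the storage unit 1704 is set smaller than the one in the storage unit 1706.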

図５は、図２の後抑圧係数計算部１８２の構成を示すブロック図である。後抑圧係数計算部１８２は、分離部１７２２、周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１、多重化部１７２３を有する。分離部１７２２は、図１の抑圧係数補正部１５から供給された補正抑圧係数を周波数別補正抑圧係数に分離し、周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１に伝達する。周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１は、図２の音声存在確率計算部１７１、図１の多重乗算部１６及び推定雑音計算部５からそれぞれ供給される音声存在確率、強調音声振幅スペクトル、推定雑音パワースペクトル、及び分離部１７２２から供給される周波数別補正抑圧係数を用いて、周波数別後抑圧係数を計算し、多重化部１７２３に伝達する。 FIG. 5 is a block diagram showing the configuration of the post-suppression coefficient calculation unit 182 in FIG. 2. The post-suppression coefficient calculation unit 182 includes a separation unit 1722, per-frequency post-suppression coefficient calculation units 1821_0 to 1821_K−1, and a multiplexing unit 1723. The separation unit 1722 separates the corrected suppression coefficient supplied from the suppression coefficient correction unit 15 in FIG. 1 into per-frequency corrected suppression coefficients and transmits them to the per-frequency post-suppression coefficient calculation units 1821_0 to 1821_K−1. These units calculate per-frequency post-suppression coefficients using the speech presence probability, the enhanced speech amplitude spectrum, and the estimated noise power spectrum supplied from the speech presence probability calculation unit 171 in FIG. 2, the multiplex multiplication unit 16 in FIG. 1, and the estimated noise calculation unit 5, respectively, together with the per-frequency corrected suppression coefficients supplied from the separation unit 1722, and transmit them to the multiplexing unit 1723.

周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１の構成と動作の詳細な説明は、図６を参照しながら行う。 The configuration and operation of the per-frequency post-suppression coefficient calculation units 1821_0 to 1821_K−1 are described in detail with reference to FIG. 6.

図６は、図５の周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１の構成を示すブロック図である。周波数別後抑圧係数計算部１８２１は、有音部用係数記憶部１８３１、無音部用係数計算部１８３２、係数計算部１８３３、乗算器１８３４を有する。周波数別後抑圧係数計算部１８２１は、音声存在確率に応じて、周波数別後抑圧係数を計算する。音声存在確率が低ければ、無音部用係数の寄与率が高い係数を用いて、周波数別後抑圧係数の値を小さくする。このため、雑音区間での残留雑音を更に低減できる。逆に、音声存在確率が高い場合には、有音部用係数の寄与率が高い係数を用いて、周波数別後抑圧係数が周波数別補正抑圧係数と同等の値になるように補正する。また、周波数別後抑圧係数が周波数別補正抑圧係数よりも少し大きくなるように補正しても良い。以上から、音声存在確率が高い場合には、音声の過剰抑圧を防止できる。本実施例では、係数は各周波数毎に計算しているが、全帯域で共通の係数を求め、その係数を周波数別補正抑圧係数に適用すれば、係数の計算に必要な演算量を低減できる。 FIG. 6 is a block diagram showing the configuration of the per-frequency post-suppression coefficient calculation units 1821_0 to 1821_K−1 in FIG. 5. The per-frequency post-suppression coefficient calculation unit 1821 includes a sound part coefficient storage unit 1831, a silent part coefficient calculation unit 1832, a coefficient calculation unit 1833, and a multiplier 1834. The unit 1821 calculates the per-frequency post-suppression coefficient according to the speech presence probability. If the speech presence probability is low, a coefficient dominated by the silent part coefficient is used so that the per-frequency post-suppression coefficient becomes small. This further reduces residual noise in noise sections. Conversely, if the speech presence probability is high, a coefficient dominated by the sound part coefficient is used so that the per-frequency post-suppression coefficient becomes equal to the per-frequency corrected suppression coefficient. The correction may also make the per-frequency post-suppression coefficient slightly larger than the per-frequency corrected suppression coefficient. As a result, excessive suppression of speech is prevented when the speech presence probability is high. In this embodiment the coefficient is calculated for each frequency, but the amount of computation can be reduced by obtaining one coefficient common to all bands and applying it to every per-frequency corrected suppression coefficient.

係数計算部１８３３は、有音部用係数記憶部１８３１と無音部用係数計算部１８３２からそれぞれ出力される有音部用係数と無音部用係数、及び図２の音声存在確率計算部１７１から供給される音声存在確率をもとに、係数を計算する。 The coefficient calculation unit 1833 calculates a coefficient based on the sound part coefficient and the silent part coefficient output from the sound part coefficient storage unit 1831 and the silent part coefficient calculation unit 1832, respectively, and on the speech presence probability supplied from the speech presence probability calculation unit 171 in FIG. 2.

音声存在確率をｐ、有音部用係数をＦＶ、無音部用係数をＦＵとした場合に、係数計算部１８３３から出力される係数Ｆは、式（２０）で与えられる。

When the speech presence probability is p, the sound part coefficient is FV, and the silent part coefficient is FU, the coefficient F output from the coefficient calculation unit 1833 is given by equation (20).

$$F = p\,\mathit{FV} + (1-p)\,\mathit{FU} \tag{20}$$

係数の計算では、音声存在確率が大きければ、係数計算部１８３３の出力値に対する有音部用係数の寄与率を大きくする。式（２０）の計算方法では、音声存在確率をそのまま寄与率として利用している。 In the coefficient calculation, if the speech existence probability is large, the contribution ratio of the sound part coefficient to the output value of the coefficient calculation unit 1833 is increased. In the calculation method of Expression (20), the speech existence probability is directly used as a contribution rate.

また、式（２１）に示すように、適当な関数Ｆ_ＳＦＣ、Ｇ_ＳＦＣを用いて有音部用と無音部用の係数を補正してから、音声存在確率を寄与率として利用することも可能である。

Alternatively, as shown in equation (21), the sound part and silent part coefficients can first be corrected using suitable functions F_SFC and G_SFC, after which the speech presence probability is used as the contribution ratio.

$$F = p\,F_{SFC}(\mathit{FV}) + (1-p)\,G_{SFC}(\mathit{FU}) \tag{21}$$

この他にも、音声存在確率が予め定められた値以上の場合は、有音部用係数を係数計算部１８３３から出力することもできる。そして、乗算器１８３４は、図５の分離部１７２２から供給される周波数別補正抑圧係数と、係数計算部１８３３から供給される係数の積を計算し、周波数別後抑圧係数として図５の多重化部１７２３に伝達する。 Alternatively, when the speech presence probability is equal to or greater than a predetermined value, the coefficient calculation unit 1833 may simply output the sound part coefficient. The multiplier 1834 then calculates the product of the per-frequency corrected suppression coefficient supplied from the separation unit 1722 in FIG. 5 and the coefficient supplied from the coefficient calculation unit 1833, and transmits it to the multiplexing unit 1723 in FIG. 5 as the per-frequency post-suppression coefficient.
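A minimal sketch of the coefficient calculation unit 1833 and the multiplier 1834 follows. It assumes that equation (20) takes the form of a convex combination F = p·FV + (1−p)·FU, which matches the statement that the speech presence probability is used directly as the contribution ratio. The clamping of p to [0, 1] and the numeric values are hypothetical additions for the example.

```python
def mix_coefficient(p, fv, fu):
    """Blend the sound part (fv) and silent part (fu) coefficients, weighted by p."""
    p = min(max(p, 0.0), 1.0)   # clamp is an assumption; the range of p is not stated
    return p * fv + (1.0 - p) * fu

def post_suppression(corrected_gain, p, fv, fu):
    """Per-frequency post-suppression coefficient: corrected gain times the blended coefficient."""
    return corrected_gain * mix_coefficient(p, fv, fu)

# Hypothetical values: the corrected gain is kept in speech and reduced further in noise
g_speech = post_suppression(0.8, p=1.0, fv=1.0, fu=0.25)
g_noise = post_suppression(0.8, p=0.0, fv=1.0, fu=0.25)
```

With fv = 1.0 the speech-section gain equals the corrected suppression coefficient, while a small fu lowers the gain in noise sections and so reduces residual noise.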

無音部用係数計算部１８３２は、図２の音声存在確率計算部１７１、図１の多重乗算部１６及び推定雑音計算部５からそれぞれ供給される音声存在確率、強調音声振幅スペクトル、推定雑音パワースペクトルを用いて、無音部用係数を求め、係数計算部１８３３へ供給する。雑音区間の残留雑音を低減するため、有音部用係数よりも小さな値を出力するように無音部用係数計算部１８３２を設計する。 The silent part coefficient calculation unit 1832 obtains the silent part coefficient using the speech presence probability, the enhanced speech amplitude spectrum, and the estimated noise power spectrum supplied from the speech presence probability calculation unit 171 in FIG. 2, the multiplex multiplication unit 16 in FIG. 1, and the estimated noise calculation unit 5, respectively, and supplies it to the coefficient calculation unit 1833. To reduce residual noise in noise sections, the silent part coefficient calculation unit 1832 is designed to output a value smaller than the sound part coefficient.

無音部用係数計算部１８３２の構成と動作の詳細な説明は、図７を用いて行う。 A detailed description of the configuration and operation of the silent part coefficient calculation unit 1832 will be given with reference to FIG.

図７は、図６の無音部用係数計算部１８３２の構成を示すブロック図である。無音部用係数計算部１８３２は、分離部１８５０、１８５５、平均値計算部１８５１、１８５６、音声パワー混合部１８５２、平滑化部１８５３、平滑化係数記憶部１８５４、平滑信号記憶部１８５８、除算部１８５７、１８６２、対数計算部１８５９、定数乗算部１８６０、係数計算部１８６１、指数計算部１８６３を有する。 FIG. 7 is a block diagram showing the configuration of the silent part coefficient calculation unit 1832 of FIG. The silent part coefficient calculation unit 1832 includes separation units 1850 and 1855, average value calculation units 1851 and 1856, audio power mixing unit 1852, smoothing unit 1853, smoothing coefficient storage unit 1854, smooth signal storage unit 1858, and division unit 1857. 1862, a logarithm calculation unit 1859, a constant multiplication unit 1860, a coefficient calculation unit 1861, and an exponent calculation unit 1863.

分離部１８５０は、図１の多重乗算部１６から供給される強調音声パワースペクトルを周波数別強調音声パワースペクトルに分離し、平均値計算部１８５１へ伝達する。平均値計算部１８５１は、周波数別強調音声パワースペクトル｜Ｘ_ｎ（ｋ）｜^２バーのｋ＝０からＫ−１に対する総和をＫで除算し、強調音声パワーとして音声パワー混合部１８５２へ伝達する。音声パワー混合部１８５２は、平均値計算部１８５１から供給される強調音声パワーと、平滑信号記憶部１８５８から供給される１フレーム前の平滑強調音声パワーを、図２の音声存在確率計算部１７１から供給される音声存在確率に応じて混合し、混合した信号を平滑化部１８５３へ伝達する。混合の際、音声存在確率が高ければ、平均値計算部１８５１から供給される強調音声平均パワーの比率を高くし、低ければ、平滑信号記憶部１８５８から供給される平滑強調音声パワーの比率を高くする。 The separation unit 1850 separates the enhanced speech power spectrum supplied from the multiplex multiplication unit 16 in FIG. 1 into per-frequency enhanced speech power spectra and transmits them to the average value calculation unit 1851. The average value calculation unit 1851 divides the sum of the per-frequency enhanced speech power spectrum |X_n(k)|² bar over k = 0 to K−1 by K and transmits the result to the speech power mixing unit 1852 as the enhanced speech power. The speech power mixing unit 1852 mixes the enhanced speech power supplied from the average value calculation unit 1851 with the smoothed enhanced speech power of the previous frame supplied from the smoothed signal storage unit 1858, according to the speech presence probability supplied from the speech presence probability calculation unit 171 in FIG. 2, and transmits the mixed signal to the smoothing unit 1853. In the mixing, a high speech presence probability increases the share of the enhanced speech average power supplied from the average value calculation unit 1851, and a low speech presence probability increases the share of the smoothed enhanced speech power supplied from the smoothed signal storage unit 1858.

平滑化部１８５３は、平滑化係数記憶部１８５４から供給される平滑化係数に応じて、音声パワー混合部１８５２から供給された混合信号を平滑化し、平滑強調音声パワーとして平滑信号記憶部１８５８と除算部１８５７に伝達する。音声パワー混合部の機能から明らかなように、音声存在確率が低い区間では、平滑化部１８５３は、１フレーム前の平滑強調音声パワーが多く含まれた信号を用いて、平滑強調音声パワーを計算する。従って、平滑強調音声パワーは殆ど更新されない。このため、平滑部１８５３からは、雑音区間においても、音声区間で計算された強調音声パワーが常に出力される。一方、音声存在確率が高い区間では、平滑化部１８５３は、強調音声平均パワーが多く含まれた信号を用いて、平滑強調音声パワーを計算する。 The smoothing unit 1853 smooths the mixed signal supplied from the speech power mixing unit 1852 according to the smoothing coefficient supplied from the smoothing coefficient storage unit 1854, and transmits the result to the smoothed signal storage unit 1858 and the division unit 1857 as the smoothed enhanced speech power. As is clear from the function of the speech power mixing unit, in a section where the speech presence probability is low, the smoothing unit 1853 calculates the smoothed enhanced speech power from a signal that consists largely of the smoothed enhanced speech power of the previous frame. The smoothed enhanced speech power is therefore hardly updated. Consequently, even in a noise section the smoothing unit 1853 keeps outputting the enhanced speech power calculated in the preceding speech section. In a section where the speech presence probability is high, on the other hand, the smoothing unit 1853 calculates the smoothed enhanced speech power from a signal that consists largely of the enhanced speech average power.
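The hold-during-noise behaviour of the speech power mixing unit 1852 and the smoothing unit 1853 can be illustrated with the sketch below. The linear mixing rule and the recursive form of the smoothing step are assumptions, since the description states only that the mixing ratio follows the speech presence probability. The function name and example values are hypothetical.

```python
def update_smoothed_speech_power(frame_power, prev_smoothed, p, delta):
    """One frame update of the smoothed enhanced speech power (assumed forms)."""
    p = min(max(p, 0.0), 1.0)
    # Mixing unit 1852: high p favours the current frame, low p favours the stored value
    mixed = p * frame_power + (1.0 - p) * prev_smoothed
    # Smoothing unit 1853: assumed first-order recursion with coefficient delta
    return (1.0 - delta) * mixed + delta * prev_smoothed

# Noise-only frame (p = 0): the stored speech power is held unchanged
held = update_smoothed_speech_power(0.1, 5.0, p=0.0, delta=0.5)
# Speech frame (p = 1): the stored value moves toward the current frame power
tracked = update_smoothed_speech_power(9.0, 5.0, p=1.0, delta=0.5)
```

Because the stored value is frozen while p is low, the ratio formed later in the division unit 1857 compares speech-section power against the current noise estimate.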

音声パワー混合部１８５２で利用されている音声存在確率は、図２の音声存在確率計算部１７１から供給されており、強調音声パワースペクトルと推定雑音パワースペクトルを基に計算されている。無音部用係数計算部１８３２にも、強調音声パワースペクトルと推定雑音パワースペクトルが入力されているので、音声パワー混合部１８５２で利用する音声存在確率を無音部用係数計算部１８３２の内部でも計算することが可能である。 The voice presence probability used in the voice power mixing unit 1852 is supplied from the voice presence probability calculation unit 171 in FIG. 2, and is calculated based on the enhanced voice power spectrum and the estimated noise power spectrum. Since the emphasized speech power spectrum and the estimated noise power spectrum are also input to the silence part coefficient calculation unit 1832, the speech existence probability used in the speech power mixing unit 1852 is also calculated inside the silence part coefficient calculation unit 1832. It is possible.

また、図２の音声存在確率計算部１７１の場合と同様に、強調音声パワースペクトルと推定雑音パワースペクトルは、劣化音声振幅スペクトルをもとに計算されているので、音声パワー混合部１８５２で利用されている音声存在確率は、本質的には劣化音声振幅スペクトルから求められているといえる。 Further, as in the case of the speech existence probability calculation unit 171 in FIG. 2, the enhanced speech power spectrum and the estimated noise power spectrum are calculated based on the degraded speech amplitude spectrum, and thus are used in the speech power mixing unit 1852. It can be said that the existing speech existence probability is essentially obtained from the degraded speech amplitude spectrum.

一方、分離部１８５５は、図１の推定雑音計算部５から供給された推定雑音パワースペクトルを周波数別推定雑音パワースペクトルに分離し、平均値計算部１８５６へ出力する。平均値計算部１８５６は、周波数別推定雑音パワースペクトルλ_ｎ（ｋ）のｋ＝０からＫ−１に対する総和をＫで除算し、計算結果を推定雑音平均パワーとして除算部１８５７へ伝達する。除算部１８５７は、平滑化部１８５３から供給される強調音声平均パワーを、平均値計算部１８５６から供給される推定雑音平均パワーで除算し、除算結果を対数計算部１８５９へ伝達する。対数計算部１８５９は、除算部１８５７から供給された除算結果の対数を計算し、対数値を定数乗算部１８６０へ伝達する。 Meanwhile, the separation unit 1855 separates the estimated noise power spectrum supplied from the estimated noise calculation unit 5 in FIG. 1 into per-frequency estimated noise power spectra and outputs them to the average value calculation unit 1856. The average value calculation unit 1856 divides the sum of the per-frequency estimated noise power spectrum λ_n(k) over k = 0 to K−1 by K and transmits the result to the division unit 1857 as the estimated noise average power. The division unit 1857 divides the enhanced speech average power supplied from the smoothing unit 1853 by the estimated noise average power supplied from the average value calculation unit 1856 and transmits the quotient to the logarithm calculation unit 1859. The logarithm calculation unit 1859 calculates the logarithm of the quotient supplied from the division unit 1857 and transmits the logarithmic value to the constant multiplication unit 1860.

この定数乗算部１８６０は、対数計算部１８５９から供給された対数値を定数倍して、演算結果を係数計算部１８６１に伝達する。係数計算部１８６１は、定数乗算部１８６０の出力から係数を求め、除算部１８６２へ伝達する。除算部１８６２のもう一方の入力としては１０が供給されているので、除算部１８６２は、係数計算部１８６１から供給された係数を１０で除算し、除算結果を指数計算部１８６３へ伝達する。指数部計算部１８６３は、除算部１８６２の出力の指数を計算し、演算結果を無音部用係数として図６の係数計算部１８３３へ伝達する。 The constant multiplier 1860 multiplies the logarithmic value supplied from the logarithm calculator 1859 by a constant, and transmits the calculation result to the coefficient calculator 1861. The coefficient calculation unit 1861 obtains a coefficient from the output of the constant multiplication unit 1860 and transmits it to the division unit 1862. Since 10 is supplied as the other input of the division unit 1862, the division unit 1862 divides the coefficient supplied from the coefficient calculation unit 1861 by 10 and transmits the division result to the exponent calculation unit 1863. The exponent part calculation unit 1863 calculates the exponent of the output of the division unit 1862 and transmits the calculation result to the coefficient calculation unit 1833 in FIG. 6 as a silence part coefficient.

除算部１８５７の演算結果は、強調音声平均パワーと推定雑音パワーの比、すなわちＳＮＲに相当する。従って、係数計算部１８６１は、ＳＮＲをもとに無音部の抑圧度を計算していることになる。ＳＮＲを計算する目的は、音声存在確率計算部１７１で求めた音声存在確率の信頼度を、係数の計算に反映することである。ＳＮＲが高い場合、すなわち音声存在確率の信頼度が高い場合には、音声を誤って抑圧する可能性が小さいので、係数を小さくし、抑圧度を増加させる。一方、音声存在確率の信頼度が低い場合には、音声を誤って抑圧することを防ぐため、係数を大きくし、抑圧度を減少させる。ＳＮＲから係数を求めることが重要なので、計算を簡略化するために、対数計算部１８５９と指数計算部１８６３のどちらか一方、もしくは両方を省略することが可能である。 The calculation result of the division unit 1857 corresponds to the ratio between the emphasized speech average power and the estimated noise power, that is, the SNR. Accordingly, the coefficient calculation unit 1861 calculates the suppression degree of the silent part based on the SNR. The purpose of calculating the SNR is to reflect the reliability of the speech existence probability obtained by the speech existence probability calculation unit 171 in the calculation of the coefficient. When the SNR is high, that is, when the reliability of the voice existence probability is high, since the possibility of erroneously suppressing the voice is small, the coefficient is reduced and the degree of suppression is increased. On the other hand, when the reliability of the voice existence probability is low, the coefficient is increased to reduce the degree of suppression in order to prevent the voice from being erroneously suppressed. Since it is important to obtain the coefficient from the SNR, either one or both of the logarithmic calculator 1859 and the exponent calculator 1863 can be omitted to simplify the calculation.

また、予め適切に設定した定数を推定雑音平均パワーに加算してから除算を行えば、除算結果の発散を防ぐことができる。除算ではなく、除算の近似演算を利用しても、発散を防止できる。 Further, if division is performed after adding a constant set appropriately in advance to the estimated noise average power, divergence of the division result can be prevented. Divergence can also be prevented by using an approximate operation of division instead of division.
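The chain from the division unit 1857 to the exponent calculation unit 1863 can be sketched as below. Several details are assumptions made for illustration: the base-10 logarithm, the scale factor 10, reading the final step as raising 10 to the power of the divided value, the `offset` added to the noise power, and the decreasing map `example_g`, which merely stands in for the nonlinear function of the coefficient calculation unit 1861.

```python
import math

def silent_part_coefficient(smoothed_speech_power, noise_power, g, scale=10.0, offset=1e-6):
    """SNR -> log -> constant multiple -> coefficient map -> divide by 10 -> exponent."""
    snr = smoothed_speech_power / (noise_power + offset)  # division 1857; offset avoids divergence
    snr_db = scale * math.log10(snr)                       # logarithm 1859 and constant multiplier 1860
    coeff_db = g(snr_db)                                   # coefficient calculation 1861
    return 10.0 ** (coeff_db / 10.0)                       # division by 10 (1862) and exponent (1863)

def example_g(snr_db):
    """Hypothetical decreasing map: -3 dB at 0 dB SNR, falling to -12 dB at 20 dB SNR and above."""
    x = min(max(snr_db, 0.0), 20.0)
    return -3.0 - 9.0 * x / 20.0

fu_low_snr = silent_part_coefficient(1.0, 1.0, example_g)
fu_high_snr = silent_part_coefficient(100.0, 1.0, example_g)
```

A higher SNR yields a smaller coefficient and therefore stronger suppression in sections judged to be noise, in line with the reliability argument in the text.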

本実施例では、強調音声平均パワーと推定雑音パワーを計算する際に、全帯域のパワースペクトルの平均値を用いたが、適当な帯域幅を持ったサブバンド毎に計算したパワースペクトルの平均値を用いる方法も有効である。各帯域毎に平均値を計算するので、全帯域の平均値を用いた場合よりも、各帯域で正確なＳＮＲを計算することが可能になる。 In this embodiment, when calculating the emphasized speech average power and the estimated noise power, the average value of the power spectrum of the entire band is used. However, the average value of the power spectrum calculated for each subband having an appropriate bandwidth. The method using is also effective. Since the average value is calculated for each band, it is possible to calculate an accurate SNR in each band, compared to the case where the average value of all bands is used.
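The subband variant mentioned above can be sketched as follows. The band layout (`band_edges` as a list of bin indices), the function names and the `offset` guard are hypothetical; the description does not specify how the subbands are chosen.

```python
def subband_means(power_spectrum, band_edges):
    """Mean power per subband; band_edges are bin indices such as [0, 2, 4]."""
    return [
        sum(power_spectrum[lo:hi]) / (hi - lo)
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]

def subband_snr(speech_ps, noise_ps, band_edges, offset=1e-6):
    """Per-subband ratio of mean speech power to mean noise power."""
    s = subband_means(speech_ps, band_edges)
    n = subband_means(noise_ps, band_edges)
    return [si / (ni + offset) for si, ni in zip(s, n)]

# Speech energy concentrated in the lower band gives a higher SNR there
snrs = subband_snr([8.0, 8.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], [0, 2, 4])
```

A full-band average would report a single intermediate ratio for this example, whereas the per-band values keep the difference between the two bands visible.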

図８に、図７の係数計算部１８６１で係数を計算する際に用いる非線形関数の例を示す。ｆ_ｃｍを入力値としたとき、図８に示される非線形関数の出力値ｇ_ｃｍは、式（２２）で与えられる。

FIG. 8 shows an example of a nonlinear function used when the coefficient calculation unit 1861 in FIG. 7 calculates the coefficient. When f _cm is an input value, the output value g _cm of the nonlinear function shown in FIG. 8 is given by Expression (22).

但し、ａ_ｃｍ，ｂ_ｃｍ，ｃ_ｃｍ，ｄ_ｃｍは正の実数である。 Here, a_cm, b_cm, c_cm, and d_cm are positive real numbers.

ｆ_ｃｍが大きくなればｇ_ｃｍが小さくなることが、式（２２）の非線形関数に求められる条件である。式（２２）の他にも、この条件を満たすような線形関数や高次多項式、重みつき加算を含む任意の関数を用いることができる。 The condition required for the nonlinear function of Equation (22) is that g _cm decreases as f _cm increases. In addition to Expression (22), a linear function that satisfies this condition, a high-order polynomial, and an arbitrary function including weighted addition can be used.

図９は本発明の第２の実施の形態を示すブロック図である。図９と第１の実施例である図１とは、強調音声振幅スペクトル補正部２８を除いて同一である。強調音声振幅スペクトル補正部２８の構成と動作の詳細な説明は、図１０を参照しながら行う。 FIG. 9 is a block diagram showing a second embodiment of the present invention. FIG. 9 and FIG. 1 as the first embodiment are the same except for the enhanced speech amplitude spectrum correction unit 28. A detailed description of the configuration and operation of the enhanced speech amplitude spectrum correction unit 28 will be given with reference to FIG.

図１０は強調音声振幅スペクトル補正部２８の構成を示すブロック図である。図２に示した強調音声振幅スペクトル補正部１８とは、後抑圧係数計算部１８２が後抑圧係数計算部２８２に置換されていることを除いて同一である。後抑圧係数計算部２８２の構成と動作の詳細な説明は、図１１を参照しながら行う。 FIG. 10 is a block diagram showing the configuration of the enhanced speech amplitude spectrum correction unit 28. It is identical to the enhanced speech amplitude spectrum correction unit 18 shown in FIG. 2 except that the post-suppression coefficient calculation unit 182 is replaced with a post-suppression coefficient calculation unit 282. The configuration and operation of the post-suppression coefficient calculation unit 282 are described in detail with reference to FIG. 11.

図１１は後抑圧係数計算部２８２の構成を示すブロック図である。図５に示した後抑圧係数計算部１８２とは、周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１が周波数別後抑圧係数計算部２８２１_０〜２８２１_Ｋ−１に置換されていることを除いて同一である。周波数別後抑圧係数計算部２８２１_０〜２８２１_Ｋ−１の構成と動作の詳細な説明は、図１２を参照しながら行う。 FIG. 11 is a block diagram showing the configuration of the post-suppression coefficient calculation unit 282. It is identical to the post-suppression coefficient calculation unit 182 shown in FIG. 5 except that the frequency-specific post-suppression coefficient calculation units 1821_0 to 1821_{K−1} are replaced with frequency-specific post-suppression coefficient calculation units 2821_0 to 2821_{K−1}. The configuration and operation of the units 2821_0 to 2821_{K−1} are described in detail with reference to FIG. 12.

図１２は、図１１の周波数別後抑圧係数計算部２８２１_０〜２８２１_Ｋ−１の構成を示すブロック図である。図６に示した周波数別後抑圧係数計算部１８２１とは、有音部用係数記憶部１８３１が有音部用係数計算部２８３１に置換されていることを除いて同一である。無音部用係数だけでなく、有音部用係数も計算するので、図６の周波数別後抑圧係数計算部よりも有音部で高音質を達成できる。 FIG. 12 is a block diagram showing the configuration of the frequency-specific post-suppression coefficient calculation units 2821_0 to 2821_{K−1} of FIG. 11. Each is identical to the frequency-specific post-suppression coefficient calculation unit 1821 shown in FIG. 6 except that the sound part coefficient storage unit 1831 is replaced with a sound part coefficient calculation unit 2831. Because the sound part coefficient is calculated in addition to the silent part coefficient, higher sound quality can be achieved in sound parts than with the frequency-specific post-suppression coefficient calculation unit of FIG. 6.

有音部用係数計算部２８３１は、図９の多重乗算部１６及び推定雑音計算部５からそれぞれ供給される強調音声パワースペクトルと推定雑音パワースペクトルを用いて、有音部用係数を求め、係数計算部１８３３へ供給する。推定雑音パワーが強調音声パワーよりも大きい場合、又は両パワーの大きさが同等の場合には、有音部用係数計算部２８３１は、推定雑音と強調音声のパワー比に応じて、１．０以上の値を出力する。これは、補正抑圧係数が適切な値よりも小さくなっている可能性があるので、音声区間で過剰抑圧となることを防ぐために行う。一方、推定雑音が強調音声よりも小さい場合には、音声区間で過剰抑圧が発生する可能性は低い。そこで、推定雑音と強調音声のパワー比とは無関係に、１．０以上の適切な定数値を出力する。 The sound part coefficient calculation unit 2831 obtains the sound part coefficient using the enhanced speech power spectrum and the estimated noise power spectrum supplied from the multiple multiplier 16 and the estimated noise calculation unit 5 of FIG. 9, respectively, and supplies it to the coefficient calculation unit 1833. When the estimated noise power is greater than the enhanced speech power, or when the two powers are comparable, the sound part coefficient calculation unit 2831 outputs a value of 1.0 or more that depends on the power ratio between the estimated noise and the enhanced speech. This is done to prevent over-suppression in speech sections, because the corrected suppression coefficient may be smaller than its appropriate value. When the estimated noise is smaller than the enhanced speech, on the other hand, over-suppression in speech sections is unlikely, so an appropriate constant of 1.0 or more is output regardless of the power ratio between the estimated noise and the enhanced speech.
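
The two-branch behaviour described for the sound part coefficient can be sketched as below. The specification only states that the output is 1.0 or more and depends on the power ratio in the first branch; the linear mapping, the `slope`, the `cap`, and the exact threshold of 1.0 on the ratio are all assumptions made here for illustration.

```python
def sound_part_coefficient(enhanced_avg_power, noise_avg_power,
                           base=1.0, slope=0.5, cap=2.0):
    """Hypothetical sound part coefficient, always >= base (>= 1.0).

    base, slope and cap are assumed tuning values, not from the source.
    """
    ratio = noise_avg_power / max(enhanced_avg_power, 1e-12)
    if ratio >= 1.0:
        # Noise is comparable to or larger than speech: raise the
        # coefficient with the ratio to counter possible over-suppression.
        return min(base + slope * (ratio - 1.0), cap)
    # Speech dominates: a fixed constant, independent of the ratio.
    return base


# Noise power twice the enhanced speech power.
coefficient = sound_part_coefficient(1.0, 2.0)
```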

図１３は本発明の第３の実施の形態を示すブロック図である。図１３と第１の実施例である図１とは、強調音声振幅スペクトル補正部１７を除いて同一である。後述するように、強調音声振幅スペクトル補正部１７と１８の違いは、後抑圧係数の計算を行う際に、強調音声振幅スペクトル補正部１７が推定雑音パワースペクトルと強調音声パワースペクトルを利用しないところである。強調音声振幅スペクトル補正部１７の構成と動作の詳細な説明は、図１４を参照しながら行う。 FIG. 13 is a block diagram showing a third embodiment of the present invention. FIG. 13 is identical to FIG. 1 of the first embodiment except for the enhanced speech amplitude spectrum correction unit 17. As described later, the enhanced speech amplitude spectrum correction unit 17 differs from the unit 18 in that it does not use the estimated noise power spectrum or the enhanced speech power spectrum when calculating the post-suppression coefficient. The configuration and operation of the enhanced speech amplitude spectrum correction unit 17 are described in detail with reference to FIG. 14.

図１４は強調音声振幅スペクトル補正部１７の構成を示すブロック図である。図２に示した強調音声振幅スペクトル補正部１８とは、後抑圧係数計算部１８２が後抑圧係数計算部１７２に置換されていることを除いて同一である。以下、この相違点を中心に詳細な動作を説明する。 FIG. 14 is a block diagram showing the configuration of the enhanced speech amplitude spectrum correction unit 17. It is identical to the enhanced speech amplitude spectrum correction unit 18 shown in FIG. 2 except that the post-suppression coefficient calculation unit 182 is replaced with a post-suppression coefficient calculation unit 172. The following description focuses on this difference.

後抑圧係数計算部１７２は、音声存在確率計算部１７１から供給された音声存在確率と、図１３の抑圧係数補正部１５から供給された補正抑圧係数を用いて、後抑圧係数を計算し、多重乗算部１７３に伝達する。後抑圧係数計算部１７２の構成と動作の詳細な説明は、図１５を用いて行う。 The post-suppression coefficient calculation unit 172 calculates the post-suppression coefficient using the speech presence probability supplied from the speech presence probability calculation unit 171 and the corrected suppression coefficient supplied from the suppression coefficient correction unit 15 of FIG. 13, and transmits it to the multiple multiplier 173. The configuration and operation of the post-suppression coefficient calculation unit 172 are described in detail with reference to FIG. 15.

図１５は後抑圧係数計算部１７２の構成を示すブロック図である。図５に示した後抑圧係数計算部１８２とは、周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１が周波数別後抑圧係数計算部１７２１_０〜１７２１_Ｋ−１に置換されていることを除いて同一である。以下、この相違点を中心に詳細な動作を説明する。 FIG. 15 is a block diagram showing the configuration of the post-suppression coefficient calculation unit 172. It is identical to the post-suppression coefficient calculation unit 182 shown in FIG. 5 except that the frequency-specific post-suppression coefficient calculation units 1821_0 to 1821_{K−1} are replaced with frequency-specific post-suppression coefficient calculation units 1721_0 to 1721_{K−1}. The following description focuses on this difference.

周波数別後抑圧係数計算部１７２１_０〜１７２１_Ｋ−１は、分離部１７２２から供給される周波数別補正抑圧係数と、図１４の音声存在確率計算部１７１から供給される音声存在確率を用いて、周波数別後抑圧係数を計算し、多重化部１７２３に伝達する。周波数別後抑圧係数計算部１７２１_０〜１７２１_Ｋ−１の構成と動作の詳細な説明は、図１６を用いて行う。 The frequency-specific post-suppression coefficient calculation units 1721_0 to 1721_{K−1} calculate the frequency-specific post-suppression coefficients using the frequency-specific corrected suppression coefficients supplied from the separation unit 1722 and the speech presence probability supplied from the speech presence probability calculation unit 171 of FIG. 14, and transmit them to the multiplexing unit 1723. The configuration and operation of the units 1721_0 to 1721_{K−1} are described in detail with reference to FIG. 16.

図１６は、図１５の周波数別後抑圧係数計算部１７２１_０〜１７２１_Ｋ−１の構成を示すブロック図である。周波数別後抑圧係数計算部１７２１は、有音部用下限値記憶部１６９１、無音部用下限値記憶部１６９２、下限値計算部１６９３、最大値選択部１６９４を有する。下限値計算部１６９３は、有音部用下限値記憶部１６９１から供給される有音部用下限値と、無音部用下限値記憶部１６９２から供給される無音部用下限値をもとに、図１４の音声存在確率計算部１７１から供給される音声存在確率に応じた下限値を計算し、最大値選択部１６９４へ伝達する。音声歪みを防止するため、有音部用下限値には、無音部用下限値よりも大きな値が設定される。下限値の計算では、音声存在確率が大きければ、下限値計算部１６９３の出力値に対する有音部用下限値の寄与率を大きくする。寄与率の設定には、式（２０）や式（２１）に示される方法を同様に用いることが可能である。 FIG. 16 is a block diagram showing the configuration of the frequency-specific post-suppression coefficient calculation units 1721_0 to 1721_{K−1} of FIG. 15. The frequency-specific post-suppression coefficient calculation unit 1721 includes a sound part lower limit storage unit 1691, a silent part lower limit storage unit 1692, a lower limit calculation unit 1693, and a maximum value selection unit 1694. Based on the sound part lower limit supplied from the storage unit 1691 and the silent part lower limit supplied from the storage unit 1692, the lower limit calculation unit 1693 calculates a lower limit that depends on the speech presence probability supplied from the speech presence probability calculation unit 171 of FIG. 14, and transmits it to the maximum value selection unit 1694. To prevent speech distortion, the sound part lower limit is set larger than the silent part lower limit. In calculating the lower limit, the larger the speech presence probability, the larger the contribution of the sound part lower limit to the output of the lower limit calculation unit 1693. The contribution can be set in the same way as in Expressions (20) and (21).

最大値選択部１６９４は、図１５の分離部１７２２から供給される周波数別補正抑圧係数と、下限値計算部１６９３から供給される下限値とを比較し、大きい方の値を図１５の多重化部１７２３へ伝達する。値が同じ場合まで考慮すると、後抑圧係数は下限値計算部１６９３が供給する下限値以上の値になる。従って、抑圧係数は音声存在確率に応じて設定された下限値以上の値になる。音声存在確率が高ければ、下限値は大きくなるので、音声区間において過剰抑圧がもたらす音声歪みを防止できる。一方、音声存在確率が低ければ、下限値は小さくなるので、雑音区間において十分な抑圧度を得ることができる。 The maximum value selection unit 1694 compares the frequency-specific corrected suppression coefficient supplied from the separation unit 1722 of FIG. 15 with the lower limit supplied from the lower limit calculation unit 1693, and transmits the larger value to the multiplexing unit 1723 of FIG. 15. Taking the case of equal values into account, the post-suppression coefficient is always greater than or equal to the lower limit supplied by the lower limit calculation unit 1693. The suppression coefficient is therefore never smaller than the lower limit set according to the speech presence probability. When the speech presence probability is high, the lower limit is large, which prevents the speech distortion that over-suppression would cause in speech sections. When the speech presence probability is low, the lower limit is small, so a sufficient degree of suppression is obtained in noise sections.
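
The lower limit calculation and maximum value selection of FIG. 16 might be sketched as follows. Expressions (20) and (21) are not reproduced in this part of the description, so a simple linear blend weighted by the speech presence probability is assumed here; the default limits `sound_limit` and `silent_limit` are likewise placeholder values.

```python
def lower_limit(speech_probability, sound_limit=0.3, silent_limit=0.1):
    """Blend the two stored lower limits by speech presence probability.

    Assumption: a linear blend stands in for Expressions (20)/(21).
    """
    p = min(max(speech_probability, 0.0), 1.0)
    # Higher probability gives the sound part lower limit more weight.
    return p * sound_limit + (1.0 - p) * silent_limit


def post_suppression_coefficient(corrected_coefficient, speech_probability,
                                 sound_limit=0.3, silent_limit=0.1):
    """Maximum value selection: never fall below the blended lower limit."""
    floor = lower_limit(speech_probability, sound_limit, silent_limit)
    return max(corrected_coefficient, floor)


# A strongly suppressed bin in a likely speech section is lifted to the floor.
lifted = post_suppression_coefficient(0.05, 1.0)
# A weakly suppressed bin in a noise section passes through unchanged.
unchanged = post_suppression_coefficient(0.5, 0.0)
```

Because the floor rises with the speech presence probability, the same routine limits distortion in speech sections while still allowing deep suppression in noise sections.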

図１７は本発明の第４の実施の形態を示すブロック図である。図１７と第一の実施例である図１とは、強調音声振幅スペクトル補正部２９を除いて同一である。強調音声振幅スペクトル補正部２９の構成と動作の詳細な説明は、図１８を参照しながら行う。 FIG. 17 is a block diagram showing a fourth embodiment of the present invention. FIG. 17 is identical to FIG. 1 of the first embodiment except for the enhanced speech amplitude spectrum correction unit 29. The configuration and operation of the enhanced speech amplitude spectrum correction unit 29 are described in detail with reference to FIG. 18.

図１８は強調音声振幅スペクトル補正部２９の構成を示すブロック図である。図２に示した強調音声振幅スペクトル補正部１８とは、後抑圧係数計算部１８２が後抑圧係数計算部２９２に置換されていることを除いて同一である。後抑圧係数計算部２９２の構成と動作の詳細な説明は、図１９を参照しながら行う。 FIG. 18 is a block diagram showing the configuration of the enhanced speech amplitude spectrum correction unit 29. It is identical to the enhanced speech amplitude spectrum correction unit 18 shown in FIG. 2 except that the post-suppression coefficient calculation unit 182 is replaced with a post-suppression coefficient calculation unit 292. The configuration and operation of the post-suppression coefficient calculation unit 292 are described in detail with reference to FIG. 19.

図１９は後抑圧係数計算部２９２の構成を示すブロック図である。図５に示した後抑圧係数計算部１８２とは、周波数別後抑圧係数計算部１８２１_０〜１８２１_Ｋ−１が周波数別後抑圧係数計算部２９２１_０〜２９２１_Ｋ−１に置換されていることを除いて同一である。周波数別後抑圧係数計算部２９２１_０〜２９２１_Ｋ−１の構成と動作の詳細な説明は、図２０を参照しながら行う。 FIG. 19 is a block diagram showing the configuration of the post-suppression coefficient calculation unit 292. It is identical to the post-suppression coefficient calculation unit 182 shown in FIG. 5 except that the frequency-specific post-suppression coefficient calculation units 1821_0 to 1821_{K−1} are replaced with frequency-specific post-suppression coefficient calculation units 2921_0 to 2921_{K−1}. The configuration and operation of the units 2921_0 to 2921_{K−1} are described in detail with reference to FIG. 20.

図２０は、図１９の周波数別後抑圧係数計算部２９２１_０〜２９２１_Ｋ−１の構成を示すブロック図である。図１６に示した周波数別後抑圧係数計算部１７２１とは、有音部用下限値記憶部１６９１が有音部用下限値計算部２６９１に置換されていること、無音部用下限値記憶部１６９２が無音部用下限値計算部２６９２を除いて同一である。強調音声パワースペクトルと推定雑音パワースペクトルを基に、有音部用及び無音部用下限値を計算するので、図１６の周波数別後抑圧係数計算部よりも、無音部で残留雑音を、有音部で音声歪みを低減できる。 FIG. 20 is a block diagram showing the configuration of the frequency-specific post-suppression coefficient calculation units 2921_0 to 2921_{K−1} of FIG. 19. Each is identical to the frequency-specific post-suppression coefficient calculation unit 1721 shown in FIG. 16 except that the sound part lower limit storage unit 1691 is replaced with a sound part lower limit calculation unit 2691 and the silent part lower limit storage unit 1692 is replaced with a silent part lower limit calculation unit 2692. Because the lower limits for sound parts and silent parts are calculated from the enhanced speech power spectrum and the estimated noise power spectrum, residual noise in silent parts and speech distortion in sound parts can both be reduced further than with the frequency-specific post-suppression coefficient calculation unit of FIG. 16.

有音部用下限値計算部２６９１と無音部用下限値計算部２６９２は、図１７の多重乗算部１６及び推定雑音計算部１からそれぞれ供給される強調音声パワースペクトルと推定雑音パワースペクトルを用いて、有音部用下限値と無音部用下限値をそれぞれ求め、下限値計算部１６９３へ供給する。有音部用下限値計算部２６９１と無音部用下限値計算部２６９２は、推定雑音と強調音声のパワー比に応じて、それぞれの下限値を計算し、下限値計算部１６９３へ伝達する。基本的には、推定雑音パワーが強調音声パワーよりも大きくなる、すなわちＳＮＲが低くなれば、音声歪みを防止する目的で有音部用下限値を大きくする。 The voiced lower limit value calculation unit 2691 and the silent part lower limit value calculation unit 2692 use the emphasized speech power spectrum and the estimated noise power spectrum respectively supplied from the multiple multiplication unit 16 and the estimated noise calculation unit 1 in FIG. Then, the lower limit value for the sound part and the lower limit value for the silent part are respectively obtained and supplied to the lower limit value calculation unit 1693. The lower limit value calculation unit for sounded part 2691 and the lower limit value calculation unit for silent part 2692 calculate the respective lower limit values according to the power ratio of the estimated noise and the emphasized speech, and transmit them to the lower limit value calculation unit 1693. Basically, if the estimated noise power is larger than the emphasized voice power, that is, the SNR is lowered, the lower limit value for the sound part is increased for the purpose of preventing voice distortion.

無音部での残留雑音量を小さく、有音部での過剰抑圧を防止するために、無音部用下限値を有音部用下限値以下の値にする。但し、ＳＮＲが低い場合には、有音部用下限値と無音部用下限値の差が大きくならないように制御する。下限値の差が大きすぎると、有音部と無音部の残留雑音量の差が大きくなり、結果的に音声区間で音声ひずみが発生しているように知覚されてしまう。逆に、ＳＮＲが高ければ、有音部の残留雑音は、音声成分にマスクされて知覚されにくくなる。従って、ＳＮＲが低いときのように、有音部と無音部の残留雑音量の差は、音声区間での音声ひずみ要因に殆どならない。 To keep the residual noise in silent parts small while preventing over-suppression in sound parts, the silent part lower limit is set to a value no greater than the sound part lower limit. When the SNR is low, however, the difference between the two lower limits is controlled so that it does not become large. If the difference is too large, the amount of residual noise differs greatly between sound parts and silent parts, and the result is perceived as speech distortion in speech sections. Conversely, when the SNR is high, residual noise in sound parts is masked by the speech component and is hard to perceive. Unlike the low-SNR case, the difference in residual noise between sound parts and silent parts then hardly contributes to speech distortion in speech sections.

そこで、ＳＮＲが高い場合には、無音部用下限値と有音部用下限値の差を大きくして、無音部での残留雑音を十分に低減する。以上より、無音部用下限値は、有音部用下限値に依存した値に設定される。従って、基本的には、有音部下限値の場合と同様に、ＳＮＲが低くなれば、無音部用下限値も大きくする。推定雑音パワースペクトルと強調音声パワースペクトルの大きさを比較する場合は、それぞれの平均値や、図１１の無音部用係数計算で用いられている除算部１８５７の出力信号を用いることが好ましい。 Therefore, when the SNR is high, the difference between the silent part lower limit and the sound part lower limit is made large so that residual noise in silent parts is sufficiently reduced. It follows that the silent part lower limit is set to a value that depends on the sound part lower limit. Basically, as with the sound part lower limit, the silent part lower limit is also increased as the SNR decreases. When comparing the magnitudes of the estimated noise power spectrum and the enhanced speech power spectrum, it is preferable to use their respective averages or the output signal of the division unit 1857 used in the silent part coefficient calculation of FIG. 11.
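
The qualitative rules of the fourth embodiment (both lower limits rise as the SNR falls, the silent part limit never exceeds the sound part limit, and the gap between them widens as the SNR rises) can be captured by a sketch such as the one below. The linear interpolation and every numeric default are assumptions made for illustration; the specification does not give concrete formulas or values.

```python
def snr_dependent_limits(snr_db, snr_lo=0.0, snr_hi=20.0,
                         sound_at_low=0.5, sound_at_high=0.2,
                         gap_at_low=0.05, gap_at_high=0.18):
    """Return (sound_limit, silent_limit) for a given SNR in dB.

    All parameters are hypothetical tuning values.
    """
    # t is 0 at or below snr_lo and 1 at or above snr_hi.
    t = min(max((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0), 1.0)
    # Lower SNR -> larger sound part limit, to protect speech.
    sound_limit = sound_at_low + t * (sound_at_high - sound_at_low)
    # Higher SNR -> larger gap, so silent parts are suppressed harder.
    gap = gap_at_low + t * (gap_at_high - gap_at_low)
    silent_limit = max(sound_limit - gap, 0.0)
    return sound_limit, silent_limit


low_snr_limits = snr_dependent_limits(0.0)
high_snr_limits = snr_dependent_limits(20.0)
```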

図２１は本発明の第５の実施の形態を示すブロック図である。図２１と関連技術例のブロック図である図３６とは、推定先天的ＳＮＲ計算部７及び抑圧係数補正部１５が、推定先天的ＳＮＲ計算部７１及び抑圧係数補正部１９にそれぞれ置換されていることを除いて同一である。以下、これらの相違点を中心に詳細な動作を説明する。 FIG. 21 is a block diagram showing a fifth embodiment of the present invention. FIG. 21 is identical to FIG. 36, a block diagram of the related art example, except that the estimated innate SNR calculation unit 7 and the suppression coefficient correction unit 15 are replaced with an estimated innate SNR calculation unit 71 and a suppression coefficient correction unit 19, respectively. The following description focuses on these differences.

推定先天的ＳＮＲ計算部７１には、多重乗算部１３から劣化音声パワースペクトル、推定雑音計算部５から推定雑音パワースペクトル、周波数別ＳＮＲ計算部６から後天的ＳＮＲ、抑圧係数補正部１９から補正抑圧係数が供給される。推定先天的ＳＮＲ計算部７１は、劣化音声パワースペクトル、推定雑音パワースペクトル、後天的ＳＮＲ及び補正抑圧係数を用いて、推定先天的ＳＮＲと音声存在確率を求める。そして、音声存在確率を抑圧係数補正部１９に、推定先天的ＳＮＲを雑音抑圧係数生成部８と抑圧係数補正部１９に伝達する。抑圧係数補正部１９は、推定先天的ＳＮＲ計算部７１から供給される推定先天的ＳＮＲと音声存在確率を用いて、雑音抑圧係数生成部８から供給される抑圧係数を補正し、補正抑圧係数として多重乗算部１６と推定先天的ＳＮＲ計算部７１へ伝達する。 The estimated innate SNR calculation unit 71 is supplied with the degraded speech power spectrum from the multiple multiplier 13, the estimated noise power spectrum from the estimated noise calculation unit 5, the acquired SNR from the frequency-specific SNR calculation unit 6, and the corrected suppression coefficient from the suppression coefficient correction unit 19. Using the degraded speech power spectrum, the estimated noise power spectrum, the acquired SNR, and the corrected suppression coefficient, it obtains the estimated innate SNR and the speech presence probability. It then transmits the speech presence probability to the suppression coefficient correction unit 19, and the estimated innate SNR to the noise suppression coefficient generation unit 8 and the suppression coefficient correction unit 19. The suppression coefficient correction unit 19 corrects the suppression coefficient supplied from the noise suppression coefficient generation unit 8 using the estimated innate SNR and the speech presence probability supplied from the estimated innate SNR calculation unit 71, and transmits the result as the corrected suppression coefficient to the multiple multiplier 16 and the estimated innate SNR calculation unit 71.

抑圧係数補正部１９及び推定先天的ＳＮＲ計算部７１の構成と動作の詳細な説明は、図２２及び図２３を参照しながら行う。 The configuration and operation of the suppression coefficient correction unit 19 and the estimated innate SNR calculation unit 71 are described in detail with reference to FIGS. 22 and 23.

図２２は推定先天的ＳＮＲ計算部７１の構成を示すブロック図である。図２２と関連技術例のブロック図である図４５との相違点は、推定先天的ＳＮＲ計算部７１が遅延器７１１、７１２、多重乗算部７１３、音声存在確率計算部７１４を有していることである。以下、これらの相違点を中心に詳細な動作を説明する。 FIG. 22 is a block diagram showing the configuration of the estimated innate SNR calculation unit 71. FIG. 22 differs from FIG. 45, a block diagram of the related art example, in that the estimated innate SNR calculation unit 71 includes delay units 711 and 712, a multiple multiplier 713, and a speech presence probability calculation unit 714. The following description focuses on these differences.

遅延器７１２は、図２１の推定雑音計算部５から供給される第ｎフレームの推定雑音パワースペクトルλ_ｎ（ｋ）を保存すると同時に、保存してあった第ｎ−１フレームの推定雑音パワースペクトルλ_ｎ−１（ｋ）を音声存在確率計算部７１４に供給する。遅延器７１１は、図２１の多重乗算部１３から供給される第ｎフレームの劣化音声パワースペクトル｜Ｙ_ｎ（ｋ）｜^２を保存すると同時に、保存してあった第ｎ−１フレームの劣化音声パワースペクトル｜Ｙ_ｎ−１（ｋ）｜^２を多重乗算部７１３に供給する。多重乗算部７１３は、多重乗算部７０４から供給されるＧ^２ _ｎ−１（ｋ）バーと遅延器７１１から供給される｜Ｙ_ｎ−１（ｋ）｜^２をｋ＝０，１，．．．，Ｋ−１に対して乗算して、Ｇ^２ _ｎ−１（ｋ）バー｜Ｙ_ｎ−１（ｋ）｜^２を求め、演算結果を推定強調音声パワースペクトルとして音声存在確率計算部７１４へ伝達する。多重乗算部７１３の出力信号は、第ｎ−１フレームの強調音声パワースペクトルに一致するが、これを第ｎフレームの強調音声パワースペクトルの推定信号として扱うために、推定強調音声パワースペクトルという名称を用いている。 The delay unit 712 stores the estimated noise power spectrum λ_n(k) of the n-th frame supplied from the estimated noise calculation unit 5 of FIG. 21 and, at the same time, supplies the stored estimated noise power spectrum λ_{n−1}(k) of the (n−1)-th frame to the speech presence probability calculation unit 714. The delay unit 711 stores the degraded speech power spectrum |Y_n(k)|² of the n-th frame supplied from the multiple multiplier 13 of FIG. 21 and, at the same time, supplies the stored degraded speech power spectrum |Y_{n−1}(k)|² of the (n−1)-th frame to the multiple multiplier 713. The multiple multiplier 713 multiplies Ḡ²_{n−1}(k) supplied from the multiple multiplier 704 by |Y_{n−1}(k)|² supplied from the delay unit 711 for k = 0, 1, ..., K−1 to obtain Ḡ²_{n−1}(k)|Y_{n−1}(k)|², and transmits the result to the speech presence probability calculation unit 714 as the estimated enhanced speech power spectrum. The output signal of the multiple multiplier 713 equals the enhanced speech power spectrum of the (n−1)-th frame, but because it is treated as an estimate of the enhanced speech power spectrum of the n-th frame, it is called the estimated enhanced speech power spectrum.

多重乗算部７０４から供給される抑圧係数は、一フレーム前に得られたものなので、抑圧係数と劣化音声パワースペクトルのフレーム番号を合わせて強調音声パワースペクトルを計算するために、遅延器７１１が導入されている。更に、音声存在確率の計算に用いる強調音声パワースペクトルと推定雑音パワースペクトルのフレーム番号を合わせるために、遅延器７１２が導入されている。しかし、数フレームの相違が音声存在確率の計算に与える影響は小さいことから、遅延器７１１と７１２のどちらか一方、もしくは両方を省略することが可能である。 Because the suppression coefficient supplied from the multiple multiplier 704 was obtained one frame earlier, the delay unit 711 is introduced so that the enhanced speech power spectrum is calculated from a suppression coefficient and a degraded speech power spectrum with the same frame number. Similarly, the delay unit 712 is introduced so that the enhanced speech power spectrum and the estimated noise power spectrum used in calculating the speech presence probability have the same frame number. However, since a difference of a few frames has little effect on the calculation of the speech presence probability, either or both of the delay units 711 and 712 can be omitted.
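
The frame alignment performed by the delay units 711 and 712 and the per-bin product formed by the multiple multiplier 713 can be sketched as follows. The class `OneFrameDelay` and the function `estimated_enhanced_power` are illustrative names introduced here, and the zero-filled initial state of the delay is an assumption.

```python
class OneFrameDelay:
    """Stores the current frame and releases the previously stored one."""

    def __init__(self, num_bins):
        self._stored = [0.0] * num_bins  # assumed initial state

    def push(self, frame):
        previous = self._stored
        self._stored = list(frame)
        return previous


def estimated_enhanced_power(prev_gain_squared, prev_degraded_power):
    """Per-bin product of the squared gain and |Y|^2 of frame n-1."""
    return [g * y for g, y in zip(prev_gain_squared, prev_degraded_power)]


# Frame n-1 then frame n of the degraded speech power spectrum (K = 2 bins).
delay_711 = OneFrameDelay(num_bins=2)
delay_711.push([4.0, 2.0])                    # store frame n-1
prev_power = delay_711.push([9.0, 1.0])       # store frame n, release n-1
estimate = estimated_enhanced_power([0.25, 1.0], prev_power)
```

Omitting a delay unit, as the text permits, would amount to passing the current frame directly instead of the value returned by `push`.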

音声存在確率計算部７１４は、多重乗算部７１３から供給される推定強調音声パワースペクトルと、遅延器７１２から供給される推定雑音パワースペクトルを用いて音声存在確率を計算し、図２１の抑圧係数補正部１９へ伝達する。多重乗算部７１３の構成は、既に図３７を用いて説明した多重乗算部２１に等しいので、詳細な説明は省略する。また、音声存在確率計算部７１４の構成は、図３を用いて説明した音声存在確率計算部１７１に等しいので、詳細な説明は省略する。 The speech presence probability calculation unit 714 calculates the speech presence probability using the estimated enhanced speech power spectrum supplied from the multiple multiplier 713 and the estimated noise power spectrum supplied from the delay unit 712, and transmits it to the suppression coefficient correction unit 19 of FIG. 21. The configuration of the multiple multiplier 713 is the same as that of the multiple multiplier 21 already described with reference to FIG. 37, so a detailed description is omitted. Likewise, the configuration of the speech presence probability calculation unit 714 is the same as that of the speech presence probability calculation unit 171 described with reference to FIG. 3, so a detailed description is omitted.

図２３は、図２１の抑圧係数補正部１９の構成を示すブロック図である。図５０に示した抑圧係数補正部１５とは、周波数別抑圧係数補正部１５０１_０〜１５０１_Ｋ−１が周波数別抑圧係数補正部１９０１_０〜１９０１_Ｋ−１に置換されていることを除いて同一である。以下、これらの相違点を中心に詳細な動作を説明する。 FIG. 23 is a block diagram showing the configuration of the suppression coefficient correction unit 19 of FIG. 21. It is identical to the suppression coefficient correction unit 15 shown in FIG. 50 except that the frequency-specific suppression coefficient correction units 1501_0 to 1501_{K−1} are replaced with frequency-specific suppression coefficient correction units 1901_0 to 1901_{K−1}. The following description focuses on these differences.

周波数別抑圧係数補正部１９０１_０〜１９０１_Ｋ−１は、分離部１５０２から供給される周波数別推定先天的ＳＮＲと、図２１の推定先天的ＳＮＲ計算部７１から供給される音声存在確率を用いて、分離部１５０３から供給される周波数別抑圧係数を補正し、周波数別補正抑圧係数として多重化部１５０４へ伝達する。周波数別抑圧係数補正部１９０１_０〜１９０１_Ｋ−１の構成と動作の詳細な説明は、図２４を用いて行う。 The frequency-specific suppression coefficient correction units 1901_0 to 1901_{K−1} correct the frequency-specific suppression coefficients supplied from the separation unit 1503, using the frequency-specific estimated innate SNR supplied from the separation unit 1502 and the speech presence probability supplied from the estimated innate SNR calculation unit 71 of FIG. 21, and transmit the results to the multiplexing unit 1504 as frequency-specific corrected suppression coefficients. The configuration and operation of the units 1901_0 to 1901_{K−1} are described in detail with reference to FIG. 24.

図２４は、図２３の周波数別抑圧係数補正部１９０１_０〜１９０１_Ｋ−１の構成を示すブロック図である。図２４では、図５１の周波数別抑圧係数補正部１５０１における最大値選択部１５９１及び抑圧係数下限値記憶部１５９２の代わりに、有音部用下限値記憶部１９２１、無音部用下限値記憶部１９２２、下限値計算部１９２３、及び最大値選択部１９２４が具備されている。以下、これらの相違点を中心に詳細な動作を説明する。 FIG. 24 is a block diagram showing the configuration of the frequency-specific suppression coefficient correction units 1901_0 to 1901_{K−1} of FIG. 23. In FIG. 24, a sound part lower limit storage unit 1921, a silent part lower limit storage unit 1922, a lower limit calculation unit 1923, and a maximum value selection unit 1924 are provided in place of the maximum value selection unit 1591 and the suppression coefficient lower limit storage unit 1592 of the frequency-specific suppression coefficient correction unit 1501 of FIG. 51. The following description focuses on these differences.

下限値計算部１９２３は、有音部用下限値記憶部１９２１から供給される有音部用下限値と、無音部用下限値記憶部１９２２から供給される無音部用下限値をもとに、図２１の推定先天的ＳＮＲ計算部７１から供給される音声存在確率に応じた下限値を計算し、最大値選択部１９２４へ伝達する。最大値選択部１９２４は、スイッチ１５９５又は乗算器１５９７の出力値と、下限値計算部１９２３から供給される下限値とを比較し、大きい方の値を補正抑圧係数として図２３の多重化部１５０４へ伝達する。値が同じ場合まで考慮すると、補正抑圧係数は下限値計算部１９２３が供給する下限値より以上の値になる。従って、抑圧係数が音声存在確率に応じて設定された下限値以上の値になるので、音声区間において過剰抑圧がもたらす音声歪みを防止できる。下限値計算部１９２３の構成は、図６を用いて既に説明した下限値計算部１６９３に等しいので、詳細な説明は省略する。 Based on the sound part lower limit supplied from the sound part lower limit storage unit 1921 and the silent part lower limit supplied from the silent part lower limit storage unit 1922, the lower limit calculation unit 1923 calculates a lower limit that depends on the speech presence probability supplied from the estimated innate SNR calculation unit 71 of FIG. 21, and transmits it to the maximum value selection unit 1924. The maximum value selection unit 1924 compares the output value of the switch 1595 or the multiplier 1597 with the lower limit supplied from the lower limit calculation unit 1923, and transmits the larger value to the multiplexing unit 1504 of FIG. 23 as the corrected suppression coefficient. Taking the case of equal values into account, the corrected suppression coefficient is always greater than or equal to the lower limit supplied by the lower limit calculation unit 1923. Because the suppression coefficient is therefore never smaller than the lower limit set according to the speech presence probability, the speech distortion that over-suppression would cause in speech sections can be prevented. The configuration of the lower limit calculation unit 1923 is the same as that of the lower limit calculation unit 1693 already described with reference to FIG. 6, so a detailed description is omitted.

図２５は本発明の第６の実施の形態を示すブロック図である。図２５と関連技術例のブロック図である図３６とは、推定先天的ＳＮＲ計算部７及び抑圧係数補正部１５が推定先天的ＳＮＲ計算部７２及び抑圧係数補正部２０にそれぞれ置換されていることを除いて同一である。以下、これらの相違点を中心に詳細な動作を説明する。 FIG. 25 is a block diagram showing a sixth embodiment of the present invention. FIG. 25 is identical to FIG. 36, a block diagram of the related art example, except that the estimated innate SNR calculation unit 7 and the suppression coefficient correction unit 15 are replaced with an estimated innate SNR calculation unit 72 and a suppression coefficient correction unit 20, respectively. The following description focuses on these differences.

推定先天的ＳＮＲ計算部７２には、多重乗算部１３から劣化音声パワースペクトル、推定雑音計算部５から推定雑音パワースペクトル、周波数別ＳＮＲ計算部６から後天的ＳＮＲ、抑圧係数補正部２０から補正抑圧係数が供給される。推定先天的ＳＮＲ計算部７２は、劣化音声パワースペクトル、推定雑音パワースペクトル、後天的ＳＮＲ及び補正抑圧係数を用いて、推定先天的ＳＮＲ、音声存在確率及び推定強調音声パワースペクトルを求める。そして、抑圧係数補正部２０に推定先天的ＳＮＲ、音声存在確率及び推定強調音声パワースペクトルを、雑音抑圧係数生成部８に推定先天的ＳＮＲをそれぞれ伝達する。抑圧係数補正部２０は、推定先天的ＳＮＲ計算部７２から供給される推定先天的ＳＮＲ、音声存在確率及び推定強調音声パワースペクトルを用いて、雑音抑圧係数生成部８から供給される抑圧係数を補正し、補正抑圧係数として多重乗算部１６と推定先天的ＳＮＲ計算部７２へ伝達する。推定先天的ＳＮＲ計算部７２及び抑圧係数補整正部２０の構成と動作の詳細な説明は、図２６及び図２７を参照しながら行う。 The estimated innate SNR calculation unit 72 is supplied with the degraded speech power spectrum from the multiple multiplier 13, the estimated noise power spectrum from the estimated noise calculation unit 5, the acquired SNR from the frequency-specific SNR calculation unit 6, and the corrected suppression coefficient from the suppression coefficient correction unit 20. Using the degraded speech power spectrum, the estimated noise power spectrum, the acquired SNR, and the corrected suppression coefficient, it obtains the estimated innate SNR, the speech presence probability, and the estimated enhanced speech power spectrum. It then transmits the estimated innate SNR, the speech presence probability, and the estimated enhanced speech power spectrum to the suppression coefficient correction unit 20, and the estimated innate SNR to the noise suppression coefficient generation unit 8. The suppression coefficient correction unit 20 corrects the suppression coefficient supplied from the noise suppression coefficient generation unit 8 using the estimated innate SNR, the speech presence probability, and the estimated enhanced speech power spectrum supplied from the estimated innate SNR calculation unit 72, and transmits the result as the corrected suppression coefficient to the multiple multiplier 16 and the estimated innate SNR calculation unit 72. The configuration and operation of the estimated innate SNR calculation unit 72 and the suppression coefficient correction unit 20 are described in detail with reference to FIGS. 26 and 27.

図２６は推定先天的ＳＮＲ計算部７２の構成を示すブロック図である。図２２の推定先天的ＳＮＲ計算部７１とは、多重乗算部７１３が多重乗算部７１５に置換されていることを除いて同一である。多重乗算部７１３は音声存在確率計算部７１４だけに推定強調音声パワースペクトルを供給していたが、多重乗算部７１５は図２５の抑圧係数補正部２０にも供給する。多重乗算部７１５の構成は、図２２を用いて既に説明した多重乗算部７１３に等しいので、詳細な説明は省略する。 FIG. 26 is a block diagram showing the configuration of the estimated innate SNR calculation unit 72. It is identical to the estimated innate SNR calculation unit 71 of FIG. 22 except that the multiple multiplier 713 is replaced with a multiple multiplier 715. Whereas the multiple multiplier 713 supplies the estimated enhanced speech power spectrum only to the speech presence probability calculation unit 714, the multiple multiplier 715 also supplies it to the suppression coefficient correction unit 20 of FIG. 25. The configuration of the multiple multiplier 715 is the same as that of the multiple multiplier 713 already described with reference to FIG. 22, so a detailed description is omitted.

図２７は抑圧係数補正部２０の構成を示すブロック図である。図５０の抑圧係数補正部１５とは、周波数別抑圧係数補正部１５０１_０〜１５０１_Ｋ−１が周波数別抑圧係数補正部２００１_０〜２００１_Ｋ−１に置換されていることを除いて同一である。以下、これらの相違点を中心に詳細な動作を説明する。 FIG. 27 is a block diagram showing the configuration of the suppression coefficient correction unit 20. It is identical to the suppression coefficient correction unit 15 of FIG. 50 except that the frequency-specific suppression coefficient correction units 1501_0 to 1501_{K−1} are replaced with frequency-specific suppression coefficient correction units 2001_0 to 2001_{K−1}. The following description focuses on these differences.

周波数別抑圧係数補正部２００１_０〜２００１_Ｋ−１には、分離部１５０２から周波数別推定先天的ＳＮＲ、図２５の推定雑音計算部５から推定雑音パワースペクトル、図２５の推定先天的ＳＮＲ計算部７２から音声存在確率と推定強調音声パワースペクトルがそれぞれ供給されている。周波数別推定先天的ＳＮＲ、推定雑音パワースペクトル、推定強調音声パワースペクトル及び音声存在確率を用いて、分離部１５０３から供給される周波数別抑圧係数を補正し、周波数別補正抑圧係数として多重化部１５０４へ伝達する。周波数別抑圧係数補正部２００１_０〜２００１_Ｋ−１の構成と動作の詳細な説明は、図２８を用いて行う。 The frequency-specific suppression coefficient correction units 2001_0 to 2001_{K−1} are supplied with the frequency-specific estimated innate SNR from the separation unit 1502, the estimated noise power spectrum from the estimated noise calculation unit 5 of FIG. 25, and the speech presence probability and the estimated enhanced speech power spectrum from the estimated innate SNR calculation unit 72 of FIG. 25. Using the frequency-specific estimated innate SNR, the estimated noise power spectrum, the estimated enhanced speech power spectrum, and the speech presence probability, they correct the frequency-specific suppression coefficients supplied from the separation unit 1503 and transmit the results to the multiplexing unit 1504 as frequency-specific corrected suppression coefficients. The configuration and operation of the units 2001_0 to 2001_{K−1} are described in detail with reference to FIG. 28.

FIG. 28 is a block diagram showing the configuration of the frequency-specific suppression coefficient correction units 2001_0 to 2001_K−1 of FIG. 27. In FIG. 28, a voiced-part correction coefficient storage unit 2011, a silent-part correction coefficient storage unit 2012, a correction coefficient calculation unit 2013, and a multiplier 2014 are provided in place of the maximum value selection unit 1591 and the suppression coefficient lower limit value storage unit 1592 of the frequency-specific suppression coefficient correction unit 1501 of FIG. 51. The operation is described below in detail, focusing on these differences.

The silent-part correction coefficient calculation unit 2012 calculates a silent-part correction coefficient from the speech presence probability and the estimated enhanced speech power spectrum supplied from the estimated a priori SNR calculation unit 72 of FIG. 25 and from the estimated noise power spectrum supplied from the estimated noise calculation unit 5 of FIG. 25, and supplies it to the correction coefficient calculation unit 2013. Based on the voiced-part correction coefficient supplied from the voiced-part correction coefficient storage unit 2011 and the silent-part correction coefficient supplied from the silent-part correction coefficient calculation unit 2012, the correction coefficient calculation unit 2013 calculates a correction coefficient that depends on the speech presence probability supplied from the estimated a priori SNR calculation unit 72 of FIG. 25, and transmits it to the multiplier 2014. The multiplier 2014 calculates the product of the correction coefficient supplied from the correction coefficient calculation unit 2013 and the output value of the switch 1595 or the multiplier 1597, and transmits the product to the multiplexing unit 1504 of FIG. 27 as the corrected suppression coefficient. Because the suppression coefficient is corrected by a correction coefficient calculated according to the speech presence probability, residual noise in noise-only sections can be suppressed further. The configuration of the silent-part correction coefficient calculation unit 2012 is the same as that of the silent-part correction coefficient calculation unit 1832 already described with reference to FIG. 7, so a detailed description is omitted. Likewise, the configuration of the correction coefficient calculation unit 2013 is the same as that of the correction coefficient calculation unit 1833 already described with reference to FIG. 6, so a detailed description is omitted.
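As a rough illustration of the data flow through the correction coefficient calculation unit 2013 and the multiplier 2014, the sketch below blends the two correction coefficients according to the speech presence probability and multiplies the result into the suppression coefficient. The linear blend is an assumption made here for illustration only; the actual rule is the one described with reference to FIG. 6, and all function and parameter names are hypothetical.

```python
def correction_coefficient(p_speech, voiced_coef, silent_coef):
    """Blend the voiced-part and silent-part correction coefficients.

    Assumption: linear interpolation weighted by the speech presence
    probability p_speech in [0, 1]. The text only states that the result
    depends on the speech presence probability.
    """
    if not 0.0 <= p_speech <= 1.0:
        raise ValueError("p_speech must lie in [0, 1]")
    return p_speech * voiced_coef + (1.0 - p_speech) * silent_coef


def corrected_suppression_coefficient(gain, p_speech, voiced_coef, silent_coef):
    """Role of multiplier 2014: scale the suppression coefficient."""
    return correction_coefficient(p_speech, voiced_coef, silent_coef) * gain
```

With a silent-part coefficient below 1, frames judged to be noise-only receive a smaller corrected suppression coefficient, which matches the stated aim of suppressing residual noise further in noise sections.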

FIG. 29 is a block diagram showing a seventh embodiment of the present invention. FIG. 29 differs from FIG. 13, the third embodiment, in that a delay unit 23 and an adder 24 are provided in place of the speech absence probability storage unit 21, and in that the enhanced speech amplitude spectrum correction unit 17 is replaced with an enhanced speech amplitude spectrum correction unit 22. The operation is described below in detail, focusing on these differences.

The speech presence probability output from the enhanced speech amplitude spectrum correction unit 22 is stored in the delay unit 23. The delay unit 23 transmits the speech presence probability of the previous frame to the adder 24. Because the speech presence probability is calculated only after the noise suppression coefficient has been generated, the speech presence probability of the previous frame is used when generating the noise suppression coefficient. The adder 24 subtracts the speech presence probability from 1 and transmits the result to the noise suppression coefficient generation unit as the speech absence probability. In the third embodiment of FIG. 13, the noise suppression coefficient was always generated with the same speech absence probability, whereas in this embodiment the speech absence probability is calculated from the speech presence probability calculated by the enhanced speech amplitude spectrum correction unit. A speech absence probability better suited to each input signal than in the related art can therefore be used to generate the noise suppression coefficient. The configuration and operation of the enhanced speech amplitude spectrum correction unit 22 are described in detail with reference to FIG. 30.
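The one-frame feedback formed by the delay unit 23 and the adder 24 can be sketched as follows. This is a minimal illustration, not the patented implementation; the class name, method names, and the start-up value used before any frame has been processed are hypothetical.

```python
class SpeechAbsenceTracker:
    """One-frame delay (unit 23) followed by the 1 - p adder (unit 24)."""

    def __init__(self, initial_presence=0.5):
        # initial_presence is a hypothetical start-up value for the first
        # frame; the text does not specify how the delay unit is initialised.
        self._previous_presence = initial_presence

    def absence_for_current_frame(self):
        """Speech absence probability used to generate this frame's gain."""
        return 1.0 - self._previous_presence

    def store(self, presence):
        """Save this frame's speech presence probability for the next frame."""
        self._previous_presence = presence
```

In a frame loop, `absence_for_current_frame()` would be read before the noise suppression coefficient is generated and `store()` called once the enhanced speech amplitude spectrum correction unit has produced the new speech presence probability.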

FIG. 30 is a block diagram showing the configuration of the enhanced speech amplitude spectrum correction unit 22 of FIG. 29. It is identical to the enhanced speech amplitude spectrum correction unit 17 of FIG. 14 except that the speech presence probability calculation unit 171 is replaced with a speech presence probability calculation unit 221. The speech presence probability calculation unit 171 of FIG. 14 transmits the speech presence probability only to the post-suppression coefficient calculation unit 172, whereas the speech presence probability calculation unit 221 of FIG. 30 also transmits it to the delay unit 23 of FIG. 29.

FIG. 31 is a block diagram showing an eighth embodiment of the present invention. FIG. 31 differs from FIG. 29, the seventh embodiment, in that a speech presence probability calculation unit 26 is provided in place of the delay unit 23, and in that the enhanced speech amplitude spectrum correction unit 22 is replaced with an enhanced speech amplitude spectrum correction unit 25. The speech presence probability calculation unit 26 calculates the speech presence probability using the estimated a priori SNR output from the estimated a priori SNR calculation unit 7, and transmits it to the adder 24 and the enhanced speech amplitude spectrum correction unit 25. Unlike the seventh embodiment of FIG. 29, the speech presence probability is calculated before the noise suppression coefficient is generated, so the noise suppression coefficient generation unit 8 does not need to use a speech absence probability derived from the speech presence probability calculated one frame earlier. The noise suppression coefficient generation unit 8 of this embodiment can therefore use a more accurate speech absence probability than in the seventh embodiment. The configuration and operation of the enhanced speech amplitude spectrum correction unit 25 and the speech presence probability calculation unit 26 are described in detail with reference to FIGS. 32 and 33.

FIG. 32 is a block diagram showing the configuration of the enhanced speech amplitude spectrum correction unit 25 of FIG. 31. It is identical to the enhanced speech amplitude spectrum correction unit 22 of FIG. 30 except that the speech presence probability calculation unit 221 and the multiple multiplication unit 170 are removed and the post-suppression coefficient calculation unit 172 is replaced with a post-suppression coefficient calculation unit 252. Based on the speech presence probability output from the speech presence probability calculation unit 26 of FIG. 31, the post-suppression coefficient calculation unit 252 calculates a post-suppression coefficient from the corrected suppression coefficient output from the suppression coefficient correction unit 15 of FIG. 31, and transmits it to the multiple multiplication unit 173. The post-suppression coefficient calculation unit 252 of FIG. 32 differs from the post-suppression coefficient calculation unit 172 of FIG. 30 in that the speech absence probability is calculated outside the enhanced speech amplitude spectrum correction unit.

FIG. 33 is a block diagram showing the configuration of the speech presence probability calculation unit 26 of FIG. 31. It is identical to the speech presence probability calculation unit 171 of FIG. 3 except that the separation unit 1708, the average value calculation unit 1709, the logarithm calculation unit 1710, the multiplier 1711, and the function value calculation units 1712 and 1713 are removed; the average index calculation unit 1714 is replaced with 2614 and the instantaneous index calculation unit 1715 with 2615; and the input to the separation unit 1700 is changed from the enhanced speech power spectrum to the estimated a priori SNR. The speech presence probability calculation unit 171 of FIG. 3 and the speech presence probability calculation unit 26 of FIG. 33 have in common that they calculate an index according to the ratio of speech to noise. The speech presence probability calculation unit 171 corrects both the enhanced speech power and the estimated noise power to values suitable for index calculation, whereas the speech presence probability calculation unit 26 corrects the estimated a priori SNR. The speech presence probability calculation unit 26 can therefore be implemented with less computation. The operation is described below in detail, focusing on these differences.

The separation unit 1700 separates the estimated a priori SNR supplied from the estimated a priori SNR calculation unit 7 of FIG. 31 into frequency-specific estimated a priori SNRs and outputs them to the average value calculation unit 1701. The average value calculation unit 1701 divides the sum of the frequency-specific estimated a priori SNRs $\hat{\xi}_n(k)$ over k = 0 to K−1 by K and transmits the result to the logarithm calculation unit 1702. The logarithm calculation unit 1702 calculates the logarithm of the average value input from the average value calculation unit 1701 and transmits it to the multiplier 1703. The multiplier 1703 multiplies the supplied logarithmic value by a constant to obtain the full-band estimated a priori SNR $\Xi(n)$ and supplies it to the smoothing units 1705 and 1707. That is, the full-band estimated a priori SNR $\Xi(n)$ of the n-th frame is given by

$$\Xi(n) = c\,\log\!\left(\frac{1}{K}\sum_{k=0}^{K-1}\hat{\xi}_n(k)\right)$$

where $c$ denotes the constant applied by the multiplier 1703.
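The chain of units 1701, 1702, and 1703 can be sketched as a single function. This is an illustrative sketch under stated assumptions: the function name is hypothetical, and the choice of a base-10 logarithm with a constant of 10 (giving a value in decibels) is an assumption, since the text only says that the logarithm is multiplied by a constant.

```python
import math


def full_band_snr(xi_hat, scale=10.0):
    """Full-band estimated a priori SNR for one frame.

    xi_hat: per-frequency estimated a priori SNRs (linear scale, positive).
    scale:  constant applied after the logarithm. Using 10 with log10 is an
            assumption that yields decibels; the text does not give the value.
    """
    if not xi_hat:
        raise ValueError("xi_hat must not be empty")
    mean = sum(xi_hat) / len(xi_hat)   # average value calculation unit 1701
    return scale * math.log10(mean)    # logarithm unit 1702, multiplier 1703
```

For example, under these assumptions a frame whose per-frequency SNRs average 100 would yield a full-band value of 20.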

The smoothing unit 1705 smooths the full-band estimated a priori SNR $\Xi(n)$ supplied from the multiplier 1703 in the time direction, using the smoothing coefficient supplied from the smoothing coefficient storage unit 1704, and supplies the result to the instantaneous index calculation unit 2615 as a first smoothed a priori SNR. Similarly, the smoothing unit 1707 smooths the full-band estimated a priori SNR $\Xi(n)$ supplied from the multiplier 1703 in the time direction, using the smoothing coefficient supplied from the smoothing coefficient storage unit 1706, and supplies the result to the average index calculation unit 2614 as a second smoothed a priori SNR. As noted in the description of the speech presence probability calculation unit 171 of FIG. 3, the coefficient stored in the smoothing coefficient storage unit 1704 is set smaller than the coefficient stored in the smoothing coefficient storage unit 1706.
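The two smoothing paths can be illustrated with a first-order recursive smoother. The recursion used here, y(n) = α·y(n−1) + (1−α)·x(n), is an assumption; the text states only that smoothing is performed in the time direction and that the coefficient for unit 1705 is smaller than that for unit 1707. The class name and the example coefficient values are hypothetical.

```python
class RecursiveSmoother:
    """First-order recursive smoother (assumed form, see lead-in)."""

    def __init__(self, alpha):
        if not 0.0 <= alpha < 1.0:
            raise ValueError("alpha must lie in [0, 1)")
        self.alpha = alpha
        self.state = None  # no history before the first frame

    def update(self, value):
        """Feed one full-band SNR value and return the smoothed value."""
        if self.state is None:
            self.state = value
        else:
            self.state = self.alpha * self.state + (1.0 - self.alpha) * value
        return self.state


# Hypothetical coefficients: the smaller one tracks changes faster and would
# feed the instantaneous index; the larger one would feed the average index.
fast = RecursiveSmoother(0.5)
slow = RecursiveSmoother(0.9)
```

Under this assumed recursion, a smaller coefficient gives a shorter memory, which is consistent with the first smoothed value being used for the instantaneous index and the second for the average index.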

The instantaneous index calculation unit 2615 calculates an instantaneous index using the first smoothed a priori SNR supplied from the smoothing unit 1705 and supplies it to the adding unit 1716. The average index calculation unit 2614 calculates an average index using the second smoothed a priori SNR supplied from the smoothing unit 1707 and supplies it to the adding unit 1716. The indices are calculated by a method in which the value increases with the smoothed a priori SNR. A specific example is the following calculation.

Here, IDX2_n is the index and $\bar{\Xi}(n)$ is the smoothed a priori SNR. θ_idx2, a_idx2, and b_idx2 are real numbers, and a_idx2 is greater than or equal to b_idx2.
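One plausible reading of the symbols defined above is a threshold rule in which the index takes the larger value a_idx2 when the smoothed a priori SNR exceeds θ_idx2 and the smaller value b_idx2 otherwise. The sketch below implements that reading; the threshold form itself is an assumption, and the function and parameter names are hypothetical.

```python
def snr_index(smoothed_snr, theta, a, b):
    """Index that is non-decreasing in the smoothed a priori SNR.

    Assumed form: a if smoothed_snr > theta, else b, with a >= b as the
    text requires. Other monotone mappings would also fit the description.
    """
    if a < b:
        raise ValueError("a must be greater than or equal to b")
    return a if smoothed_snr > theta else b
```

The same function could serve both the instantaneous index calculation unit 2615 and the average index calculation unit 2614, each with its own smoothed input and, if desired, its own parameter values.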

FIG. 34 is a block diagram showing a ninth embodiment of the present invention. FIG. 34 differs from FIG. 31, the eighth embodiment, in that the speech presence probability calculation unit 26 is replaced with a speech presence probability calculation unit 27. The speech presence probability calculation unit 27 calculates the speech presence probability using the a posteriori SNR output from the frequency-specific SNR calculation unit 6 and the estimated a priori SNR output from the estimated a priori SNR calculation unit 7, and transmits it to the adder 24 and the enhanced speech amplitude spectrum correction unit 25. The configuration and operation of the speech presence probability calculation unit 27 are described in detail with reference to FIG. 35.

FIG. 35 is a block diagram showing the configuration of the speech presence probability calculation unit 27 of FIG. 34. It is identical to the speech presence probability calculation unit 26 of FIG. 31 except that the separation unit 1700 is replaced with 2700 and the average value calculation unit 1701 with 2701, and that a separation unit 2703, an average value calculation unit 2704, and an SNR mixing unit 2705 are added. The main difference from the speech presence probability calculation unit 26 of FIG. 31 is that the estimation accuracy of the SNR input to the logarithm calculation unit 1702 is improved. The operation is described below in detail, focusing on these differences.

The separation unit 2700 separates the estimated a priori SNR supplied from the estimated a priori SNR calculation unit 7 of FIG. 34 into frequency-specific estimated a priori SNRs and outputs them to the average value calculation unit 2701. The average value calculation unit 2701 divides the sum of the frequency-specific estimated a priori SNRs $\hat{\xi}_n(k)$ over k = 0 to K−1 by K and transmits the result to the SNR mixing unit 2705 as the average a priori SNR $\bar{\xi}_n$. That is, the average a priori SNR $\bar{\xi}_n$ of the n-th frame is given by

$$\bar{\xi}_n = \frac{1}{K}\sum_{k=0}^{K-1}\hat{\xi}_n(k)$$

Meanwhile, the separation unit 2703 separates the a posteriori SNR supplied from the frequency-specific SNR calculation unit 6 of FIG. 34 into frequency-specific a posteriori SNRs and outputs them to the average value calculation unit 2704. The average value calculation unit 2704 divides the sum of the frequency-specific a posteriori SNRs $\gamma_n(k)$ over k = 0 to K−1 by K and transmits the result to the SNR mixing unit 2705 as the average a posteriori SNR $\bar{\gamma}_n$. That is, the average a posteriori SNR $\bar{\gamma}_n$ of the n-th frame is given by

$$\bar{\gamma}_n = \frac{1}{K}\sum_{k=0}^{K-1}\gamma_n(k)$$

The SNR mixing unit 2705 calculates a mixed SNR $\Xi_{mix}(n)$ from the average a priori SNR $\bar{\xi}_n$ supplied from the average value calculation unit 2701 and the average a posteriori SNR $\bar{\gamma}_n$ supplied from the average value calculation unit 2704, and transmits it to the logarithm calculation unit 1702. The mixed SNR $\Xi_{mix}(n)$ is calculated by a method in which the value increases with the average a priori SNR $\bar{\xi}_n$. A specific example is the following calculation.

Here, F_mix is a function of the average a priori SNR $\bar{\xi}_n$.

F_mix outputs a real number from 0 to 1 and outputs a larger value when $\bar{\xi}_n$ is larger. That is, when the SNR is high, the mixed SNR $\Xi_{mix}(n)$ is calculated giving priority to the average a posteriori SNR $\bar{\gamma}_n$, whose estimation accuracy is higher than that of the average a priori SNR $\bar{\xi}_n$. Consequently, the mixed SNR $\Xi_{mix}(n)$ obtained from both the a priori SNR and the a posteriori SNR is more accurate than the full-band estimated a priori SNR $\Xi(n)$ obtained from the a priori SNR alone. Because the speech presence probability can then be calculated from an SNR with high estimation accuracy, the speech presence probability calculation unit 27 of FIG. 34 can achieve higher accuracy than the speech presence probability calculation unit 26 of FIG. 31.
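The behaviour described for the SNR mixing unit 2705 can be illustrated with a convex combination in which F_mix weights the average a posteriori SNR. Both the convex-combination form and the particular F_mix used below are assumptions chosen to satisfy the stated properties (output between 0 and 1, larger for a larger average a priori SNR); all names and the `midpoint` parameter are hypothetical.

```python
def mean(values):
    """Arithmetic mean over the K frequency bins (units 2701 and 2704)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def f_mix(avg_prior, midpoint=1.0):
    """Hypothetical F_mix: monotone map of a non-negative SNR into [0, 1)."""
    return avg_prior / (avg_prior + midpoint)


def mixed_snr(xi_hat, gamma):
    """Assumed mixing rule: w * avg_posterior + (1 - w) * avg_prior."""
    avg_prior = mean(xi_hat)       # average a priori SNR
    avg_post = mean(gamma)         # average a posteriori SNR
    w = f_mix(avg_prior)           # grows with the average a priori SNR
    return w * avg_post + (1.0 - w) * avg_prior
```

With this choice the mixed value moves toward the average a posteriori SNR as the average a priori SNR grows, which is the qualitative behaviour the text describes for high-SNR frames.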

In all the embodiments described so far, the minimum mean-square error short-time spectral amplitude (MMSE STSA) method has been assumed as the noise suppression scheme, but the invention is also applicable to other methods. Examples include the Wiener filter method disclosed in Non-Patent Document 2 and the spectral subtraction method disclosed in Non-Patent Document 3; descriptions of detailed configurations for these are omitted.

DESCRIPTION OF SYMBOLS
1 Frame division unit
2 Windowing unit
3 Fourier transform unit
4, 5049 Counter
5 Estimated noise calculation unit
6, 1402 Frequency-specific SNR calculation unit
7, 71, 72 Estimated a priori SNR calculation unit
8 Noise suppression coefficient generation unit
9 Inverse Fourier transform unit
10 Frame synthesis unit
11 Input terminal
12 Output terminal
14 Weighted degraded speech calculation unit
15 Suppression coefficient correction unit
172, 182, 252, 282, 292 Post-suppression coefficient calculation unit
13, 16, 170, 173, 704, 705, 713, 715, 1404 Multiple multiplication unit
17, 18, 22, 25, 28, 29 Enhanced speech amplitude spectrum correction unit
21 Speech absence probability storage unit
171, 221, 26, 27, 714 Speech presence probability calculation unit
1742, 1745, 708, 5046, 1716, 7092, 7094, 24 Adder
711, 712, 1746, 23 Delay unit
1593, 5204, 5206 Threshold storage unit
1594, 5203, 5205 Comparison unit
1702, 1710, 1859 Logarithm calculation unit
1704, 1706, 1854 Smoothing coefficient storage unit
1705, 1707, 1853 Smoothing unit
1712, 1713 Function value calculation unit
1714, 2614 Average index calculation unit
1715, 2615 Instantaneous index calculation unit
2705 SNR mixing unit
1852 Speech power mixing unit
1858 Smoothed signal storage unit
1863 Exponent calculation unit
7071_0 to 7071_K−1 Weighted addition unit
706 Weight storage unit
503, 1304, 1424, 1475, 1504, 1723, 7014, 7075 Multiplexing unit
504_0 to 504_K−1 Frequency-specific estimated noise calculation unit
520 Update determination unit
5207 Threshold calculation unit
1595, 5044 Switch
1857, 1862, 1421_0 to 1421_K−1, 5048 Division unit
501, 502, 1302, 1303, 1422, 1423, 1495, 1502, 1503, 1700, 1708, 1722, 1850, 1855, 7013, 7072, 7074, 2700, 2703 Separation unit
1701, 1709, 1851, 1856, 2701, 2704 Average value calculation unit
701 Multiple range limitation processing unit
702 A posteriori SNR storage unit
703 Suppression coefficient storage unit
707 Multiple weighted addition unit
1401, 5042 Estimated noise storage unit
921 Instantaneous estimated SNR
921_0 to 921_K−1 Frequency-specific instantaneous estimated SNR
922 Past estimated SNR
922_0 to 922_K−1 Past frequency-specific estimated SNR
924 Estimated a priori SNR
924_0 to 924_K−1 Frequency-specific estimated a priori SNR
1405 Multiple nonlinear processing unit
1485_0 to 1485_K−1, 5042 Nonlinear processing unit
1501_0 to 1501_K−1, 1901_0 to 1901_K−1, 2001_0 to 2001_K−1 Frequency-specific suppression coefficient correction unit
1721_0 to 1721_K−1, 1821_0 to 1821_K−1, 2821_0 to 2821_K−1, 2921_0 to 2921_K−1 Frequency-specific post-suppression coefficient calculation unit
1591, 1694, 1924, 7012_0 to 7012_K−1 Maximum value selection unit
1592 Suppression coefficient lower limit value storage unit
1596 Correction value storage unit
1691, 1921 Voiced-part lower limit value storage unit
1692, 1922 Silent-part lower limit value storage unit
2691 Voiced-part lower limit value calculation unit
2692 Silent-part lower limit value calculation unit
1693, 1923 Lower limit value calculation unit
1831 Voiced-part coefficient storage unit
2831 Voiced-part coefficient calculation unit
1832 Silent-part coefficient calculation unit
1833, 1861 Coefficient calculation unit
2011 Voiced-part correction coefficient storage unit
2012 Silent-part correction coefficient storage unit
2013 Correction coefficient calculation unit
1301_0 to 1301_K−1, 1597, 1703, 1711, 1743, 1744, 1834, 2014, 7091, 7093 Multiplier
1741, 1860, 7095 Constant multiplier
5045 Shift register
5047 Minimum value selection unit
5201 Logical OR calculation unit
5041 Register length storage unit
7011 Constant storage unit
811 MMSE STSA gain function value calculation unit
812 Generalized likelihood ratio calculation unit
814 Suppression coefficient calculation unit

Claims

A noise suppression method comprising:
converting an input signal into a frequency domain signal;
determining a suppression coefficient based on the frequency domain signal;
obtaining a relative relationship between speech and noise based on the frequency domain signal;
determining a contribution rate based on the relative relationship;
obtaining a minimum suppression coefficient based on the contribution rate and on a predetermined first temporary minimum suppression coefficient and a predetermined second temporary minimum suppression coefficient;
comparing the suppression coefficient with the minimum suppression coefficient and taking the larger of the two as a corrected suppression coefficient; and
suppressing noise by weighting the frequency domain signal with the corrected suppression coefficient,
wherein the minimum suppression coefficient is determined as a weighted sum of the first temporary minimum suppression coefficient and the second temporary minimum suppression coefficient, using the contribution rate as the weight.
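The steps recited in the method claim can be sketched for a single frame as follows. This is an illustration only and does not define the claim scope: the names are hypothetical, and it is assumed here that the weighted sum applies the contribution rate to the first temporary minimum and its complement to the second, which the claim wording does not spell out.

```python
def minimum_suppression_coefficient(contribution, floor_first, floor_second):
    """Weighted sum of the two temporary minimum suppression coefficients.

    Assumption: weights are contribution and (1 - contribution).
    """
    if not 0.0 <= contribution <= 1.0:
        raise ValueError("contribution must lie in [0, 1]")
    return contribution * floor_first + (1.0 - contribution) * floor_second


def suppress(spectrum, gains, contribution, floor_first, floor_second):
    """Apply floored suppression coefficients to one frame's spectrum.

    spectrum: frequency domain signal, one real or complex value per bin.
    gains:    suppression coefficient per bin.
    """
    g_min = minimum_suppression_coefficient(
        contribution, floor_first, floor_second)
    corrected = [max(g, g_min) for g in gains]   # larger value wins
    return [g * x for g, x in zip(corrected, spectrum)]
```

Flooring each coefficient at a value that varies with the speech-to-noise relationship limits how strongly any bin is attenuated while still allowing a different floor in speech and noise sections.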

The noise suppression method according to claim 1, wherein the first temporary minimum suppression coefficient and the second temporary minimum suppression coefficient are determined based on the frequency domain signal.

A noise suppression apparatus comprising:
a conversion unit that converts an input signal into a frequency domain signal;
a suppression coefficient calculation unit that determines a suppression coefficient based on the frequency domain signal;
a relative relationship calculation unit that obtains a relative relationship between speech and noise based on the frequency domain signal;
a minimum suppression coefficient calculation unit that determines a contribution rate based on the relative relationship and obtains a minimum suppression coefficient based on the contribution rate and on a predetermined first temporary minimum suppression coefficient and a predetermined second temporary minimum suppression coefficient;
a corrected suppression coefficient calculation unit that compares the suppression coefficient with the minimum suppression coefficient and takes the larger of the two as a corrected suppression coefficient; and
a weighting operation unit that weights the frequency domain signal with the corrected suppression coefficient,
wherein the minimum suppression coefficient is determined as a weighted sum of the first temporary minimum suppression coefficient and the second temporary minimum suppression coefficient, using the contribution rate as the weight.

The apparatus for noise suppression according to claim 3, wherein the first temporary minimum suppression coefficient and the second temporary minimum suppression coefficient are obtained based on the frequency domain signal.