JP5413575B2 - Noise suppression method, apparatus, and program

Info

Publication number: JP5413575B2
Application number: JP2009049915A
Authority: JP
Inventors: 昭彦杉山
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-03-03
Filing date: 2009-03-03
Publication date: 2014-02-12
Anticipated expiration: 2029-03-03
Also published as: JP2010204392A

Description

The present invention relates to a noise suppression method, apparatus, and program.

A noise suppressor (noise suppression system) suppresses noise superimposed on a desired speech signal. In general, it estimates the power spectrum of the noise component from the input signal converted to the frequency domain and subtracts this estimated power spectrum from the input signal, thereby suppressing the noise mixed into the desired speech signal. Because the power spectrum of the noise component is estimated continuously, the approach can also be applied to the suppression of non-stationary noise. One example of such a noise suppressor is the scheme described in Patent Document 1.

In addition, Non-Patent Document 1 describes an implementation with a reduced amount of computation.

The basic operation is the same in both schemes. The input signal is converted to the frequency domain by a linear transform, the amplitude component is extracted, and a suppression coefficient is calculated for each frequency component. The product of the suppression coefficient and the amplitude at each frequency component is combined with the phase of that component and inverse-transformed to obtain the noise-suppressed output. The suppression coefficient takes a value between 0 and 1: at 0 the component is fully suppressed and the output is zero; at 1 there is no suppression and the input is output unchanged.
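As a minimal sketch of the per-frequency weighting just described, the following Python scales the amplitude of each frequency component by its suppression coefficient and keeps the original phase. The function name `apply_suppression` and the sample values are illustrative assumptions, not taken from the cited documents.

```python
import cmath

def apply_suppression(spectrum, gains):
    """Scale each bin's amplitude by its gain and keep the bin's original phase."""
    out = []
    for value, gain in zip(spectrum, gains):
        if not 0.0 <= gain <= 1.0:
            raise ValueError("suppression coefficient must lie in [0, 1]")
        amplitude, phase = cmath.polar(value)            # split into amplitude and phase
        out.append(cmath.rect(gain * amplitude, phase))  # recombine with the scaled amplitude
    return out

# Hypothetical three-bin spectrum used only for illustration.
spectrum = [3 + 4j, 1 - 1j, -2 + 0j]
passed = apply_suppression(spectrum, [1.0, 1.0, 1.0])  # gain 1: input passes unchanged
muted = apply_suppression(spectrum, [0.0, 0.0, 0.0])   # gain 0: output is zero
```

With all coefficients at 1 the spectrum is reproduced up to rounding, and with all coefficients at 0 every component vanishes, matching the two limiting cases in the text.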

The noise suppressor disclosed in Patent Document 1 applies a transform such as the Fourier transform to the degraded speech signal (a signal in which the desired speech signal and noise are mixed), supplied as a sequence of sample values, to split it into a plurality of frequency components, and estimates the power spectrum of the noise it contains for each of those frequency components. One example of a noise estimation scheme weights the degraded speech by a past signal-to-noise ratio to obtain the noise component; details are given in Patent Document 1. Using the supplied degraded speech power spectrum and the estimated noise power spectrum, the suppressor generates suppression coefficients that, when multiplied by the degraded speech, yield enhanced speech in which the noise is suppressed. Because a suppression coefficient is obtained for each frequency component, the number of suppression coefficients equals the number of frequency components. One widely used way of generating the suppression coefficients is the minimum mean-square short-time spectral amplitude method, which minimizes the mean-square power of the enhanced speech; details are given in Patent Document 1. In the course of generating the suppression coefficients, the a priori SNR is also estimated for each frequency. The estimated a priori SNR is used both to generate the suppression coefficients and to obtain the corrected suppression coefficients. The degraded speech and the suppression coefficient are multiplied at each frequency, and the product is inverse-transformed as the power spectrum of the enhanced speech to yield enhanced speech signal samples. Although the processing above is described in terms of the power spectrum, it is widely known that the amplitude, which corresponds to its square root, can be used instead.

In the related-art configurations described so far, residual noise and output distortion are generally in a trade-off relationship, and low residual noise cannot be achieved together with low output distortion. It has therefore been necessary either to tolerate some residual noise in order to avoid the increase in output distortion caused by excessive suppression or, conversely, to tolerate the output distortion caused by excessive suppression in order to obtain sufficiently low residual noise. To address this problem, Non-Patent Document 2 discloses a noise suppressor that, based on a speech presence probability calculated from the estimated noise and a temporary output, applies suppression that prioritizes low distortion in speech segments and suppression that prioritizes low residual noise in non-speech segments.

The noise suppressor disclosed in Non-Patent Document 2 applies suppression that prioritizes low distortion in speech segments and suppression that prioritizes low residual noise in non-speech segments, and further sets the suppression coefficients so that no discontinuity arises in the residual noise level between speech and non-speech segments. As a result, it can output high-quality enhanced speech that combines low residual noise in non-speech segments with low output distortion in speech segments, without discontinuity at the boundary between the two.

Patent Document 1: JP 2002-204175 A

Non-Patent Document 1: Proceedings of ICASSP, Vol. I, pp. 473-476, May 2006
Non-Patent Document 2: Proceedings of ICCE, 6.1-4, Jan. 2007

However, the configuration disclosed in Non-Patent Document 2 has the problem that speech segments and non-speech segments are misclassified, particularly when the noise power is large relative to the speech. This can produce large residual noise in non-speech segments or increased distortion in speech segments, either of which is perceived as degraded quality in the enhanced speech output.

The present invention has been made in view of the above problem. Its object is to provide a noise suppression method, apparatus, and program that achieve both low residual noise in non-speech segments and low output distortion in speech segments, and can output high-quality enhanced speech, even when the noise power is large relative to the speech.

The present invention, which solves the above problem, is a noise suppression method characterized by: converting an input signal into a frequency-domain signal; obtaining a first estimated noise using the frequency-domain signal; determining a suppression coefficient using the first estimated noise and the frequency-domain signal; obtaining a temporary output by weighting the frequency-domain signal with the suppression coefficient; obtaining a second estimated noise using the temporary output; determining speech-like segments and non-speech-like segments using the second estimated noise; obtaining a corrected suppression coefficient by correcting the suppression coefficient so that distortion is reduced in the speech-like segments and residual noise is reduced in the non-speech-like segments; and suppressing noise by weighting the frequency-domain signal with the corrected suppression coefficient.

The present invention, which solves the above problem, is also a noise suppression method in which an input signal is converted into a frequency-domain signal, a first speech content rate of the input signal is estimated using the frequency-domain signal, and enhanced speech is obtained by suppressing the noise contained in the input signal using a suppression coefficient determined according to that content rate, the method being characterized by: obtaining the suppression coefficient; estimating the noise remaining in the enhanced signal; obtaining a second speech content rate of the enhanced speech using the remaining noise; and suppressing the noise contained in the input signal to obtain the enhanced speech so that distortion of the enhanced speech is reduced when the second speech content rate is high and the noise remaining in the enhanced speech is reduced when the second speech content rate is low. The first and second speech content rates in this aspect correspond to the first and second estimated noise of the invention described above.

The present invention, which solves the above problem, is a noise suppression apparatus characterized by comprising: a transform unit that converts an input signal into a frequency-domain signal; a first noise estimation unit that obtains a first estimated noise using the frequency-domain signal; a noise suppression coefficient generation unit that determines a suppression coefficient using the first estimated noise and the frequency-domain signal; a first multiplier that obtains a temporary output by weighting the frequency-domain signal with the suppression coefficient; a second noise estimation unit that obtains a second estimated noise using the temporary output; a temporary output SNR calculation unit that determines speech-like segments and non-speech-like segments using the second estimated noise; a suppression coefficient correction unit that obtains a corrected suppression coefficient by correcting the suppression coefficient so that distortion is reduced in the speech-like segments and residual noise is reduced in the non-speech-like segments; and a second multiplier that suppresses noise by weighting the frequency-domain signal with the corrected suppression coefficient.

The present invention, which solves the above problem, is a noise suppression program that causes a computer to execute processing of: converting an input signal into a frequency-domain signal; obtaining a first estimated noise using the frequency-domain signal; determining a suppression coefficient using the first estimated noise and the frequency-domain signal; obtaining a temporary output by weighting the frequency-domain signal with the suppression coefficient; obtaining a second estimated noise using the temporary output; determining speech-like segments and non-speech-like segments using the second estimated noise; obtaining a corrected suppression coefficient by correcting the suppression coefficient so that distortion is reduced in the speech-like segments and residual noise is reduced in the non-speech-like segments; and suppressing noise by weighting the frequency-domain signal with the corrected suppression coefficient.

The present invention can output high-quality enhanced speech that combines low residual noise in non-speech segments with low output distortion in speech segments, without discontinuity at the boundary between the two.

FIG. 1 is a block diagram showing the best mode for carrying out the present invention.
FIG. 2 is a block diagram showing the configuration of the transform unit included in FIG. 1.
FIG. 3 is a block diagram showing the configuration of the inverse transform unit included in FIG. 1.
FIG. 4 is a block diagram showing the configuration of the noise estimation unit included in FIG. 1.
FIG. 5 is a block diagram showing the configuration of the estimated noise calculation unit included in FIG. 4.
FIG. 6 is a block diagram showing the configuration of the update determination unit included in FIG. 5.
FIG. 7 is a block diagram showing the configuration of the weighted degraded speech calculation unit included in FIG. 4.
FIG. 8 is a diagram showing an example of the nonlinear function in the nonlinear processing unit included in FIG. 7.
FIG. 9 is a block diagram showing the configuration of the noise suppression coefficient generation unit included in FIG. 1.
FIG. 10 is a block diagram showing the configuration of the estimated a priori SNR calculation unit included in FIG. 9.
FIG. 11 is a block diagram showing the configuration of the weighted addition unit included in FIG. 10.
FIG. 12 is a block diagram showing the configuration of the noise suppression coefficient calculation unit included in FIG. 9.
FIG. 13 is a block diagram showing the configuration of the suppression coefficient correction unit included in FIG. 1.
FIG. 14 is a block diagram showing a second embodiment of the present invention.

FIG. 1 is a block diagram showing the best mode for carrying out the present invention. The degraded speech supplied to the input terminal 1 is subjected to a transform such as the Fourier transform in the transform unit 2 and split into a plurality of frequency components. Its squared-amplitude component (power spectrum) is supplied to the multiplier 5, the noise estimation unit 300, the noise suppression coefficient generation unit 601, and the multiplier 660. The phase is passed to the inverse transform unit 3.

The noise estimation unit 300 estimates the power spectrum of the noise contained in the degraded speech power spectrum for each of the plurality of frequency components and passes it to the noise suppression coefficient generation unit 601. The noise suppression coefficient generation unit 601 generates suppression coefficients using the degraded speech power spectrum and the estimated noise power spectrum and supplies them to the multiplier 660 and the suppression coefficient correction unit 651. The multiplier 660 obtains the product of the degraded speech power spectrum and the suppression coefficient as the temporary output and supplies it to the speech presence probability calculation unit 670, the temporary output SNR calculation unit 680, and the noise estimation unit 301. The noise estimation unit 301 receives the temporary output, which is the multiplication result of the multiplier 660, estimates the power spectrum of the noise contained in the temporary output for each of the frequency components $k$, and passes it to the speech presence probability calculation unit 670 and the temporary output SNR calculation unit 680 as the estimated noise $\theta_n^2(k)$ of the temporary output.

The speech presence probability calculation unit 670 obtains the speech presence probability $V_n(k)$ at frame $n$ and frequency component $k$ from the temporary output and its estimated noise, and supplies it to the temporary output SNR calculation unit 680 and the suppression coefficient correction unit 651. As one example of the speech presence probability, the ratio of the temporary output $x_n^2(k)$ to the estimated noise $\theta_n^2(k)$ of the temporary output can be used. That is, the speech presence probability $V_n(k)$ is defined by the following equation.

When this ratio is large, the instantaneous speech presence probability in the temporary output is high; when it is small, the speech presence probability is low.
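The ratio described above can be sketched in a few lines of Python. The equation itself is not reproduced in this text, so the code follows the prose only; the function name `speech_presence`, the small `floor` guarding against division by zero, and all sample values are assumptions made for illustration.

```python
def speech_presence(temporary_power, noise_power, floor=1e-12):
    """Per-bin ratio of temporary output power to its estimated noise power.

    `floor` is an illustrative safeguard against a zero noise estimate;
    it is not part of the description.
    """
    return [x / max(theta, floor) for x, theta in zip(temporary_power, noise_power)]

# Hypothetical per-bin values for one frame.
degraded_power = [4.0, 1.0, 0.25]
gains = [0.9, 0.5, 0.2]
# Temporary output: product of degraded speech power and suppression coefficient.
temporary = [g * p for g, p in zip(gains, degraded_power)]
# Stand-in for the noise estimated from the temporary output.
noise_in_temporary = [0.4, 0.5, 0.05]
v = speech_presence(temporary, noise_in_temporary)
```

In this example the first bin has a ratio near 9 and would be treated as likely speech, while the other two bins have a ratio near 1.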

Using the speech presence probability $V_n(k)$, the temporary output SNR calculation unit 680 obtains the temporary output SNR from the temporary output and the estimated noise of the temporary output, and supplies it to the suppression coefficient correction unit 651. As one example of the temporary output SNR, the long-term output SNR $\xi_n^L(k)$, based on the long-term average $\overline{x_n^2}(k)$ of the temporary output and the estimated noise power spectrum $\theta_n^2(k)$ of the temporary output, can be used. In that case, the temporary output SNR $\xi_n^L(k)$ is given by the following equation.
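The equation referred to here is not reproduced in this text. Reading the description literally, a form consistent with it would be the ratio of the long-term average of the temporary output to its estimated noise, as below. This is a reconstruction from the prose and should be treated as an assumption rather than the published formula.

```latex
\xi_n^L(k) = \frac{\overline{x_n^2}(k)}{\theta_n^2(k)}
```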

The long-term average $\overline{x_n^2}(k)$ of the temporary output is updated according to the speech presence probability $V_n(k)$ supplied from the speech presence probability calculation unit 670. When $V_n(k)$ is large, the long-term average of the temporary output is updated; when it is small, the long-term average is held without being updated. This is because the long-term average of the temporary output is intended to capture the speech component in the temporary output.

Whether the speech presence probability $V_n(k)$ is large can be determined by comparing it with a predetermined threshold $V_{th}(k)$. A large speech presence probability means that the frame is likely to be a speech segment. If the probability that frame $n$ is a speech segment is $\mu_n(k)$, the probability that it is a non-speech segment is $1 - \mu_n(k)$. The long-term average $\overline{x_{n+1}^2}(k)$ can then be updated from the current long-term average $\overline{x_n^2}(k)$ by the following equation.

Here, $N$ is an integer, and $\overline{x_0^2}(k) = x_0^2(k)$. That is, the temporary output weighted by the probability of a speech segment and the current value of the long-term average weighted by the probability of a non-speech segment are added together to form the update component for the long-term average. $\mu_n(k)$ can be defined in various ways; for example, the following equation can be used.

This takes 0.5 as the base value of the probability of a speech segment and corrects it by half the amount by which the speech presence probability $V_n(k)$ deviates from the threshold $V_{th}(k)$. The long-term average of the temporary output is thus calculated as an average weighted according to the value of $V_n(k)$.
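The weighting described in the preceding paragraphs can be sketched as follows. The equations are not reproduced in this text, so the code is built from the prose only: `mu_linear` starts from 0.5 and adds half the deviation from the threshold, and `update_component` blends the temporary output with the current long-term average. Both function names are hypothetical, and clamping the result to the range 0 to 1 is an added assumption. How the update component is combined with the integer N is not recoverable from this text and is left out.

```python
def mu_linear(v, v_th):
    """Base value 0.5 corrected by half the deviation of v from the threshold.

    Clamping to [0, 1] is an assumption made here so that the result
    can be used as a weight; the description does not state it.
    """
    return min(1.0, max(0.0, 0.5 + 0.5 * (v - v_th)))

def update_component(temporary_output, long_term_average, mu):
    """Blend the temporary output (weight mu) with the current average (weight 1 - mu)."""
    return mu * temporary_output + (1.0 - mu) * long_term_average

# Hypothetical values: presence ratio 1.4 against a threshold of 1.0.
mu = mu_linear(1.4, 1.0)
component = update_component(2.0, 4.0, mu)
```

When the speech presence probability equals the threshold, the two terms are weighted equally; larger values shift the weight toward the temporary output.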

Alternatively, $\mu_n(k)$ can be calculated by the following equation, which takes the sign of $V_n(k) - V_{th}(k)$.

In this case, when $V_n(k)$ exceeds the threshold $V_{th}(k)$, the frame is judged to be entirely a speech segment, and the contribution of the current value of the long-term average in Equation 3 becomes zero. Conversely, when $V_n(k)$ is smaller than the threshold $V_{th}(k)$, the frame is judged to be entirely a non-speech segment, and the contribution of the temporary output in Equation 3 becomes zero.
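A hard-decision weight with the behavior just described could look like the sketch below. The published equation is not reproduced in this text; the form 0.5 + 0.5 sgn(V - V_th) is an assumption chosen because it yields 1 above the threshold and 0 below it. The names `sign` and `mu_sign` are hypothetical.

```python
def sign(value):
    """Return 1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)

def mu_sign(v, v_th):
    """Speech-segment weight: 1 above the threshold, 0 below it.

    At exact equality this form gives 0.5; the description does not
    say how that case is handled.
    """
    return 0.5 + 0.5 * sign(v - v_th)

above = mu_sign(2.0, 1.0)  # treated as entirely speech
below = mu_sign(0.5, 1.0)  # treated as entirely non-speech
```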

The suppression coefficient correction unit 651 corrects the suppression coefficient $\bar{G}_n(k)$ using the temporary output SNR and the speech presence probability $V_n(k)$, supplies the result to the multiplier 5 as the corrected suppression coefficient $\hat{G}_n(k)$, and at the same time feeds it back to the noise suppression coefficient generation unit 601. The multiplier 5 multiplies the degraded speech supplied from the transform unit 2 by the corrected suppression coefficient supplied from the suppression coefficient correction unit 651 at each frequency and passes the product to the inverse transform unit 3 as the power spectrum of the enhanced speech. The inverse transform unit 3 combines the enhanced speech power spectrum supplied from the multiplier 5 with the phase of the degraded speech supplied from the transform unit 2, performs the inverse transform, and supplies the result to the output terminal 4 as enhanced speech signal samples. Although the processing above is described in terms of the power spectrum, it is widely known that the amplitude, which corresponds to its square root, can be used instead.

FIG. 2 is a block diagram showing the configuration of the transform unit 2. The transform unit 2 consists of a frame division unit 21, a windowing unit 22, and a Fourier transform unit 23. The degraded speech signal samples are supplied to the frame division unit 21 and divided into frames of $K/2$ samples each, where $K$ is an even number. The framed degraded speech signal samples are supplied to the windowing unit 22 and multiplied by the window function $w(t)$. For the input signal $y_n(t)$ ($t = 0, 1, \ldots, K/2-1$) of the $n$-th frame, the signal $\bar{y}_n(t)$ windowed by $w(t)$ is given by the following equation.

It is also common practice to window two consecutive frames with partial overlap. Assuming an overlap of 50% of the frame length, the signal $\bar{y}_n(t)$ ($t = 0, 1, \ldots, K-1$), obtained for $t = 0, 1, \ldots, K/2-1$ by the following equation, is the output of the windowing unit 22. For real signals, a symmetric window function is used. The window function is also designed so that, when the suppression coefficient is set to 1, the input signal and the output signal match except for calculation error. This means that $w(t) + w(t + K/2) = 1$.
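The overlap equation is not reproduced in this text. One form consistent with the description, in which the first half of the windowed block comes from the previous frame and the second half from the current frame, is given below. It is a reconstruction under that assumption, not the published formula.

```latex
\bar{y}_n(t) = w(t)\, y_{n-1}(t), \qquad
\bar{y}_n(t + K/2) = w(t + K/2)\, y_n(t), \qquad
t = 0, 1, \ldots, K/2 - 1
```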

The description below continues with the example in which two consecutive frames are windowed with 50% overlap. As $w(t)$, for example, the Hanning window given by the following equation can be used.
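As an illustration, the sketch below builds a K-point Hanning window in its periodic form, 0.5 - 0.5 cos(2πt/K), checks the constraint w(t) + w(t + K/2) = 1, and windows two consecutive K/2-sample frames as one K-sample block. The exact expression used in the publication is not reproduced in this text, so the window formula, the function names `hann` and `window_overlapped`, and the sample frames are assumptions.

```python
import math

def hann(size):
    """Periodic Hanning window of the given (even) length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * t / size) for t in range(size)]

def window_overlapped(previous_frame, current_frame, window):
    """Join two K/2-sample frames and multiply the K-sample block by the window."""
    half = len(window) // 2
    if len(previous_frame) != half or len(current_frame) != half:
        raise ValueError("each frame must hold K/2 samples")
    block = list(previous_frame) + list(current_frame)
    return [w * y for w, y in zip(window, block)]

K = 8
w = hann(K)
# Each pair of window values half a block apart should sum to 1.
sums = [w[t] + w[t + K // 2] for t in range(K // 2)]
windowed = window_overlapped([1.0] * 4, [2.0] * 4, w)
```

Because the window values half a block apart sum to 1, overlapping blocks windowed this way add back to the original samples when no suppression is applied.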

Many other window functions are also known, such as the Hamming, Kaiser, and Blackman windows. The windowed output $\bar{y}_n(t)$ is supplied to the Fourier transform unit 23 and converted into the degraded speech spectrum $Y_n(k)$. The degraded speech spectrum $Y_n(k)$ is separated into phase and amplitude; the degraded speech phase spectrum $\arg Y_n(k)$ is supplied to the inverse transform unit 3, and the degraded speech power spectrum $|Y_n(k)|^2$ is supplied to the multiplier 5, the noise estimation unit 300, and the noise suppression coefficient generation unit 601.
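The separation into phase and power can be sketched with a direct discrete Fourier transform. The naive O(K²) `dft` below stands in for whatever fast transform an implementation would use, and the function names and the four-sample block are illustrative assumptions.

```python
import cmath

def dft(samples):
    """Direct discrete Fourier transform of a short block of real samples."""
    size = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / size) for t in range(size))
        for k in range(size)
    ]

def split_power_phase(spectrum):
    """Return the power spectrum |Y(k)|^2 and the phase spectrum arg Y(k)."""
    power = [abs(value) ** 2 for value in spectrum]
    phase = [cmath.phase(value) for value in spectrum]
    return power, phase

# Hypothetical windowed block of four samples.
Y = dft([1.0, 0.0, -1.0, 0.0])
power, phase = split_power_phase(Y)
```

For this block the energy falls in bins 1 and 3, each with power close to 4, while bins 0 and 2 are close to zero.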

FIG. 3 is a block diagram showing the configuration of the inverse transform unit 3. The inverse transform unit 3 consists of an inverse Fourier transform unit 33, a windowing unit 32, and a frame synthesis unit 31. The inverse Fourier transform unit 33 multiplies the enhanced speech amplitude spectrum $|\bar{X}_n(k)|$, obtained from the enhanced speech power spectrum $|\bar{X}_n(k)|^2$ supplied from the multiplier 5, by the degraded speech phase spectrum $\arg Y_n(k)$ supplied from the transform unit 2 to obtain the enhanced speech $\bar{X}_n(k)$. That is, it evaluates the following equation.
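The recombination step can be sketched as below: the amplitude is taken as the square root of the enhanced power, attached to the phase of the degraded speech, and inverse-transformed. The equation is not reproduced in this text, so the code follows the prose; the names `recombine`, `dft`, and `idft` and the sample block are assumptions, and the naive transforms are for illustration only.

```python
import cmath
import math

def dft(samples):
    """Direct discrete Fourier transform."""
    size = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / size) for t in range(size))
        for k in range(size)
    ]

def idft(spectrum):
    """Direct inverse discrete Fourier transform."""
    size = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / size) for k in range(size)) / size
        for t in range(size)
    ]

def recombine(enhanced_power, degraded_phase):
    """Attach the degraded-speech phase to the enhanced amplitude sqrt(power)."""
    return [cmath.rect(math.sqrt(p), phi) for p, phi in zip(enhanced_power, degraded_phase)]

# Round trip with no suppression: the power is passed through unchanged.
samples = [0.5, -1.0, 2.0, 0.25]
Y = dft(samples)
power = [abs(value) ** 2 for value in Y]
phase = [cmath.phase(value) for value in Y]
restored = idft(recombine(power, phase))
```

With the power spectrum left untouched, the restored samples match the input up to rounding, which is the behavior expected when every suppression coefficient equals 1.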

The resulting enhanced speech $\bar{X}_n(k)$ is inverse Fourier transformed and supplied to the windowing unit 32 as a time-domain sample sequence $\bar{x}_n(t)$ ($t = 0, 1, \ldots, K-1$) in which one frame consists of $K$ samples, and is multiplied by the window function $w(t)$. For the input signal $\bar{x}_n(t)$ ($t = 0, 1, \ldots, K/2-1$) of the $n$-th frame, the signal windowed by $w(t)$ is given by the following equation. The signal $\bar{x}_n(t)$ ($t = 0, 1, \ldots, K-1$) obtained in this way is the output of the windowing unit 32 and is passed to the frame synthesis unit 31. The frame synthesis unit 31 takes $K/2$ samples from each of two adjacent frames of $\bar{x}_n(t)$ and superimposes them,

to obtain the enhanced speech $\hat{x}_n(t)$ by the equation above. The resulting enhanced speech $\hat{x}_n(t)$ ($t = 0, 1, \ldots, K-1$) is passed to the output terminal 4 as the output of the frame synthesis unit 31. In FIG. 2 and FIG. 3, the transform applied in the transform unit and the inverse transform unit has been described as the Fourier transform, but it is widely known that other transforms, such as the cosine transform, the Hadamard transform, the Haar transform, and the wavelet transform, can be used instead.
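The superposition performed by the frame synthesis unit can be sketched as an overlap-add of two adjacent K-sample blocks. The equation is not reproduced in this text; the form used below, in which the second half of the previous block is added to the first half of the current block, is an assumption consistent with the 50% overlap described. The function name and sample values are illustrative.

```python
def overlap_add(previous_block, current_block):
    """Add the second half of the previous block to the first half of the current one."""
    size = len(current_block)
    if len(previous_block) != size or size % 2 != 0:
        raise ValueError("blocks must have the same even length K")
    half = size // 2
    return [previous_block[t + half] + current_block[t] for t in range(half)]

# Hypothetical windowed blocks with K = 4.
previous_block = [0.0, 0.0, 1.0, 2.0]
current_block = [3.0, 4.0, 0.0, 0.0]
output = overlap_add(previous_block, current_block)
```

Each call in this sketch yields K/2 output samples, one half-block per frame.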

FIG. 4 is a block diagram showing the configuration of the noise estimation unit 300 of FIG. 1. The noise estimation unit 300 consists of an estimated noise calculation unit 310, a weighted degraded speech calculation unit 320, and a counter 330. The degraded speech power spectrum supplied to the noise estimation unit 300 is passed to the estimated noise calculation unit 310 and the weighted degraded speech calculation unit 320. The weighted degraded speech calculation unit 320 calculates a weighted degraded speech power spectrum using the supplied degraded speech power spectrum and the estimated noise power spectrum, and passes it to the estimated noise calculation unit 310. The estimated noise calculation unit 310 estimates the noise power spectrum using the degraded speech power spectrum, the weighted degraded speech power spectrum, and the count value supplied from the counter 330, outputs it as the estimated noise power spectrum, and at the same time feeds it back to the weighted degraded speech calculation unit 320. The noise estimation unit 301 can have exactly the same configuration as the noise estimation unit 300; its input is supplied so that the temporary output spectrum takes the place of the degraded speech spectrum. In that case its operation is also identical to that of the noise estimation unit 300, so a detailed description is omitted.

FIG. 5 is a block diagram showing the configuration of the estimated noise calculation unit 310 included in FIG. 4. The estimated noise calculation unit 310 has an update determination unit 400, a register length storage unit 410, an estimated noise storage unit 420, a switch 430, a shift register 440, an adder 450, a minimum value selection unit 460, a division unit 470, and a counter 480. The weighted degraded speech power spectrum is supplied to the switch 430. When the switch 430 closes the circuit, the weighted degraded speech power spectrum is transmitted to the shift register 440. The shift register 440 shifts the value stored in each internal register to the adjacent register in response to the control signal supplied from the update determination unit 400. The shift register length is equal to the value stored in the register length storage unit 410, described later. All register outputs of the shift register 440 are supplied to the adder 450, which sums them and transmits the result to the division unit 470.

Meanwhile, the update determination unit 400 is supplied with the count value, the per-frequency degraded speech power spectrum, and the per-frequency estimated noise power spectrum. The update determination unit 400 always outputs "1" until the count value reaches a preset value. After that, it outputs "1" when the input degraded speech signal is judged to be noise and "0" otherwise. The output is transmitted to the counter 480, the switch 430, and the shift register 440. The switch 430 closes the circuit when the signal supplied from the update determination unit is "1" and opens it when the signal is "0". The counter 480 increments the count value when the signal supplied from the update determination unit is "1" and leaves it unchanged when the signal is "0". When the signal supplied from the update determination unit is "1", the shift register 440 takes in one sample of the signal supplied from the switch 430 and at the same time shifts the value stored in each internal register to the adjacent register. The minimum value selection unit 460 is supplied with the output of the counter 480 and the output of the register length storage unit 410.

The minimum value selection unit 460 selects the smaller of the supplied count value and register length and transmits it to the division unit 470. The division unit 470 divides the sum of the degraded speech power spectra supplied from the adder 450 by the smaller of the count value and the register length, and outputs the quotient as the per-frequency estimated noise power spectrum $\lambda_n(k)$. If $B_n(k)$ ($n = 0, 1, \ldots, N-1$) denotes the sample values of the degraded speech power spectrum stored in the shift register 440, $\lambda_n(k)$ is given by

$$\lambda_n(k) = \frac{1}{N} \sum_{n=0}^{N-1} B_n(k)$$

where $N$ is the smaller of the count value and the register length. Since the count value starts from zero and increases monotonically, the division is performed by the count value at first and by the register length later. Dividing by the register length amounts to taking the average of the values stored in the shift register. At first, the shift register 440 does not yet hold enough values, so the sum is divided by the number of registers that actually hold a value. That number is equal to the count value while the count value is smaller than the register length, and equal to the register length once the count value exceeds it.
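
The averaging performed by the shift register 440, adder 450, minimum value selection unit 460, and division unit 470 can be sketched for a single frequency bin as follows. This is only an illustrative sketch: the class name `NoiseAverager`, the method `step`, and the choice to return 0.0 before any sample has been stored are assumptions made here, not part of the description.

```python
from collections import deque


class NoiseAverager:
    """Running average of weighted degraded speech power for one frequency bin."""

    def __init__(self, register_length):
        # deque with maxlen plays the role of shift register 440:
        # appending to a full deque discards the oldest value.
        self.register = deque(maxlen=register_length)
        self.register_length = register_length  # register length storage unit 410
        self.count = 0                          # counter 480

    def step(self, weighted_power, update):
        """Store one sample when update is 1, then return the noise estimate."""
        if update:
            self.register.append(weighted_power)
            self.count += 1
        # Minimum value selection unit 460: N = min(count, register length).
        n = min(self.count, self.register_length)
        if n == 0:
            return 0.0  # assumption: no estimate is available yet
        # Adder 450 and division unit 470.
        return sum(self.register) / n
```

With a register length of 3, the first two updates are averaged over 1 and 2 samples respectively, and from the third update onward the average always covers the 3 most recent samples.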

FIG. 6 is a block diagram showing the configuration of the update determination unit 400 included in FIG. 5. The update determination unit 400 has a logical sum calculation unit 4001, comparison units 4004 and 4002, threshold storage units 4005 and 4003, and a threshold calculation unit 4006. The count value supplied from the counter 330 of FIG. 4 is transmitted to the comparison unit 4002. The threshold output by the threshold storage unit 4003 is also transmitted to the comparison unit 4002. The comparison unit 4002 compares the supplied count value with the threshold and transmits "1" to the logical sum calculation unit 4001 when the count value is smaller than the threshold and "0" when it is larger. Meanwhile, the threshold calculation unit 4006 calculates a value that depends on the estimated noise power spectrum supplied from the estimated noise storage unit 420 of FIG. 5 and outputs it as a threshold to the threshold storage unit 4005. The simplest way to calculate the threshold is to multiply the estimated noise power spectrum by a constant.

The threshold can also be calculated using a higher-order polynomial or a nonlinear function. The threshold storage unit 4005 stores the threshold output from the threshold calculation unit 4006 and outputs the threshold stored one frame earlier to the comparison unit 4004. The comparison unit 4004 compares the threshold supplied from the threshold storage unit 4005 with the degraded speech power spectrum supplied from the conversion unit 2 of FIG. 1, and outputs "1" to the logical sum calculation unit 4001 if the degraded speech power spectrum is smaller than the threshold and "0" if it is larger. In other words, whether the degraded speech signal is noise is judged on the basis of the magnitude of the estimated noise power spectrum. The logical sum calculation unit 4001 calculates the logical sum of the output of the comparison unit 4002 and the output of the comparison unit 4004, and outputs the result to the switch 430, the shift register 440, and the counter 480 of FIG. 5. Thus the update determination unit 400 outputs "1" not only in the initial state and in silent sections but also in voiced sections where the degraded speech power is small. That is, the estimated noise is updated. Because the threshold is calculated for each frequency, the estimated noise can be updated independently at each frequency.
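
The decision logic of the update determination unit 400 can be sketched for a single frequency bin as follows. This is a minimal sketch that assumes the simplest threshold mentioned in the text, a constant multiple of the estimated noise power from the previous frame. The function name `update_flag`, the parameter names, and the default `factor=2.0` are hypothetical and chosen only for illustration.

```python
def update_flag(count, count_threshold, degraded_power, prev_noise_estimate, factor=2.0):
    """Return 1 when the noise estimate should be updated, otherwise 0."""
    # Comparison unit 4002: "1" while the count is still below its threshold.
    in_startup = count < count_threshold
    # Threshold calculation unit 4006 with the one-frame delay of storage unit 4005:
    # a constant multiple of the previous estimated noise power (factor is assumed).
    power_threshold = factor * prev_noise_estimate
    # Comparison unit 4004: "1" when the degraded speech power is below the threshold.
    looks_like_noise = degraded_power < power_threshold
    # Logical sum calculation unit 4001.
    return 1 if (in_startup or looks_like_noise) else 0
```

During start-up the flag is 1 regardless of the signal power. Afterwards it is 1 only in bins whose power stays below the threshold derived from the previous noise estimate.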

FIG. 7 is a block diagram showing the configuration of the weighted degraded speech calculation unit 320. The weighted degraded speech calculation unit 320 has an estimated noise storage unit 3201, a per-frequency SNR calculation unit 3202, a nonlinear processing unit 3204, and a multiplier 3203. The estimated noise storage unit 3201 stores the estimated noise power spectrum supplied from the estimated noise calculation unit 310 of FIG. 4 and outputs the estimated noise power spectrum stored one frame earlier to the per-frequency SNR calculation unit 3202. The per-frequency SNR calculation unit 3202 obtains the SNR for each frequency band from the estimated noise power spectrum supplied by the estimated noise storage unit 3201 and the degraded speech power spectrum supplied by the conversion unit 2 of FIG. 1, and outputs it to the nonlinear processing unit 3204. Specifically, the supplied degraded speech power spectrum $|Y_n(k)|^2$ is divided by the estimated noise power spectrum to obtain the per-frequency SNR $\hat{\gamma}_n(k)$:

$$\hat{\gamma}_n(k) = \frac{|Y_n(k)|^2}{\lambda_{n-1}(k)}$$

Here, $\lambda_{n-1}(k)$ is the estimated noise power spectrum stored one frame earlier.

The nonlinear processing unit 3204 calculates a weight coefficient vector from the SNR supplied by the per-frequency SNR calculation unit 3202 and outputs it to the multiplier 3203. The multiplier 3203 calculates, for each frequency band, the product of the degraded speech power spectrum supplied from the conversion unit 2 of FIG. 1 and the weight coefficient vector supplied from the nonlinear processing unit 3204, and outputs the resulting weighted degraded speech power spectrum to the estimated noise calculation unit 310 of FIG. 4.

The nonlinear processing unit 3204 has a nonlinear function that outputs a real value for each of the multiplexed input values. FIG. 8 shows an example of the nonlinear function. With $f_1$ as the input value, the output value $f_2$ of the nonlinear function shown in FIG. 8 is given by

$$f_2 = \begin{cases} 1, & f_1 \le a \\ \dfrac{f_1 - b}{a - b}, & a < f_1 \le b \\ 0, & b < f_1 \end{cases}$$

where $a$ and $b$ are arbitrary real numbers.

The nonlinear processing unit 3204 applies the nonlinear function to the per-band SNR supplied from the per-frequency SNR calculation unit 3202 to obtain a weight coefficient, and transmits it to the multiplier 3203. That is, the nonlinear processing unit 3204 outputs a weight coefficient between 1 and 0 that depends on the SNR: 1 when the SNR is small and 0 when it is large.

The weight coefficient multiplied with the degraded speech power spectrum in the multiplier 3203 of FIG. 7 depends on the SNR: the larger the SNR, that is, the larger the speech component contained in the degraded speech, the smaller the weight coefficient. The degraded speech power spectrum is generally used to update the estimated noise. Weighting it according to the SNR before the update reduces the influence of the speech component contained in the degraded speech power spectrum and allows more accurate noise estimation. Although an example using a nonlinear function to calculate the weight coefficient has been shown, other functions of the SNR, such as a linear function or a higher-order polynomial, can also be used.
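
The SNR-dependent weighting can be sketched as follows for one frequency bin. The sketch assumes a piecewise-linear reading of the FIG. 8 function (1 up to `a`, falling linearly to 0 at `b`, with `a < b`). The function names and the default values `a=1.0` and `b=10.0` are hypothetical, and the previous noise estimate is assumed to be strictly positive.

```python
def snr_weight(snr, a, b):
    """Piecewise-linear weight: 1 for snr <= a, 0 for snr > b, linear in between."""
    if snr <= a:
        return 1.0
    if snr <= b:
        return (snr - b) / (a - b)
    return 0.0


def weighted_degraded_power(degraded_power, prev_noise_estimate, a=1.0, b=10.0):
    """Weight the degraded speech power by a function of its per-frequency SNR."""
    # Per-frequency SNR calculation unit 3202: power divided by the
    # estimated noise power stored one frame earlier (assumed > 0).
    snr = degraded_power / prev_noise_estimate
    # Nonlinear processing unit 3204 and multiplier 3203.
    return snr_weight(snr, a, b) * degraded_power
```

Bins dominated by noise pass through unchanged, while bins with a high SNR contribute little or nothing to the noise average.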

FIG. 9 is a block diagram showing the configuration of the noise suppression coefficient generation unit 601 included in FIG. 1. It differs from the noise suppression coefficient generation unit 600 shown in FIG. 16 in that the estimated a priori SNR, which is the output of the estimated a priori SNR calculation unit 620, is not output. That is, the only output of the noise suppression coefficient generation unit 601 is the suppression coefficient.

FIG. 10 is a block diagram showing the configuration of the estimated a priori SNR calculation unit 620 included in FIG. 9. The estimated a priori SNR calculation unit 620 has a range limitation processing unit 6201, an a posteriori SNR storage unit 6202, a suppression coefficient storage unit 6203, multipliers 6204 and 6205, a weight storage unit 6206, a weighted addition unit 6207, and an adder 6208. The a posteriori SNR $\gamma_n(k)$ ($k = 0, 1, \ldots, M-1$) supplied from the a posteriori SNR calculation unit 610 of FIG. 9 is transmitted to the a posteriori SNR storage unit 6202 and the adder 6208. The a posteriori SNR storage unit 6202 stores the a posteriori SNR $\gamma_n(k)$ of the n-th frame and transmits the a posteriori SNR $\gamma_{n-1}(k)$ of the (n−1)-th frame to the multiplier 6205. The corrected suppression coefficient $\bar{G}_n(k)$ ($k = 0, 1, \ldots, M-1$) supplied from the suppression coefficient correction unit 651 of FIG. 1 is transmitted to the suppression coefficient storage unit 6203. The suppression coefficient storage unit 6203 stores the corrected suppression coefficient $\bar{G}_n(k)$ of the n-th frame and transmits the corrected suppression coefficient $\bar{G}_{n-1}(k)$ of the (n−1)-th frame to the multiplier 6204. The multiplier 6204 squares the supplied $\bar{G}_{n-1}(k)$ to obtain $\bar{G}_{n-1}^2(k)$ and transmits it to the multiplier 6205. The multiplier 6205 multiplies $\bar{G}_{n-1}^2(k)$ by $\gamma_{n-1}(k)$ for $k = 0, 1, \ldots, M-1$ to obtain $\bar{G}_{n-1}^2(k)\,\gamma_{n-1}(k)$ and transmits the result to the weighted addition unit 6207 as the past estimated SNR 922.

The other terminal of the adder 6208 is supplied with −1, and the sum $\gamma_n(k) - 1$ is transmitted to the range limitation processing unit 6201. The range limitation processing unit 6201 applies the range limitation operator $P[\cdot]$ to the sum $\gamma_n(k) - 1$ supplied from the adder 6208 and transmits the result $P[\gamma_n(k) - 1]$ to the weighted addition unit 6207 as the instantaneous estimated SNR 921. Here, $P[x]$ is defined by

$$P[x] = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$$

The weighted addition unit 6207 is also supplied with the weight 923 from the weight storage unit 6206. The weighted addition unit 6207 obtains the estimated a priori SNR 924 from the supplied instantaneous estimated SNR 921, past estimated SNR 922, and weight 923. With $\alpha$ as the weight 923 and $\hat{\xi}_n(k)$ as the estimated a priori SNR, $\hat{\xi}_n(k)$ is calculated as

$$\hat{\xi}_n(k) = \alpha\,\bar{G}_{n-1}^2(k)\,\gamma_{n-1}(k) + (1 - \alpha)\,P[\gamma_n(k) - 1]$$

Here, the initial value is set to $\bar{G}_{-1}^2(k)\,\gamma_{-1}(k) = 1$.

FIG. 11 is a block diagram showing the configuration of the weighted addition unit 6207 included in FIG. 10. The weighted addition unit 6207 has multipliers 6901 and 6903, a constant multiplier 6905, and adders 6902 and 6904. Its inputs are the per-band instantaneous estimated SNR from the range limitation processing unit 6201 of FIG. 10, the past per-band SNR from the multiplier 6205 of FIG. 10, and the weight from the weight storage unit 6206 of FIG. 10. The weight, which has the value $\alpha$, is transmitted to the constant multiplier 6905 and the multiplier 6903. The constant multiplier 6905 multiplies its input by −1 and transmits the resulting $-\alpha$ to the adder 6904. The other input of the adder 6904 is supplied with 1, so its output is the sum $1 - \alpha$. This value is supplied to the multiplier 6901, where it is multiplied by the other input, the per-band instantaneous estimated SNR $P[\gamma_n(k) - 1]$, and the product $(1 - \alpha)\,P[\gamma_n(k) - 1]$ is transmitted to the adder 6902. Meanwhile, the multiplier 6903 multiplies the weight $\alpha$ by the past estimated SNR and transmits the product $\alpha\,\bar{G}_{n-1}^2(k)\,\gamma_{n-1}(k)$ to the adder 6902. The adder 6902 outputs the sum of $(1 - \alpha)\,P[\gamma_n(k) - 1]$ and $\alpha\,\bar{G}_{n-1}^2(k)\,\gamma_{n-1}(k)$ as the per-band estimated a priori SNR.
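
The weighted addition described for FIG. 10 and FIG. 11 can be sketched for one frequency bin as follows. The range limitation operator is assumed here to clip negative values to zero, and the function names and the default `alpha=0.98` are hypothetical values chosen for illustration only.

```python
def range_limit(x):
    """Range limitation operator P[x], assumed to pass positive values and clip the rest to 0."""
    return x if x > 0.0 else 0.0


def estimate_a_priori_snr(gamma_now, gamma_prev, gain_prev, alpha=0.98):
    """Blend the past estimated SNR with the instantaneous estimated SNR."""
    # Multipliers 6204 and 6205: squared previous coefficient times previous SNR.
    past_snr = gain_prev ** 2 * gamma_prev
    # Adder 6208 and range limitation processing unit 6201.
    instantaneous_snr = range_limit(gamma_now - 1.0)
    # Weighted addition unit 6207.
    return alpha * past_snr + (1.0 - alpha) * instantaneous_snr
```

A weight close to 1 makes the estimate follow the previous frame and smooths it over time, while a weight close to 0 makes it track the current frame.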

FIG. 12 is a block diagram showing the noise suppression coefficient generation unit 630 included in FIG. 9. The noise suppression coefficient generation unit 630 has an MMSE STSA gain function value calculation unit 6301, a generalized likelihood ratio calculation unit 6302, and a suppression coefficient calculation unit 6303. The method of calculating the suppression coefficient is described below on the basis of the formulas given in Non-Patent Document 3 (IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, pp. 1109-1121, Dec. 1984).

Let $n$ be the frame number and $k$ the frequency number. Let $\gamma_n(k)$ be the per-frequency a posteriori SNR supplied from the a posteriori SNR calculation unit 610 of FIG. 9, $\hat{\xi}_n(k)$ the per-frequency estimated a priori SNR supplied from the estimated a priori SNR calculation unit 620 of FIG. 9, and $q$ the speech absence probability supplied from the speech absence probability storage unit 640 of FIG. 9.

Further, let $\eta_n(k) = \hat{\xi}_n(k) / (1 - q)$ and $v_n(k) = \eta_n(k)\,\gamma_n(k) / (1 + \eta_n(k))$.

The MMSE STSA gain function value calculation unit 6301 calculates the MMSE STSA gain function value for each frequency band from the a posteriori SNR $\gamma_n(k)$ supplied from the a posteriori SNR calculation unit 610 of FIG. 9, the estimated a priori SNR $\hat{\xi}_n(k)$ supplied from the estimated a priori SNR calculation unit 620 of FIG. 9, and the speech absence probability $q$ supplied from the speech absence probability storage unit 640 of FIG. 9, and outputs it to the suppression coefficient calculation unit 6303. The MMSE STSA gain function value $G_n(k)$ for each frequency band is given by

$$G_n(k) = \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_n(k)}}{\gamma_n(k)}\,\exp\!\left(-\frac{v_n(k)}{2}\right)\left[\bigl(1 + v_n(k)\bigr)\,I_0\!\left(\frac{v_n(k)}{2}\right) + v_n(k)\,I_1\!\left(\frac{v_n(k)}{2}\right)\right]$$

Here, $I_0(z)$ is the zeroth-order modified Bessel function and $I_1(z)$ is the first-order modified Bessel function. Modified Bessel functions are described in Non-Patent Document 4 (Mathematical Dictionary, Iwanami Shoten, 1985, p. 374.G).

The generalized likelihood ratio calculation unit 6302 calculates the generalized likelihood ratio for each frequency band from the a posteriori SNR $\gamma_n(k)$ supplied from the a posteriori SNR calculation unit 610 of FIG. 9, the estimated a priori SNR $\hat{\xi}_n(k)$ supplied from the estimated a priori SNR calculation unit 620 of FIG. 9, and the speech absence probability $q$ supplied from the speech absence probability storage unit 640 of FIG. 9, and transmits it to the suppression coefficient calculation unit 6303. The generalized likelihood ratio $\Lambda_n(k)$ for each frequency band is given by

$$\Lambda_n(k) = \frac{1 - q}{q}\,\frac{\exp\bigl(v_n(k)\bigr)}{1 + \eta_n(k)}$$

The suppression coefficient calculation unit 6303 calculates the suppression coefficient for each frequency band from the MMSE STSA gain function value $G_n(k)$ supplied from the MMSE STSA gain function value calculation unit 6301 and the generalized likelihood ratio $\Lambda_n(k)$ supplied from the generalized likelihood ratio calculation unit 6302, and outputs it to the suppression coefficient correction unit 651 of FIG. 1. The suppression coefficient $\bar{G}_n(k)$ for each frequency band is given by

$$\bar{G}_n(k) = \frac{\Lambda_n(k)}{\Lambda_n(k) + 1}\,G_n(k)$$

Instead of calculating the SNR for each frequency band, it is also possible to obtain and use an SNR common to a wider band made up of several frequency bands.
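
The chain from the two SNR values to the suppression coefficient can be sketched as below for one frequency bin. The sketch assumes the standard MMSE STSA gain and generalized likelihood ratio of Non-Patent Document 3, written in terms of the quantities $\eta_n(k)$ and $v_n(k)$ defined above. The function names are hypothetical, and the Bessel functions are evaluated with a truncated power series that is adequate only for moderate arguments. It also assumes `gamma > 0`, `xi_hat > 0`, and `0 < q < 1`.

```python
import math


def bessel_i(order, z, terms=60):
    """Modified Bessel function of the first kind, truncated power series."""
    return sum(
        (z / 2.0) ** (2 * m + order) / (math.factorial(m) * math.factorial(m + order))
        for m in range(terms)
    )


def suppression_coefficient(gamma, xi_hat, q):
    """Suppression coefficient from a posteriori SNR, a priori SNR, and speech absence probability."""
    eta = xi_hat / (1.0 - q)
    v = eta * gamma / (1.0 + eta)
    # MMSE STSA gain function value (unit 6301).
    gain = (
        (math.sqrt(math.pi) / 2.0)
        * (math.sqrt(v) / gamma)
        * math.exp(-v / 2.0)
        * ((1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0))
    )
    # Generalized likelihood ratio (unit 6302).
    glr = ((1.0 - q) / q) * math.exp(v) / (1.0 + eta)
    # Suppression coefficient (unit 6303).
    return glr / (glr + 1.0) * gain
```

For very large SNR values the exponential and the series grow quickly, so a practical implementation would more likely use exponentially scaled Bessel functions from a numerical library.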

FIG. 13 shows a configuration example of the suppression coefficient correction unit 651. The suppression coefficient correction unit 651 includes a suppression coefficient lower limit calculation unit 6512 and a maximum value selection unit 6511. The suppression coefficient lower limit calculation unit 6512 is supplied with the temporary output SNR $\xi_n^L(k)$ and the speech presence probability $V_n$. Using the function $A(\xi_n^L(k))$ and the suppression coefficient minimum value $f_s$ for speech sections, it calculates the lower limit of the suppression coefficient $A_{\min}(V_n, \xi_n^L(k))$ according to the following equation and transmits it to the maximum value selection unit 6511.

$$A_{\min}\bigl(V_n, \xi_n^L(k)\bigr) = f_s\,V_n + (1 - V_n)\,A\bigl(\xi_n^L(k)\bigr)$$

The function $A(\xi_n^L(k))$ basically has a shape that takes small values for large SNRs. Because $A(\xi_n^L(k))$ has this shape with respect to the temporary output SNR $\xi_n^L(k)$, the higher the temporary output SNR, the smaller the lower limit of the suppression coefficient for non-speech sections. This corresponds to smaller residual noise and has the effect of reducing the discontinuity in sound quality between speech and non-speech sections. The function $A(\xi_n^L(k))$ may differ for every frequency component or may be shared among several frequency components. Its shape may also change over time.

The maximum value selection unit 6511 compares the suppression coefficient $\bar{G}_n(k)$ received from the noise suppression coefficient generation unit 630 with the lower limit supplied from the suppression coefficient lower limit calculation unit 6512 and outputs the larger value as the corrected suppression coefficient $\hat{G}_n(k)$. This processing can be expressed as

$$\hat{G}_n(k) = \max\Bigl(\bar{G}_n(k),\; A_{\min}\bigl(V_n, \xi_n^L(k)\bigr)\Bigr)$$

That is, the suppression coefficient minimum value is $f_s$ when the frame is judged to be entirely speech, and a value determined by a monotonically decreasing function of the temporary output SNR $\xi_n^L(k)$ when the frame is judged to be entirely non-speech. In situations between the two, these values are mixed appropriately. The monotonic decrease of $A(\xi_n^L(k))$ guarantees a large suppression coefficient minimum value at low SNR, which preserves continuity with the immediately preceding speech section, where much residual noise remains. At high SNR, the suppression coefficient minimum value becomes small and the residual noise is controlled to be small. This is because the residual noise in the speech section is then negligibly small, so continuity is maintained even when the residual noise in the non-speech section is small. In addition, setting $f_s$ larger than $A(\xi_n^L(k))$ makes noise suppression milder in speech sections, or when a speech section is likely, and so reduces the distortion introduced into the speech. This is particularly effective when the noise estimation accuracy cannot be made sufficiently high, as with speech containing distortion caused by encoding and decoding.
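
The correction step can be sketched as follows for one frequency bin. The sketch assumes that the lower limit is a linear mix of $f_s$ and $A(\cdot)$ weighted by the speech presence probability, which is one reading of "mixed appropriately". The function `example_floor`, its parameters `a_max` and `a_min`, and the default `f_s=0.5` are hypothetical and stand in for whatever monotonically decreasing function and constants an implementation would choose.

```python
def example_floor(temp_snr, a_max=0.3, a_min=0.05):
    """Hypothetical A(): decreases monotonically from a_max toward a_min as the SNR grows."""
    return a_min + (a_max - a_min) / (1.0 + temp_snr)


def corrected_coefficient(gain_bar, speech_prob, temp_snr, f_s=0.5):
    """Apply the SNR- and speech-probability-dependent lower limit to a suppression coefficient."""
    # Suppression coefficient lower limit calculation unit 6512 (assumed linear mix).
    lower_limit = f_s * speech_prob + (1.0 - speech_prob) * example_floor(temp_snr)
    # Maximum value selection unit 6511.
    return max(gain_bar, lower_limit)
```

With a speech presence probability of 1 the floor equals `f_s`, and with a probability of 0 it follows the decreasing function of the temporary output SNR, so non-speech frames at high SNR are suppressed more strongly.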

The embodiments described so far follow Patent Document 1 in calculating a suppression coefficient independently for each frequency component and using it for noise suppression. To reduce the amount of computation, however, a common suppression coefficient can be calculated for several frequency components and used for noise suppression, as disclosed in Non-Patent Document 1. In that case, a band integration unit is provided between the conversion unit 2 of FIG. 1 and both the noise estimation unit 300 and the noise suppression coefficient generation unit 601.

Furthermore, as described in Non-Patent Document 1, providing an offset removal unit before the conversion unit 2 of FIG. 1 and an amplitude correction unit and a phase correction unit immediately after it makes it possible to form a high-pass filter in the frequency domain, which reduces the amount of computation. In addition, when a common suppression coefficient is calculated for several frequency components, the noise estimate for a specific frequency band can be corrected.

FIG. 14 is a block diagram of a signal processing device according to the second embodiment of the present invention. The second embodiment comprises a computer (central processing unit; processor; data processing unit) 1000 that operates under program control, an input terminal 1, and an output terminal 4. The computer 1000 includes a conversion unit 2, a multiplier 5, an inverse conversion unit 3, a noise estimation unit 300, a noise suppression coefficient generation unit 601, a multiplier 660, a speech presence probability calculation unit 670, a temporary output SNR calculation unit 680, and a suppression coefficient correction unit 651.

The degraded speech supplied to the input terminal 1 is subjected to a transform such as a Fourier transform in the conversion unit 2 in the computer 1000 and divided into a plurality of frequency components, which are supplied to the noise estimation unit 300, the noise suppression coefficient generation unit 601, the multiplier 660, and the multiplier 5. The phase is passed to the inverse conversion unit 3. The noise estimation unit 300 estimates, for each of the plurality of frequency components, the power spectrum of the noise contained in the degraded speech power spectrum, and passes it to the noise suppression coefficient generation unit 601, the speech presence probability calculation unit 670, and the temporary output SNR calculation unit 680. The noise suppression coefficient generation unit 601 generates a suppression coefficient using the degraded speech power spectrum and the estimated noise power spectrum, and supplies it to the multiplier 660 and the suppression coefficient correction unit 651. The multiplier 660 obtains the product of the degraded speech power spectrum and the suppression coefficient as a temporary output, and supplies it to the speech presence probability calculation unit 670 and the temporary output SNR calculation unit 680. The speech presence probability calculation unit 670 obtains a speech presence probability from the temporary output and the estimated noise, and supplies it to the temporary output SNR calculation unit 680 and the suppression coefficient correction unit 651. The temporary output SNR calculation unit 680 obtains a temporary output SNR from the temporary output and the estimated noise using the speech presence probability, and supplies it to the suppression coefficient correction unit 651. The suppression coefficient correction unit 651 obtains a corrected suppression coefficient using the temporary output SNR, the speech presence probability, and the suppression coefficient, supplies it to the multiplier 5, and at the same time feeds it back to the noise suppression coefficient generation unit 601. The multiplier 5 multiplies, at each frequency, the degraded speech supplied from the conversion unit 2 by the corrected suppression coefficient supplied from the suppression coefficient correction unit 651, and passes the product to the inverse conversion unit 3 as the power spectrum of the enhanced speech. The inverse conversion unit 3 combines the enhanced speech power spectrum supplied from the multiplier 5 with the phase of the degraded speech supplied from the conversion unit 2, performs the inverse transform, and supplies the result to the output terminal 4 as enhanced speech signal samples.
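
The signal flow described above can be sketched for a single frame as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the function names, the constant `EPS`, the parameters `floor_speech` and `floor_noise`, and every formula used for units 601, 670, 680, and 651 are hypothetical placeholders chosen to keep the example short. Only the wiring between the units follows the description. The sketch also assumes that the suppression coefficient acts on the amplitude, and it omits framing, windowing, noise estimation (the estimated noise power is passed in), and the feedback path to unit 601.

```python
import cmath
import math

EPS = 1e-12  # hypothetical guard against division by zero


def dft(x):
    """Naive O(n^2) DFT, standing in for conversion unit 2."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT returning real samples (inverse conversion unit 3)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def suppress_frame(frame, noise_power, floor_speech=0.5, floor_noise=0.1):
    """Pass one frame through the FIG. 14 wiring; all formulas are placeholders."""
    spectrum = dft(frame)
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]  # forwarded to the inverse transform
    power = [a * a for a in amplitude]          # degraded speech power spectrum

    # Unit 601 (placeholder): power spectral subtraction gain in [0, 1].
    gain = [math.sqrt(max(1.0 - n / p, 0.0)) if p > EPS else 0.0
            for p, n in zip(power, noise_power)]

    # Multiplier 660: temporary output power spectrum.
    temp_power = [g * g * p for g, p in zip(gain, power)]

    # Unit 670 (placeholder): frame-level speech presence probability
    # derived from the temporary output and the estimated noise.
    ratio = sum(temp_power) / max(sum(noise_power), EPS)
    presence = ratio / (1.0 + ratio)

    # Unit 680 (placeholder): temporary output SNR weighted by the probability.
    temp_snr = presence * ratio

    # Unit 651 (placeholder): lower limit that moves continuously between a
    # low-residual-noise floor and a low-distortion floor.
    low_floor = min(floor_speech, floor_noise / math.sqrt(max(temp_snr, 1.0)))
    floor = presence * floor_speech + (1.0 - presence) * low_floor
    corrected = [max(g, floor) for g in gain]

    # Multiplier 5 and inverse conversion unit 3: apply the corrected
    # coefficient and recombine with the degraded speech phase.
    enhanced = [cmath.rect(a * g, ph)
                for a, g, ph in zip(amplitude, corrected, phase)]
    return idft(enhanced), corrected
```

In this sketch the corrected coefficients are returned alongside the samples so that a caller could feed them back to the coefficient generation step for the next frame, as the description states for unit 651. Because the lower limit is interpolated by the speech presence probability, it changes gradually rather than switching between two values.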

With this configuration, the present embodiment sets the suppression coefficient so that suppression prioritizes low distortion in speech segments and low residual noise in non-speech segments, and so that no discontinuity arises between the residual noise levels of the speech and non-speech segments. As a result, it is possible to output high-quality enhanced speech that achieves both small residual noise in non-speech segments and small output distortion in speech segments, with no discontinuity at the boundary between the two.

In all the embodiments described so far, the minimum mean square error short-time spectral amplitude (MMSE STSA) method has been assumed as the noise suppression method, but the invention can also be applied to other methods. Examples of such methods include the Wiener filter method disclosed in Non-Patent Document 5 (PROCEEDINGS OF THE IEEE, VOL.67, NO.12, PP.1586-1604, DEC, 1979) and the spectral subtraction method disclosed in Non-Patent Document 6 (IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL.27, NO.2, PP.113-120, APR, 1979). Description of detailed configuration examples for these methods is omitted.
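
For orientation, the two alternative methods named above are commonly expressed as per-frequency gain rules. The following is a minimal sketch using widely known textbook forms, which are not necessarily the exact formulations in Non-Patent Documents 5 and 6. The function names are hypothetical, and the spectral subtraction rule is shown in its power-domain variant as an assumption.

```python
import math


def wiener_gain(prior_snr):
    """Wiener filter gain G = xi / (1 + xi) for an a priori SNR xi >= 0."""
    return prior_snr / (1.0 + prior_snr)


def spectral_subtraction_gain(noisy_power, noise_power):
    """Amplitude gain from power spectral subtraction, clipped at zero."""
    if noisy_power <= 0.0:
        return 0.0
    return math.sqrt(max(1.0 - noise_power / noisy_power, 0.0))
```

Both rules yield a value between zero and 1 for each frequency component, so either could in principle take the place of the coefficient produced by the noise suppression coefficient generation unit before the correction step.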

Although the present invention has been described above with reference to preferred embodiments and aspects, it is not necessarily limited to those embodiments, examples, and aspects, and can be modified and implemented in various ways within the scope of its technical idea.

DESCRIPTION OF SYMBOLS
1 Input terminal
2 Conversion unit
3 Inverse conversion unit
4 Output terminal
5, 660, 3203, 6204, 6205, 6901, 6903, 6507 Multiplier
21 Frame division unit
22, 32 Windowing processing unit
23 Fourier transform unit
31 Frame synthesis unit
33 Inverse Fourier transform unit
300, 301 Noise estimation unit
310 Estimated noise calculation unit
320 Weighted degraded speech calculation unit
330, 480 Counter
400 Update determination unit
410 Register length storage unit
420, 3201 Estimated noise storage unit
430, 6505 Switch
440 Shift register
450, 6208, 6902, 6904 Adder
460 Minimum value selection unit
470 Division unit
600, 601 Noise suppression coefficient generation unit
610 A posteriori SNR calculation unit
620 Estimated a priori SNR calculation unit
630 Noise suppression coefficient calculation unit
640 Speech absence probability storage unit
650, 651 Suppression coefficient correction unit
670 Speech presence probability calculation unit
680 Temporary output SNR calculation unit
1000 Computer
3202 Per-frequency SNR calculation unit
3204 Nonlinear processing unit
4001 Logical OR calculation unit
4002, 4004, 6504 Comparison unit
4003, 4005, 6503 Threshold storage unit
4006 Threshold calculation unit
6201 Range limitation processing unit
6202 A posteriori SNR storage unit
6203 Suppression coefficient storage unit
6206 Weight storage unit
6207 Weighted addition unit
6301 MMSE STSA gain function value calculation unit
6302 Generalized likelihood ratio calculation unit
6303 Suppression coefficient calculation unit
6501 Maximum value selection unit
6502 Suppression coefficient lower limit value storage unit
6506 Correction value storage unit
6511 Maximum value selection unit
6512 Suppression coefficient lower limit value calculation unit
6905 Constant multiplier

Claims

1. A noise suppression method comprising:
converting an input signal into a frequency domain signal;
obtaining a first estimated noise using the frequency domain signal;
determining a suppression coefficient using the first estimated noise and the frequency domain signal;
weighting the frequency domain signal with the suppression coefficient to obtain a temporary output;
obtaining a second estimated noise using the temporary output;
obtaining an SNR of the temporary output using the second estimated noise;
correcting the suppression coefficient so that distortion is reduced when the SNR of the temporary output is high and residual noise is reduced when the SNR of the temporary output is low, thereby obtaining a corrected suppression coefficient; and
suppressing noise by weighting the frequency domain signal with the corrected suppression coefficient.

2. The noise suppression method according to claim 1, wherein
a ratio between the average power when the SNR of the temporary output is high and the average power when the SNR of the temporary output is low is obtained, and
the corrected suppression coefficient is obtained so that, when the value of the ratio is large, the residual noise when the SNR of the temporary output is low becomes small.

3. The noise suppression method according to claim 1, wherein
a ratio between the average power when the SNR of the temporary output is high and the average power when the SNR of the temporary output is low is obtained, and
the corrected suppression coefficient is obtained so that, when the value of the ratio is small, the residual noise when the SNR of the temporary output is low becomes large.

4. A noise suppression method comprising:
converting an input signal into a frequency domain signal;
estimating, using the frequency domain signal, a first speech content rate in the input signal;
suppressing noise contained in the input signal using a suppression coefficient determined in accordance with the first speech content rate, to obtain enhanced speech;
estimating noise remaining in the enhanced speech;
determining, using the remaining noise, a second speech content rate in the enhanced speech; and
obtaining enhanced speech by suppressing the noise contained in the input signal so that distortion of the enhanced speech is reduced when the second speech content rate is high and noise remaining in the enhanced speech is reduced when the second speech content rate is low.

5. A noise suppression apparatus comprising:
a conversion unit that converts an input signal into a frequency domain signal;
a first noise estimation unit that obtains a first estimated noise using the frequency domain signal;
a noise suppression coefficient generation unit that determines a suppression coefficient using the first estimated noise and the frequency domain signal;
a first multiplier that obtains a temporary output by weighting the frequency domain signal with the suppression coefficient;
a second noise estimation unit that obtains a second estimated noise using the temporary output;
a temporary output SNR calculation unit that obtains an SNR of the temporary output using the second estimated noise;
a suppression coefficient correction unit that corrects the suppression coefficient to obtain a corrected suppression coefficient so that distortion is reduced when the SNR of the temporary output is high and residual noise is reduced when the SNR of the temporary output is low; and
a second multiplier that suppresses noise by weighting the frequency domain signal with the corrected suppression coefficient.

6. The noise suppression apparatus according to claim 5, wherein
the suppression coefficient correction unit further obtains a ratio between the average power when the SNR of the temporary output is high and the average power when the SNR of the temporary output is low, and
corrects the suppression coefficient so that, when the value of the ratio is large, the residual noise when the SNR of the temporary output is low becomes small.

7. The noise suppression apparatus according to claim 5 or 6, wherein
the suppression coefficient correction unit further obtains a ratio between the average power when the SNR of the temporary output is high and the average power when the SNR of the temporary output is low, and
corrects the suppression coefficient so that, when the value of the ratio is small, the residual noise when the SNR of the temporary output is low becomes large.

8. A noise suppression program causing a computer to execute processing of:
converting an input signal into a frequency domain signal;
obtaining a first estimated noise using the frequency domain signal;
determining a suppression coefficient using the first estimated noise and the frequency domain signal;
weighting the frequency domain signal with the suppression coefficient to obtain a temporary output;
obtaining a second estimated noise using the temporary output;
obtaining an SNR of the temporary output using the second estimated noise;
correcting the suppression coefficient so that distortion is reduced when the SNR of the temporary output is high and residual noise is reduced when the SNR of the temporary output is low, thereby obtaining a corrected suppression coefficient; and
suppressing noise by weighting the frequency domain signal with the corrected suppression coefficient.

9. The noise suppression program according to claim 8, further causing the computer to execute processing of:
obtaining a ratio between the average power when the SNR of the temporary output is high and the average power when the SNR of the temporary output is low; and
correcting the suppression coefficient so that, when the value of the ratio is large, the residual noise when the SNR of the temporary output is low becomes small.

10. The noise suppression program according to claim 8 or 9, further causing the computer to execute processing of:
obtaining a ratio between the average power when the SNR of the temporary output is high and the average power when the SNR of the temporary output is low; and
correcting the suppression coefficient so that, when the value of the ratio is small, the residual noise when the SNR of the temporary output is low becomes large.