JP6596833B2

JP6596833B2 - Noise suppression device and program, noise estimation device and program, and SNR estimation device and program

Info

Publication number: JP6596833B2
Application number: JP2015023551A
Authority: JP
Inventors: 大藤枝
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2015-02-09
Filing date: 2015-02-09
Publication date: 2019-10-30
Anticipated expiration: 2035-02-09
Also published as: JP2016145944A

Description

The present invention relates to a noise suppression device and program, a noise estimation device and program, and an SNR estimation device and program. It can be applied, for example, to communication terminals, audio equipment, speech recognition devices, and other equipment that needs to suppress the noise component contained in an input signal and emphasize the speech component.

Noise is present everywhere in natural environments, so when speech is recorded in the real world, noise from various sources is generally mixed into the observed signal. Such noise lowers the intelligibility of the speech for a human listener, and it lowers the accuracy of speech processing (for example, the speech recognition rate) when the signal is fed to a speech processing device such as a speech recognition device. There is therefore strong demand for techniques that suppress the noise component mixed into an input signal and emphasize the speech component, and a variety of noise suppression methods (sometimes called speech enhancement methods) have been developed.

A typical noise suppression method requires either the power spectrum of the noise (the noise power spectrum) or the SNR (signal-to-noise ratio) of each frequency band. Because the SNR is calculated from the power spectrum of the input signal (the input power spectrum) and the noise power spectrum, knowing the noise power spectrum is ultimately sufficient. A typical approach to estimating the noise power spectrum first identifies the noise intervals by voice activity detection (VAD) and then averages the input power spectrum over those noise intervals.
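The conventional VAD-based approach can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the list-of-lists frame layout, and the per-frame boolean VAD flags are all assumptions, and the SNR is taken here as the simple ratio of input power to noise power.

```python
def estimate_noise_spectrum(power_frames, is_speech):
    """Average the input power spectra of the frames a VAD flagged as noise-only.

    power_frames: list of frames, each a list of per-band input powers (assumed layout).
    is_speech:    list of booleans from some VAD, one per frame (assumed input).
    """
    noise_frames = [f for f, s in zip(power_frames, is_speech) if not s]
    if not noise_frames:
        raise ValueError("no noise-only frames available")
    n_bands = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames)
            for k in range(n_bands)]


def band_snr(input_power, noise_power):
    """Per-band SNR as input power divided by estimated noise power (assumed form)."""
    return [p / n for p, n in zip(input_power, noise_power)]


frames = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0]]
flags = [False, True, False]          # the middle frame is marked as speech
noise = estimate_noise_spectrum(frames, flags)
snr = band_snr(frames[1], noise)
```

The sketch makes the dependency explicit: every error in `flags` feeds directly into `noise`, which is the weakness the description goes on to discuss.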

With this approach, the accuracy of the noise power spectrum estimate depends on the performance of the VAD. However, accurate VAD is difficult to achieve without information about the noise component, and even where it can be achieved it requires a large amount of computation.

By contrast, the noise suppression method described in Patent Document 1 does not require VAD to estimate the noise power spectrum. The algorithm of that method is briefly described below. In what follows, the elements of the input power spectrum and of the noise power spectrum in a given frequency band are called the input power and the noise power, respectively.

The input power is multiplied by an appropriate weighting factor, the resulting weighted input power is stored for a predetermined time (T seconds), and the average of the stored weighted input powers is taken as the estimated noise power. The weighting factor is determined from the predicted posterior SNR, which is the current input power divided by the immediately preceding estimated noise power. Specifically, the weighting factor is 1 when the predicted posterior SNR is at most a predetermined value G1, is inversely proportional to the predicted posterior SNR when the predicted posterior SNR is greater than G1 and at most a predetermined value G2 (G2 > G1), and is 0 when the predicted posterior SNR is greater than G2. When the weighting factor is 0, the weighted input power is not stored. The posterior SNR is calculated by dividing the input power by the estimated noise power obtained in this way, and noise suppression (also called speech enhancement) is performed based on that posterior SNR. The noise suppression uses the method described in Non-Patent Document 1, known as MMSE-STSA.
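The weighting scheme of Patent Document 1, as summarized above, might look like the following single-band sketch. Several details are assumptions rather than facts from the source: the proportionality constant in the middle range is taken as G1 (so the weight is continuous at G1), the T-second store is modelled as a fixed-length buffer of frames, and the class name, parameter names, and the small floor on the noise estimate are hypothetical.

```python
from collections import deque


def weight(predicted_snr, g1, g2):
    """Weighting factor as a function of the predicted posterior SNR."""
    if predicted_snr <= g1:
        return 1.0
    if predicted_snr <= g2:
        # Inversely proportional to the SNR; the constant G1 is an assumption.
        return g1 / predicted_snr
    return 0.0


class WeightedNoiseEstimator:
    """Single-band noise power estimate from stored weighted input powers."""

    def __init__(self, g1, g2, history_len, initial_noise):
        self.g1, self.g2 = g1, g2
        self.history = deque(maxlen=history_len)  # stands in for "T seconds"
        self.noise = initial_noise                # must be positive

    def update(self, input_power):
        predicted_snr = input_power / self.noise  # uses the previous estimate
        w = weight(predicted_snr, self.g1, self.g2)
        if w > 0.0:                               # zero-weight frames are not stored
            self.history.append(w * input_power)
            mean = sum(self.history) / len(self.history)
            self.noise = max(mean, 1e-12)         # floor avoids division by zero
        return input_power / self.noise           # posterior SNR


est = WeightedNoiseEstimator(g1=2.0, g2=5.0, history_len=4, initial_noise=1.0)
for p in (1.0, 4.0, 100.0):
    posterior = est.update(p)
```

Because `g1` and `g2` are fixed, a loud noise floor shifts every frame's predicted SNR relative to the same two constants, which is the limitation the description raises next.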

Patent Document 1: JP 2002-204175 A

Non-Patent Document 1: Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984

Because the technique described in Patent Document 1 uses fixed thresholds G1 and G2, speech components end up being averaged in as noise components, particularly when the noise level is high, and the estimate of the noise power spectrum becomes inaccurate. As a result, the accuracy of the posterior SNR falls, and so does the accuracy of the noise suppression.

There is therefore a need for a noise suppression device and program, a noise estimation device and program, and an SNR estimation device and program that achieve high accuracy in suppression and estimation processing.

A first aspect of the present invention is a noise suppression device that suppresses a noise component contained in an input signal and emphasizes a target sound component, comprising:
(1) a frequency analysis unit that performs frequency analysis on the input signal to calculate an input spectrum; and
(2) a plurality of band-specific noise suppression means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and suppressing the noise component in the signal of that frequency band.
Each band-specific noise suppression means (2) has:
(2-1) a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to obtain a detection result for a first target sound section in the input frequency band signal;
(2-2) a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to obtain a detection result for a second target sound section in the input frequency band signal; and
(2-3) a noise suppression unit that suppresses the noise component contained in the input frequency band signal based on a first parameter obtained by the first parameter calculation unit and a second parameter obtained by the second parameter calculation unit.
(2-1a) The first parameter calculation unit generates the first threshold using the second parameter, including at least the detection result for the second target sound section, that the second parameter calculation unit output a predetermined unit time earlier.
(2-2a) The second parameter calculation unit generates the second threshold using the second parameter, including at least the detection result for the first target sound section, that the first parameter calculation unit output in the same unit time.

A second aspect of the present invention is a noise estimation device that estimates the noise power in an input signal, comprising:
(1) a frequency analysis unit that performs frequency analysis on the input signal to calculate an input spectrum;
(2) a plurality of band-specific noise estimation means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and estimating the noise power in the signal of that frequency band; and
(3) a band-specific noise power integration means that integrates the plurality of per-band noise power estimates obtained by the band-specific noise estimation means to obtain a final noise power estimate.
Each band-specific noise estimation means (2) comprises:
(2-1) a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to detect a first target sound section in the input frequency band signal; and
(2-2) a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to detect a second target sound section in the input frequency band signal.
The first parameter calculation unit (2-1) has:
(2-1-1) a first smoothing unit that smooths the first input power to calculate a first smoothed power, while controlling whether smoothing is executed or stopped based on the detection result for the second target sound section from a predetermined unit time earlier;
(2-1-2) a first threshold calculation unit that calculates the first threshold by applying at least the first smoothed power; and
(2-1-3) a first target sound section determination unit that uses the first input power as the first feature quantity, compares it with the first threshold to determine whether the section is a target sound section, and obtains the detection result for the first target sound section.
The second parameter calculation unit (2-2) has:
(2-2-1) a second smoothing unit that smooths the second input power to calculate a second smoothed power, while controlling whether smoothing is executed or stopped based on the detection result for the first target sound section in the same unit time;
(2-2-2) a second threshold calculation unit that calculates the second threshold by applying at least the second smoothed power; and
(2-2-3) a second target sound section determination unit that uses the second input power as the second feature quantity, compares it with the second threshold to determine whether the section is a target sound section, and obtains the detection result for the second target sound section.
(4) The first smoothing unit or the second smoothing unit performs smoothing when the detection result for the second target sound section from a predetermined unit time earlier, or the detection result for the first target sound section in the same unit time, indicates that the section is not a target sound section, and stops smoothing when it indicates a target sound section; the first smoothed power or the second smoothed power is obtained as the per-band noise power estimate.
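The cross-coupling between the two parameter calculation units in the second aspect can be illustrated for a single band as below. This is only a sketch under stated assumptions: the source does not specify the smoothing rule or the threshold formula in this passage, so first-order exponential smoothing and a threshold equal to a constant multiple of the smoothed power are stand-ins, and `BandNoiseEstimator`, `alpha`, `factor1`, and `factor2` are hypothetical names.

```python
class BandNoiseEstimator:
    """One band: two detectors whose smoothing is gated by each other's result."""

    def __init__(self, initial_power, alpha=0.9, factor1=2.0, factor2=2.0):
        self.alpha = alpha            # smoothing coefficient (assumed form)
        self.factor1 = factor1        # threshold = factor * smoothed power (assumed)
        self.factor2 = factor2
        self.smooth1 = initial_power
        self.smooth2 = initial_power
        self.prev_detect2 = False     # second detection result, one unit time earlier

    def _smooth(self, old, new):
        return self.alpha * old + (1.0 - self.alpha) * new

    def update(self, power1, power2):
        # (2-1-1) smooth only if the previous second detection was "not target sound"
        if not self.prev_detect2:
            self.smooth1 = self._smooth(self.smooth1, power1)
        # (2-1-2), (2-1-3) threshold from the smoothed power, then compare
        detect1 = power1 > self.factor1 * self.smooth1
        # (2-2-1) smooth only if the first detection in this same unit time is negative
        if not detect1:
            self.smooth2 = self._smooth(self.smooth2, power2)
        # (2-2-2), (2-2-3)
        detect2 = power2 > self.factor2 * self.smooth2
        self.prev_detect2 = detect2
        # smooth1 serves as this band's noise power estimate in this sketch
        return self.smooth1, detect1, detect2


est = BandNoiseEstimator(initial_power=1.0)
quiet = est.update(1.0, 1.0)      # noise-like frame: both smoothers run
loud = est.update(10.0, 10.0)     # loud frame: both detectors fire
held = est.update(10.0, 10.0)     # smoothing of the first power is now frozen
```

In the third call the first smoothed power is unchanged because the second detector fired one frame earlier, which is how the estimate avoids absorbing target sound.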

A third aspect of the present invention is an SNR estimation device that estimates the SNR of an input signal, comprising:
(1) a frequency analysis unit that performs frequency analysis on the input signal to calculate an input spectrum;
(2) a plurality of band-specific SNR estimation means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and estimating the SNR of the signal of that frequency band; and
(3) a band-specific SNR integration means that integrates the plurality of per-band SNR estimates obtained by the band-specific SNR estimation means to obtain a final SNR estimate.
Each band-specific SNR estimation means (2) comprises:
(2-1) a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to detect a first target sound section in the input frequency band signal; and
(2-2) a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to detect a second target sound section in the input frequency band signal.
The first parameter calculation unit (2-1) has:
(2-1-1) a first smoothing unit that smooths the first input power to calculate a first smoothed power, while controlling whether smoothing is executed or stopped based on the detection result for the second target sound section from a predetermined unit time earlier;
(2-1-2) a first threshold calculation unit that calculates the first threshold by applying at least the first smoothed power; and
(2-1-3) a first target sound section determination unit that uses the first input power as the first feature quantity, compares it with the first threshold to determine whether the section is a target sound section, and obtains the detection result for the first target sound section.
The second parameter calculation unit (2-2) has:
(2-2-1) an SNR calculation unit that calculates an SNR estimate based on the second input power and the first smoothed power in the same unit time;
(2-2-2) a second smoothing unit that smooths the SNR estimate to calculate a smoothed SNR, while controlling whether smoothing is executed or stopped based on the detection result for the first target sound section in the same unit time;
(2-2-3) a second threshold calculation unit that calculates the second threshold by applying at least the smoothed SNR; and
(2-2-4) a second target sound section determination unit that uses the SNR estimate as the second feature quantity, compares it with the second threshold to determine whether the section is a target sound section, and obtains the detection result for the second target sound section.
(4) The SNR estimate from the SNR calculation unit is obtained as the SNR estimate for that frequency band of the input frequency band signal.
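A single-band sketch of the third aspect follows. The structure (SNR from the second input power and the first smoothed power, with SNR smoothing gated by the first detection result) comes from the text; everything else is assumed for illustration: exponential smoothing, thresholds as constant multiples of the smoothed values, the SNR as a plain ratio, and the names `BandSnrEstimator`, `alpha`, `factor1`, and `factor2`.

```python
class BandSnrEstimator:
    """One band: power-based first detector, SNR-based second detector."""

    def __init__(self, initial_power, initial_snr=1.0,
                 alpha=0.9, factor1=2.0, factor2=2.0):
        self.alpha = alpha                # smoothing coefficient (assumed form)
        self.factor1 = factor1            # threshold multipliers (assumed form)
        self.factor2 = factor2
        self.smooth_power = initial_power
        self.smooth_snr = initial_snr
        self.prev_detect2 = False         # second detection, one unit time earlier

    def _smooth(self, old, new):
        return self.alpha * old + (1.0 - self.alpha) * new

    def update(self, power1, power2):
        # First unit: smoothing gated by the previous second detection result
        if not self.prev_detect2:
            self.smooth_power = self._smooth(self.smooth_power, power1)
        detect1 = power1 > self.factor1 * self.smooth_power
        # (2-2-1) SNR estimate from second input power and first smoothed power
        snr = power2 / self.smooth_power
        # (2-2-2) SNR smoothing gated by the first detection in the same unit time
        if not detect1:
            self.smooth_snr = self._smooth(self.smooth_snr, snr)
        # (2-2-3), (2-2-4) threshold from the smoothed SNR, then compare
        detect2 = snr > self.factor2 * self.smooth_snr
        self.prev_detect2 = detect2
        return snr, detect1, detect2


est = BandSnrEstimator(initial_power=1.0)
first = est.update(1.0, 1.0)
second = est.update(10.0, 10.0)
```

The returned `snr` corresponds to item (4): it is the per-band value that would be passed on for integration across bands.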

A fourth aspect of the present invention is a noise suppression program that suppresses a noise component contained in an input signal and emphasizes a target sound component. The program causes a computer to function as:
(1) a frequency analysis unit that performs frequency analysis on the input signal to calculate an input spectrum; and
(2) a plurality of band-specific noise suppression means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and suppressing the noise component in the signal of that frequency band.
Each band-specific noise suppression means (2) has:
(2-1) a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to obtain a detection result for a first target sound section in the input frequency band signal;
(2-2) a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to obtain a detection result for a second target sound section in the input frequency band signal; and
(2-3) a noise suppression unit that suppresses the noise component contained in the input frequency band signal based on a first parameter obtained by the first parameter calculation unit and a second parameter obtained by the second parameter calculation unit.
(2-1a) The first parameter calculation unit generates the first threshold using the second parameter, including at least the detection result for the second target sound section, that the second parameter calculation unit output a predetermined unit time earlier.
(2-2a) The second parameter calculation unit generates the second threshold using the second parameter, including at least the detection result for the first target sound section, that the first parameter calculation unit output in the same unit time.

A fifth aspect of the present invention is a noise estimation program that estimates the noise power in an input signal. The program causes a computer to function as:
(1) a frequency analysis unit that performs frequency analysis on the input signal to calculate an input spectrum;
(2) a plurality of band-specific noise estimation means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and estimating the noise power in the signal of that frequency band; and
(3) a band-specific noise power integration means that integrates the plurality of per-band noise power estimates obtained by the band-specific noise estimation means to obtain a final noise power estimate.
Each band-specific noise estimation means (2) comprises:
(2-1) a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to detect a first target sound section in the input frequency band signal; and
(2-2) a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to detect a second target sound section in the input frequency band signal.
The first parameter calculation unit (2-1) has:
(2-1-1) a first smoothing unit that smooths the first input power to calculate a first smoothed power, while controlling whether smoothing is executed or stopped based on the detection result for the second target sound section from a predetermined unit time earlier;
(2-1-2) a first threshold calculation unit that calculates the first threshold by applying at least the first smoothed power; and
(2-1-3) a first target sound section determination unit that uses the first input power as the first feature quantity, compares it with the first threshold to determine whether the section is a target sound section, and obtains the detection result for the first target sound section.
The second parameter calculation unit (2-2) has:
(2-2-1) a second smoothing unit that smooths the second input power to calculate a second smoothed power, while controlling whether smoothing is executed or stopped based on the detection result for the first target sound section in the same unit time;
(2-2-2) a second threshold calculation unit that calculates the second threshold by applying at least the second smoothed power; and
(2-2-3) a second target sound section determination unit that uses the second input power as the second feature quantity, compares it with the second threshold to determine whether the section is a target sound section, and obtains the detection result for the second target sound section.
(4) The first smoothing unit or the second smoothing unit performs smoothing when the detection result for the second target sound section from a predetermined unit time earlier, or the detection result for the first target sound section in the same unit time, indicates that the section is not a target sound section, and stops smoothing when it indicates a target sound section; the first smoothed power or the second smoothed power is obtained as the per-band noise power estimate.

A sixth aspect of the present invention is an SNR estimation program that estimates the SNR of an input signal. The program causes a computer to function as:
(1) a frequency analysis unit that performs frequency analysis on the input signal to calculate an input spectrum;
(2) a plurality of band-specific SNR estimation means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and estimating the SNR of the signal of that frequency band; and
(3) a band-specific SNR integration means that integrates the plurality of per-band SNR estimates obtained by the band-specific SNR estimation means to obtain a final SNR estimate.
Each band-specific SNR estimation means (2) comprises:
(2-1) a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to detect a first target sound section in the input frequency band signal; and
(2-2) a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to detect a second target sound section in the input frequency band signal.
The first parameter calculation unit (2-1) has:
(2-1-1) a first smoothing unit that smooths the first input power to calculate a first smoothed power, while controlling whether smoothing is executed or stopped based on the detection result for the second target sound section from a predetermined unit time earlier;
(2-1-2) a first threshold calculation unit that calculates the first threshold by applying at least the first smoothed power; and
(2-1-3) a first target sound section determination unit that uses the first input power as the first feature quantity, compares it with the first threshold to determine whether the section is a target sound section, and obtains the detection result for the first target sound section.
The second parameter calculation unit (2-2) has:
(2-2-1) an SNR calculation unit that calculates an SNR estimate based on the second input power and the first smoothed power in the same unit time;
(2-2-2) a second smoothing unit that smooths the SNR estimate to calculate a smoothed SNR, while controlling whether smoothing is executed or stopped based on the detection result for the first target sound section in the same unit time;
(2-2-3) a second threshold calculation unit that calculates the second threshold by applying at least the smoothed SNR; and
(2-2-4) a second target sound section determination unit that uses the SNR estimate as the second feature quantity, compares it with the second threshold to determine whether the section is a target sound section, and obtains the detection result for the second target sound section.
(4) The SNR estimate from the SNR calculation unit is obtained as the SNR estimate for that frequency band of the input frequency band signal.

According to the present invention, it is possible to provide a noise suppression device and program, a noise estimation device and program, and an SNR estimation device and program that achieve high accuracy in suppression and estimation processing.

A block diagram showing the configuration of the noise suppression device of the first embodiment.
A block diagram showing the detailed configuration of the first parameter calculation unit in the noise suppression device of the first embodiment.
A block diagram showing the detailed configuration of the second parameter calculation unit in the noise suppression device of the first embodiment.
A block diagram showing the detailed configuration of the second parameter calculation unit in a noise suppression device obtained by modifying the first embodiment with respect to hangover.
A block diagram showing the detailed configuration of the first parameter calculation unit in the noise suppression device of the second embodiment.
A block diagram showing the detailed configuration of the second parameter calculation unit in the noise suppression device of the second embodiment.
A block diagram showing the detailed configuration of the second parameter calculation unit in the noise suppression device of the third embodiment.
A block diagram showing the configuration of the noise estimation device of an embodiment.
A block diagram showing the configuration of the SNR estimation device of an embodiment.

(A) First Embodiment
A first embodiment of the noise suppression device and program according to the present invention will be described below with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the configuration of the noise suppression device of the first embodiment.

The noise suppression device of the first embodiment may be implemented by building the components shown in FIG. 1 in hardware, or by a CPU together with software (a noise suppression program) that the CPU executes. Whichever implementation is adopted, the device can be represented functionally as in FIG. 1.

In FIG. 1, the noise suppression device 100 of the first embodiment includes a frequency analysis unit 101, band-specific noise suppression means 102-1 to 102-M, and a waveform restoration unit 103.

The frequency analysis unit 101 performs frequency analysis on the input signal to calculate a frequency spectrum, and supplies the resulting input spectrum to the band-specific noise suppression means 102-1 to 102-M. The frequency analysis may use, for example, a fast Fourier transform (FFT), a wavelet transform, or a filter bank, but the FFT is preferred. In the following, the input spectrum is assumed to be given as complex numbers.

One band-specific noise suppression means is provided for each of the M spectral components (frequency bands) obtained by the frequency analysis unit 101, giving the means 102-1 to 102-M. Hereinafter, the component of the input spectrum supplied to each band-specific noise suppression means 102-1 to 102-M is called a frequency band signal.

The band-specific noise suppression means 102-1 to 102-M (the suffixes "1" to "M" are omitted below where appropriate) receive different frequency band signals but have similar configurations. Each band-specific noise suppression means 102 applies the noise suppression described later to the frequency band signal input to it, and supplies the noise-suppressed frequency band signal to the waveform restoration unit 103.

The waveform restoration unit 103 converts the output spectrum, which consists of the spectra (noise-suppressed frequency band signals) supplied by all the band-specific noise suppression means 102, into a time-domain signal, and outputs the resulting time-domain signal to the next-stage device as the output of the noise suppression device 100. The conversion to a time-domain signal uses the method that pairs with the frequency analysis technique used in the frequency analysis unit 101; for example, if the frequency analysis technique is the FFT, the inverse fast Fourier transform (inverse FFT; IFFT) is used.
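The signal path of FIG. 1 for a single frame can be sketched as follows. This is only an illustration under stated assumptions: a naive DFT built on the Python standard library stands in for the FFT, the function names `dft`, `idft`, and `process_frame` are hypothetical, and the per-band gains are passed in directly instead of being computed by the band-specific noise suppression means 102.

```python
import cmath


def dft(frame):
    """Frequency analysis (stand-in for the FFT of unit 101): time samples -> complex bins."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Waveform restoration (stand-in for the IFFT of unit 103): complex bins -> real samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def process_frame(frame, band_gains):
    """Analyze one frame, scale each band by its gain, and restore the waveform."""
    spectrum = dft(frame)
    suppressed = [g * x for g, x in zip(band_gains, spectrum)]
    return idft(suppressed)
```

With all gains equal to 1 the frame passes through unchanged (up to rounding), which is a convenient check that the analysis and restoration steps really are a matched pair.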

Each band-specific noise suppression means 102 includes a power calculation unit 110, a first parameter calculation unit 111, a second parameter calculation unit 112, a unit time delay unit 113, and a noise suppression unit 114.

In the band-specific noise suppression means 102, the input frequency band signal is supplied to the power calculation unit 110 and the noise suppression unit 114.

The power calculation unit 110 calculates the power of the input frequency band signal over TP seconds, and supplies the resulting input power Pin to the first parameter calculation unit 111, the second parameter calculation unit 112, and the noise suppression unit 114. Any known method can be used to calculate the power. For example, the sum of squared absolute values or the sum of absolute values may be calculated as the input power, or the maximum amplitude over TP seconds may be used as the input power.
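The three power measures named above can be sketched as below. The function name `band_power` and the `method` labels are hypothetical, and the window of complex samples is assumed to cover TP seconds of one frequency band.

```python
def band_power(samples, method="sum_squares"):
    """Input power Pin of one band over a window of complex band samples."""
    magnitudes = [abs(x) for x in samples]
    if method == "sum_squares":      # sum of squared absolute values
        return sum(m * m for m in magnitudes)
    if method == "sum_abs":          # sum of absolute values
        return sum(magnitudes)
    if method == "max_amplitude":    # maximum amplitude in the window
        return max(magnitudes)
    raise ValueError(f"unknown method: {method}")
```

For example, `band_power([3 + 4j, 1j])` gives 26.0, while the same window gives 6.0 with `"sum_abs"` and 5.0 with `"max_amplitude"`.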

The first parameter calculation unit 111 performs speech segment detection using the input power Pin and the second parameter F2, which includes the second speech segment truth value V2, that is, the detection result of the second parameter calculation unit 112 from one unit time earlier. It supplies a first parameter F1, which includes the resulting speech segment truth value (first speech segment truth value) V1, to the second parameter calculation unit 112 and the noise suppression unit 114. The unit time mentioned here is, for example, a frame of 10 milliseconds or the like, as used in speech processing.

The second parameter calculation unit 112 performs speech segment detection using the input power Pin and the first parameter F1, which includes at least the first speech segment truth value V1, that is, the detection result of the first parameter calculation unit 111. It supplies a second parameter F2, which includes at least the resulting speech segment truth value (second speech segment truth value) V2, to the first parameter calculation unit 111 via the unit time delay unit 113, and also supplies the second parameter F2 to the noise suppression unit 114.

The unit time delay unit 113 delays the second parameter F2 output from the second parameter calculation unit 112 by one unit time and supplies it to the first parameter calculation unit 111.

The noise suppression unit 114 suppresses the noise component of the input spectrum (the input frequency band signal) based on the first parameter F1, the second parameter F2, and the input power Pin, and supplies the resulting output spectrum (the suppressed frequency band signal) to the waveform restoration unit 103. Spectral subtraction, which has a low computational cost, is the preferred noise suppression method, but the Wiener filter method, MMSE-STSA, or the like may also be used.

When spectral subtraction is used, the suppression gain is obtained by subtracting the noise amplitude (or power) from the input amplitude (or power) and dividing the result by the input amplitude (or power). Because the suppression gain becomes negative when the subtraction result is negative, countermeasures are taken, such as keeping the suppression gain from falling below a predetermined minimum gain value. When the Wiener filter or MMSE-STSA is used, the suppression gain is calculated from the a posteriori SNR and the a priori SNR. The a priori SNR can be estimated from the a posteriori SNR and the suppression gain of one unit time earlier by the decision-directed method. The a posteriori SNR is calculated by dividing the input power by the noise power.
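The gain rules in this paragraph can be sketched as follows, with several assumptions: the function names are hypothetical, the default floor `min_gain=0.1` and weight `alpha=0.98` are illustrative values not taken from this document, and the decision-directed update is written in its commonly used textbook form, which the text here does not spell out.

```python
def spectral_subtraction_gain(input_power, noise_power, min_gain=0.1):
    """(input - noise) / input, kept at or above a minimum gain."""
    if input_power <= 0.0:
        return min_gain
    gain = (input_power - noise_power) / input_power
    return max(gain, min_gain)


def posterior_snr(input_power, noise_power):
    """A posteriori SNR: input power divided by noise power."""
    return input_power / noise_power


def decision_directed_prior_snr(post_snr, prev_post_snr, prev_gain, alpha=0.98):
    """A priori SNR from the previous frame's gain and the current a posteriori SNR."""
    previous_estimate = prev_gain ** 2 * prev_post_snr
    current_estimate = max(post_snr - 1.0, 0.0)
    return alpha * previous_estimate + (1.0 - alpha) * current_estimate


def wiener_gain(prior_snr):
    """Wiener filter gain for a given a priori SNR."""
    return prior_snr / (1.0 + prior_snr)
```

The floor in `spectral_subtraction_gain` is what prevents a negative gain when the noise estimate exceeds the input power in a given frame.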

These suppression methods require the first smoothed power P1, described later, which is included in the first parameter F1. The noise suppression unit 114 can also use a method that suppresses noise based on the second speech segment truth value V2 included in the second parameter F2: the input frequency band signal is multiplied by 1 if V2 is true (the value representing a speech segment), and by 0 or a very small value (for example, 0.01) if V2 is false (the value representing a noise segment).
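The gating variant described here amounts to a two-valued gain. A minimal sketch, assuming a hypothetical function name `gate_band` and using the example value 0.01 as the default noise-segment gain:

```python
def gate_band(band_signal, v2, noise_gain=0.01):
    """Pass the band unchanged in speech segments, attenuate it in noise segments."""
    gain = 1.0 if v2 else noise_gain
    return gain * band_signal
```

Setting `noise_gain=0.0` mutes noise segments completely, which corresponds to the "0" option in the text.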

The first parameter calculation unit 111 and the second parameter calculation unit 112 each apply an adapted form of an existing speech segment detection technique. The existing technique applied by the first parameter calculation unit 111 and the one applied by the second parameter calculation unit 112 may be different or the same.

In FIG. 1, the first parameter calculation unit 111 and the second parameter calculation unit 112 both apply speech segment detection techniques that use the input power, so the power calculation unit 110 that they share is drawn outside the two units. If at most one of the first parameter calculation unit 111 and the second parameter calculation unit 112 applies a speech segment detection technique that uses the input power, the shared power calculation unit 110 is unnecessary. The two units may also use different input powers; for example, the first parameter calculation unit 111 may use the sum of squares over TP seconds while the second parameter calculation unit 112 uses the maximum amplitude over TP seconds. In that case, a separate power calculation unit must be provided for each of the first parameter calculation unit 111 and the second parameter calculation unit 112.

FIG. 2 is a block diagram showing an example of the detailed configuration of the first parameter calculation unit 111. In FIG. 2, the first parameter calculation unit 111 includes a first smoothing unit 201, a first threshold calculation unit 202, and a first speech segment determination unit 203.

The first smoothing unit 201 smooths the input power Pin based on the first speech segment reference truth value Vr1 (= the second speech segment truth value V2 of one unit time earlier), and supplies the resulting first smoothed power P1 to the first threshold calculation unit 202. When Vr1 is false (that is, the value representing a noise segment), the first smoothing unit 201 smooths the input power Pin and updates the first smoothed power P1; when Vr1 is true (that is, the value representing a speech segment), it does not update P1. The first smoothed power P1 therefore represents a smoothed value of the noise power (an average noise power). The smoothing method and configuration are not limited in any way. For example, smoothing may be performed with a time-constant filter having a time constant of 0.2 seconds.
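One way to realize this gated smoothing is a first-order recursive filter, sketched below. The mapping from time constant to coefficient, `exp(-frame / time_constant)`, the 10 ms frame length, and the function names are assumptions; the text only asks for some time-constant filter.

```python
import math


def smoothing_coefficient(time_constant_s, frame_s=0.010):
    """Per-frame retention factor of a first-order filter with the given time constant."""
    return math.exp(-frame_s / time_constant_s)


def update_noise_power(p1, pin, vr1, coeff):
    """Update P1 only when the reference flag Vr1 marks a noise segment."""
    if vr1:                      # speech segment: hold the previous estimate
        return p1
    return coeff * p1 + (1.0 - coeff) * pin
```

A longer time constant gives a coefficient closer to 1, so the estimate changes more slowly and is more stable; a shorter one tracks changes in the noise faster.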

The first threshold calculation unit 202 multiplies the first smoothed power P1 by a predetermined constant coefficient C1 (hereinafter, the first coefficient) that takes a value of 1 or more, thereby forming the first threshold TH1 to be compared with the input power Pin, and supplies TH1 to the first speech segment determination unit 203. The first smoothed power P1 represents the average noise power, and multiplying it by the first coefficient C1 yields the first threshold TH1 that separates speech power from noise power. The value of C1 is not limited; for example, 2 can be used.

The first speech segment determination unit 203 compares the first threshold TH1 with the input power Pin to determine whether a speech segment is present, and outputs the first speech segment truth value V1. It outputs true as V1 if the input power Pin is greater than TH1, and false otherwise.

The first parameter calculation unit 111 outputs the first smoothed power P1 and the first speech segment truth value V1 obtained in this way as the first parameter F1.

FIG. 3 is a block diagram showing an example of the detailed configuration of the second parameter calculation unit 112. In FIG. 3, the second parameter calculation unit 112 includes a second smoothing unit 301, a second threshold calculation unit 302, and a second speech segment determination unit 303.

The second smoothing unit 301 smooths the input power Pin based on the second speech segment reference truth value Vr2 (the first speech segment truth value V1 of the same unit time), and supplies the resulting second smoothed power P2 to the second threshold calculation unit 302. When Vr2 is true (that is, the value representing a speech segment), the second smoothing unit 301 smooths the input power Pin and updates the second smoothed power P2; when Vr2 is false (that is, the value representing a noise segment), it does not update P2. The second smoothed power P2 therefore represents a smoothed value of the speech power (an average speech power). The smoothing method and configuration are not limited in any way. For example, smoothing may be performed with a time-constant filter having a time constant of 0.8 seconds.

The time constant is determined by a trade-off between the ability to track the target signal and the stability of the smoothed value. The first smoothing unit 201 smooths the input power Pin of noise segments, whereas the second smoothing unit 301 smooths the input power Pin of speech segments; stability was therefore weighted more heavily for the latter, and its time constant was chosen to be longer.

The second threshold calculation unit 302 multiplies the second smoothed power P2 by a predetermined constant coefficient C2 (hereinafter, the second coefficient) that takes a value greater than 0 and at most 1, thereby forming the second threshold TH2 to be compared with the input power Pin, and supplies TH2 to the second speech segment determination unit 303. The second smoothed power P2 represents the average speech power, and multiplying it by the second coefficient C2 yields the second threshold TH2 that separates speech power from noise power. The value of C2 is not limited; for example, 0.5 can be used.

The second speech segment determination unit 303 compares the second threshold TH2 with the input power Pin to determine whether a speech segment is present, and outputs the second speech segment truth value V2. It outputs true as V2 if the input power Pin is greater than TH2, and false otherwise.

The first parameter calculation unit 111 and/or the second parameter calculation unit 112 described above may apply hangover, which is widely used in speech segment detection. Hangover is explained in the description of operation below.

The second parameter calculation unit 112 outputs the second smoothed power P2 and the second speech segment truth value V2 obtained in this way as the second parameter F2.

(A-2) Operation of the First Embodiment
Next, the operation of the noise suppression device 100 of the first embodiment, configured as described above, is explained. The overall operation of the noise suppression device 100 is described first, followed in turn by the operations of the first parameter calculation unit 111 and the second parameter calculation unit 112, and finally by the hangover operation.

In FIG. 1, the input signal to the noise suppression device 100 of the first embodiment is supplied to the frequency analysis unit 101, which performs frequency analysis on it to calculate a frequency spectrum. The resulting input spectrum is supplied to the band-specific noise suppression means 102 (102-1 to 102-M).

In each band-specific noise suppression means 102, the input frequency band signal is supplied to the power calculation unit 110 and the noise suppression unit 114.

The power calculation unit 110 calculates the power of the input frequency band signal over TP seconds, and the resulting input power Pin is supplied to the first parameter calculation unit 111, the second parameter calculation unit 112, and the noise suppression unit 114.

The first parameter calculation unit 111 performs speech segment detection using the input power Pin and the second parameter F2, which includes the second speech segment truth value V2 detected by the second parameter calculation unit 112 one unit time earlier. The first parameter F1, which includes the resulting first speech segment truth value V1, is supplied to the second parameter calculation unit 112 and the noise suppression unit 114.

The second parameter calculation unit 112 likewise performs speech segment detection, using the input power Pin and the first parameter F1, which includes at least the first speech segment truth value V1 detected by the first parameter calculation unit 111. The second parameter F2, which includes at least the resulting second speech segment truth value V2, is supplied to the first parameter calculation unit 111 via the unit time delay unit 113 and also to the noise suppression unit 114.

The noise suppression unit 114 suppresses the noise component of the input spectrum (the input frequency band signal) based on the first parameter F1, the second parameter F2, and the input power Pin, and the resulting output spectrum (the suppressed frequency band signal) is supplied to the waveform restoration unit 103.

The waveform restoration unit 103 then converts the output spectrum, which consists of the spectra (noise-suppressed frequency band signals) supplied by all the band-specific noise suppression means 102 (102-1 to 102-M), into a time-domain signal, and the resulting time-domain signal is output to the next-stage device as the output of the noise suppression device 100.

Next, the operation of the first parameter calculation unit 111 is described with reference to FIG. 2.

In the first smoothing unit 201, the input power Pin is smoothed based on the first speech segment reference truth value Vr1 (= the second speech segment truth value V2 of one unit time earlier). That is, when Vr1 is false, the input power Pin is smoothed and the first smoothed power P1 is updated; when Vr1 is true, P1 is not updated and its immediately preceding value is retained.

The first smoothed power P1 obtained in this way is supplied to the first threshold calculation unit 202 and the first speech segment determination unit 203. In the first threshold calculation unit 202, P1 is multiplied by the first coefficient C1, which takes a value of 1 or more. In the first speech segment determination unit 203, the product, namely the first threshold TH1, is compared with the input power Pin. When Pin is greater than TH1, the first speech segment determination unit 203 outputs a true first speech segment truth value V1 to the second parameter calculation unit 112 and the noise suppression unit 114; when Pin is less than or equal to TH1, it outputs a false V1 to the same two units. The first smoothed power P1 is also output to the noise suppression unit 114.

Next, the operation of the second parameter calculation unit 112 is described with reference to FIG. 3.

In the second smoothing unit 301, the input power Pin is smoothed based on the second speech segment reference truth value Vr2 (= the first speech segment truth value V1 of the same unit time). That is, when Vr2 is true, the input power Pin is smoothed and the second smoothed power P2 is updated; when Vr2 is false, P2 is not updated and its immediately preceding value is retained.

The second smoothed power P2 obtained in this way is supplied to the second threshold calculation unit 302 and the second speech segment determination unit 303. In the second threshold calculation unit 302, P2 is multiplied by the second coefficient C2, which takes a value greater than 0 and at most 1. In the second speech segment determination unit 303, the product, namely the second threshold TH2, is compared with the input power Pin. When Pin is greater than TH2, the second speech segment determination unit 303 outputs a true second speech segment truth value V2 to the unit time delay unit 113 (and hence to the first parameter calculation unit 111) and to the noise suppression unit 114; when Pin is less than or equal to TH2, it outputs a false V2 to the same destinations. The second smoothed power P2 is also output to the noise suppression unit 114.
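The interaction of the two parameter calculation units through the unit time delay can be sketched for a single band as follows. Hangover is left out, and the function name `run_band`, the smoothing coefficients `a1` and `a2`, the initial estimates, and the initial delayed flag are all assumptions made for illustration; only C1 = 2 and C2 = 0.5 are the example values given in the text.

```python
def run_band(powers, p1_init, p2_init, c1=2.0, c2=0.5, a1=0.95, a2=0.99):
    """Run the coupled detectors over a sequence of per-frame input powers Pin."""
    p1, p2 = p1_init, p2_init
    v2_delayed = False            # output of delay unit 113 (initial value assumed)
    decisions = []
    for pin in powers:
        # Unit 111: update the noise power P1 only if the delayed V2 says "noise".
        if not v2_delayed:
            p1 = a1 * p1 + (1.0 - a1) * pin
        v1 = pin > c1 * p1        # TH1 = C1 * P1
        # Unit 112: update the speech power P2 only if the current V1 says "speech".
        if v1:
            p2 = a2 * p2 + (1.0 - a2) * pin
        v2 = pin > c2 * p2        # TH2 = C2 * P2
        decisions.append((v1, v2))
        v2_delayed = v2           # becomes Vr1 in the next unit time
    return decisions, p1, p2
```

Because V2 reaches the first unit one frame late, the first frame of a speech burst can still leak into the noise estimate P1; later frames of the burst leave P1 untouched.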

The description so far assumes that no hangover operation is performed, but at least one of the first speech segment determination unit 203 and the second speech segment determination unit 303 may perform a hangover operation.

The hangover operation performed by at least one of the first speech segment determination unit 203 and the second speech segment determination unit 303 is described below. The hangover operation may be performed by both units or by only one of them (an embodiment in which neither unit performs the hangover operation is also an embodiment of the present invention).

The hangover operation is the same in the first speech segment determination unit 203 and the second speech segment determination unit 303, so only the operation in the first speech segment determination unit 203 is described below and the description for the second speech segment determination unit 303 is omitted.

For the first speech segment determination unit 203, a hangover time Thn1 is set in advance as the limit for the first elapsed time Te1, which is the time elapsed since true was last output. When it compares the first threshold TH1 with the input power Pin, the first speech segment determination unit 203 (i) outputs a true first speech segment truth value V1 and clears Te1 to 0 if Pin > TH1; (ii) outputs a true V1 and increments Te1 by one unit time if Pin ≤ TH1 and Te1 ≤ Thn1; and (iii) outputs a false V1 if Pin ≤ TH1 and Te1 > Thn1.
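Rules (i) to (iii) can be sketched as a small stateful detector. The class name `HangoverDecision`, measuring Te1 and Thn1 in unit times (frames), and starting with the hangover already expired are assumptions made for illustration.

```python
class HangoverDecision:
    """Threshold decision with hangover, following rules (i) to (iii)."""

    def __init__(self, hangover_frames):
        self.hangover_frames = hangover_frames   # Thn1, in unit times
        self.elapsed = hangover_frames + 1       # Te1; starts expired (assumption)

    def decide(self, pin, threshold):
        if pin > threshold:                      # (i) clear Te1, output true
            self.elapsed = 0
            return True
        if self.elapsed <= self.hangover_frames: # (ii) within hangover, output true
            self.elapsed += 1
            return True
        return False                             # (iii) hangover expired
```

With a 10 ms unit time, a hangover time of 0.1 seconds would correspond to `hangover_frames=10`. Note that the "less than or equal" test in rule (ii) keeps the output true for one unit time more than the configured count.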

When both the first speech segment determination unit 203 and the second speech segment determination unit 303 perform the hangover operation, the hangover time Thn1 of the first speech segment determination unit 203 and the hangover time Thn2 of the second speech segment determination unit 303 may be the same or different. An example in which they differ is described here. In the first embodiment, the first speech segment truth value V1 is used by the second parameter calculation unit 112 to estimate the average speech power, so the hangover time Thn1 of the first speech segment determination unit 203 is set relatively short to avoid misjudging noise segments as speech segments. Conversely, the second speech segment truth value V2 is used by the first parameter calculation unit 111 to estimate the average noise power, so the hangover time Thn2 of the second speech segment determination unit 303 is set relatively long to avoid misjudging speech segments as noise segments. For example, setting Thn1 to 0.1 seconds and Thn2 to 0.2 seconds is suitable.

(A-3) Rationale behind the configuration of the first embodiment
Next, the reasoning that led to the configuration of the noise suppression device 100 of the first embodiment is described (the same reasoning applies to the embodiments described later).

Conventional approaches either estimate the noise power and then detect speech sections, or detect speech sections and then estimate the noise power. The weighting coefficient in the technique of Patent Document 1 is 0 when speech strongly dominates and 1 when noise strongly dominates, so computing it is nearly equivalent to detecting speech sections. In such approaches, if the estimation or detection performed first is inaccurate, the estimation or detection performed afterwards is inaccurate as well.

In contrast, the first embodiment has a first parameter calculation unit and a second parameter calculation unit that exchange information with each other. For example, suppose both parameter calculation units estimate the noise power and then detect speech sections. If one parameter calculation unit uses the speech section detection result output by the other when it estimates the noise power, the noise power can be estimated with high accuracy. In practice, accuracy improves further when the two parameter calculation units estimate and detect from different viewpoints, for example with one unit estimating the noise power and the other estimating the speech power (the first embodiment corresponds to this case).

In this way, two parameter calculation units are provided for each frequency band, and they compute their parameters cooperatively while exchanging parameters with each other. This makes it possible to estimate the noise power of each band (the SNR in the third embodiment described later) and to detect speech sections with high accuracy. Applying these results to noise suppression yields output speech in which the speech is enhanced with less distortion.

(A-4) Effects of the first embodiment
According to the first embodiment, the two parameter calculation units complement each other's parameter updates (the parameters being the estimates of the average speech power and the average noise power). This improves both the accuracy of parameter calculation and the accuracy of speech section detection, so noise suppression with high naturalness and intelligibility can be realized.

(A-5) Modified embodiment of the first embodiment
The description of the first embodiment above explained that the second parameter calculation unit 112 (in other words, the second speech section determination unit 303) may or may not perform the hangover operation. When the hangover operation is performed, both the speech section truth value fed back to the first parameter calculation unit 111 and the speech section truth value output to the noise suppression unit 114 have the hangover applied. When the hangover operation is not performed, neither the speech section truth value fed back to the first parameter calculation unit 111 nor the speech section truth value output to the noise suppression unit 114 has the hangover applied.

FIG. 4 is a block diagram showing the configuration of the second parameter calculation unit 112A in the band-by-band suppression means 102A, which is a modification of the first embodiment with respect to the hangover operation.

In addition to the second speech section determination unit 303, which here does not perform the hangover operation, the second parameter calculation unit 112A includes a hangover unit 304 that applies the hangover operation to the second speech section truth value V2 output from the second speech section determination unit 303. The second speech section truth value V2 output from the second speech section determination unit 303 is supplied to the first parameter calculation unit 111 via the unit time delay unit 113, and is supplied to the next-stage device via the hangover unit 304.

For the hangover unit 304, a hangover time Thn0 is set in advance for the elapsed time Te0, which measures the time since the speech section truth value V0 output by the hangover unit itself was last set to true. The hangover unit 304 (i) outputs a true speech section truth value V0 and clears the elapsed time Te0 to 0 if the input second speech section truth value V2 is true; (ii) outputs a true speech section truth value V0 and increments the elapsed time Te0 by one unit time if V2 is false and Te0 ≤ Thn0; and (iii) outputs a false speech section truth value V0 if V2 is false and Te0 > Thn0. The optimal hangover time Thn0 depends on how the speech section truth value V0 is used; for speech recognition, for example, 0.5 seconds is suitable.
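As a rough illustration of a separate hangover stage that post-processes a stream of V2 values, consider the Python sketch below. The function name `apply_hangover`, the use of a frame count for Thn0, and the choice to start Te0 in the "expired" state (so that leading silence is output as false) are assumptions made for this example only.

```python
from typing import Iterable, Iterator


def apply_hangover(v2_stream: Iterable[bool], hangover: int) -> Iterator[bool]:
    """Yield V0 for each input V2, holding true for a while after V2 turns false."""
    elapsed = hangover + 1  # Te0; assumed to start expired (not stated in the text)
    for v2 in v2_stream:
        if v2:                     # case (i): input is true
            elapsed = 0
            yield True
        elif elapsed <= hangover:  # case (ii): input false, hangover still running
            elapsed += 1
            yield True
        else:                      # case (iii): input false, hangover expired
            yield False
```

If the unit time were, say, 10 ms (an assumed value), a hangover time of 0.5 seconds would correspond to `hangover=50`.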

(B) Second embodiment
Next, a second embodiment of the noise suppression device and program according to the present invention is described with reference to the drawings.

In the first embodiment, predetermined constant coefficients C1 and C2 were used to compare the smoothed power with the input power when calculating the smoothed power (in particular the first smoothed power, which represents the average noise power) and when determining speech sections. However, the optimal coefficients depend on the power balance between speech and noise. Therefore, whereas the only parameter exchanged between the first and second parameter calculation units in the first embodiment was the speech section truth value, in the second embodiment the smoothed power is exchanged in addition to the speech section truth value and is also used to update the thresholds.

(B-1) Configuration of the second embodiment
The overall configuration of the noise suppression device of the second embodiment (hereinafter denoted "100B") can also be represented by FIG. 1 described above. However, the detailed configurations of the first parameter calculation unit (hereinafter denoted "111B") and the second parameter calculation unit (hereinafter denoted "112B") differ from those of the first embodiment. The following description therefore focuses on the detailed configurations of the first parameter calculation unit 111B and the second parameter calculation unit 112B.

FIG. 5 is a block diagram showing the detailed configuration of the first parameter calculation unit 111B in the second embodiment. Parts identical or corresponding to those in FIG. 2 of the first embodiment are given identical or corresponding reference numerals.

In FIG. 5, the first parameter calculation unit 111B includes a first smoothing unit 201, a first threshold calculation unit 202B, and a first speech section determination unit 203. The first smoothing unit 201 and the first speech section determination unit 203 are the same as those of the first embodiment, so their functional description is omitted.

The first threshold calculation unit 202B in the second embodiment forms the first threshold TH1B, which is compared with the input power Pin, based on the first smoothed power P1 output from the first smoothing unit 201 and the first reference smoothed power Pr1 (= the second smoothed power P2 one unit time earlier) supplied from the second parameter calculation unit 112B via the unit time delay unit 113, and supplies TH1B to the first speech section determination unit 203.

Of the two values used to form the first threshold TH1B, the first smoothed power P1 represents the average noise power and the first reference smoothed power Pr1 represents the average speech power one unit time earlier. It is therefore preferable to use a mean of the first smoothed power P1 and the first reference smoothed power Pr1 as the first threshold TH1B. The mean may be the arithmetic mean (P1 + Pr1)/2 or the geometric mean (P1 × Pr1)^(1/2). When a value other than a mean is used as the first threshold TH1B, TH1B is preferably set larger than the first smoothed power P1 and smaller than the mean, so that the first speech section determination unit 203 judges a section to be a speech section more readily than a noise section. For ease of computation, the arithmetic mean (P1 + Pr1)/2 is a suitable value for the first threshold TH1B.
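The two candidate means for TH1B can be written out as a short Python sketch. The function name `threshold_th1b` and the `mode` argument are hypothetical names introduced for illustration; only the two formulas come from the description.

```python
import math


def threshold_th1b(p1: float, pr1: float, mode: str = "arithmetic") -> float:
    """Return TH1B from the noise-level estimate P1 and the speech-level estimate Pr1."""
    if mode == "arithmetic":
        return (p1 + pr1) / 2.0     # arithmetic mean (P1 + Pr1) / 2
    if mode == "geometric":
        return math.sqrt(p1 * pr1)  # geometric mean (P1 x Pr1)^(1/2)
    raise ValueError(f"unknown mode: {mode}")
```

For example, with P1 = 1.0 and Pr1 = 9.0 the arithmetic mean gives a threshold of 5.0 and the geometric mean gives 3.0. Since the geometric mean of two positive values never exceeds their arithmetic mean, the geometric variant yields the lower threshold and therefore classifies more frames as speech.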

In the second embodiment, the first parameter calculation unit 111B supplies the second parameter calculation unit 112B and the noise suppression unit 114 with a first parameter F1 that includes the first smoothed power P1 output from the first smoothing unit 201 and the first speech section truth value V1 output from the first speech section determination unit 203.

FIG. 6 is a block diagram showing the detailed configuration of the second parameter calculation unit 112B in the second embodiment. Parts identical or corresponding to those in FIG. 3 of the first embodiment are given identical or corresponding reference numerals.

In FIG. 6, the second parameter calculation unit 112B includes a second smoothing unit 301, a second threshold calculation unit 302B, and a second speech section determination unit 303. The second smoothing unit 301 and the second speech section determination unit 303 are the same as those of the first embodiment, so their functional description is omitted.

The second threshold calculation unit 302B in the second embodiment forms the second threshold TH2B, which is compared with the input power Pin, based on the second smoothed power P2 output from the second smoothing unit 301 and the second reference smoothed power Pr2 (= the first smoothed power P1 of the same unit time) supplied from the first parameter calculation unit 111B, and supplies TH2B to the second speech section determination unit 303.

Of the two values used to form the second threshold TH2B, the second smoothed power P2 represents the average speech power and the second reference smoothed power Pr2 represents the average noise power. It is therefore preferable to use a mean of the second smoothed power P2 and the second reference smoothed power Pr2 as the second threshold TH2B. The mean may be the arithmetic mean (P2 + Pr2)/2 or the geometric mean (P2 × Pr2)^(1/2). When a value other than a mean is used as the second threshold TH2B, TH2B is preferably set larger than the second reference smoothed power Pr2 (that is, the first smoothed power P1) and smaller than the mean, so that the second speech section determination unit 303 judges a section to be a speech section more readily than a noise section. For ease of computation, the arithmetic mean (P2 + Pr2)/2 is a suitable value for the second threshold TH2B.

In the second embodiment, the second parameter calculation unit 112B supplies a second parameter F2, which includes the second smoothed power P2 output from the second smoothing unit 301 and the second speech section truth value V2 output from the second speech section determination unit 303, to the first parameter calculation unit 111B via the unit time delay unit 113, and also supplies the second parameter F2 to the noise suppression unit 114.

(B-2) Operation of the second embodiment
Next, the operation of the noise suppression device 100B of the second embodiment is described. The overall operation of the noise suppression device 100B is the same as that of the noise suppression device 100 of the first embodiment, so its description is omitted. The following describes, in order, the operations of the first parameter calculation unit 111B and the second parameter calculation unit 112B, which differ from the first embodiment.

First, the operation of the first parameter calculation unit 111B is described with reference to FIG. 5.

In the first smoothing unit 201, the input power Pin is smoothed based on the first speech section reference truth value Vr1 (= the second speech section truth value V2 one unit time earlier), and the resulting first smoothed power P1 is supplied to the first threshold calculation unit 202B. The first threshold calculation unit 202B also receives the first reference smoothed power Pr1, which is the second smoothed power P2 one unit time earlier. Based on the first smoothed power P1 and the first reference smoothed power Pr1, the first threshold calculation unit 202B forms the first threshold TH1B to be compared with the input power Pin by the method described above, and supplies it to the first speech section determination unit 203. The first speech section determination unit 203 compares the first threshold TH1B with the input power Pin; it forms a true first speech section truth value V1 when Pin is greater than TH1B, and a false first speech section truth value V1 when Pin is less than or equal to TH1B. The first parameter F1, which includes the first smoothed power P1 output from the first smoothing unit 201 and the first speech section truth value V1 output from the first speech section determination unit 203, is then supplied to the second parameter calculation unit 112B and the noise suppression unit 114.

Next, the operation of the second parameter calculation unit 112B is described with reference to FIG. 6.

In the second smoothing unit 301, the input power Pin is smoothed based on the second speech section reference truth value Vr2 (= the first speech section truth value V1 of the same unit time), and the resulting second smoothed power P2 is supplied to the second threshold calculation unit 302B. The second threshold calculation unit 302B also receives the second reference smoothed power Pr2, which is the first smoothed power P1 of the same unit time. Based on the second smoothed power P2 and the second reference smoothed power Pr2, the second threshold calculation unit 302B forms the second threshold TH2B to be compared with the input power Pin by the method described above, and supplies it to the second speech section determination unit 303. The second speech section determination unit 303 compares the second threshold TH2B with the input power Pin; it forms a true second speech section truth value V2 when Pin is greater than TH2B, and a false second speech section truth value V2 when Pin is less than or equal to TH2B. The second parameter F2, which includes the second smoothed power P2 output from the second smoothing unit 301 and the second speech section truth value V2 output from the second speech section determination unit 303, is then supplied to the first parameter calculation unit 111B via the unit time delay unit 113, and is also supplied to the noise suppression unit 114.
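To make the mutual exchange between the two units concrete, the following Python sketch runs both units for one frequency band. It rests on several assumptions not stated in this passage: smoothing is done with a simple one-pole filter with coefficient `alpha`, the first unit updates its noise-level estimate only while the delayed V2 is false, the second unit updates its speech-level estimate only while V1 is true, both thresholds use the arithmetic mean, and no hangover is applied. The function name and the initial values are likewise hypothetical.

```python
def run_second_embodiment(powers, alpha=0.9, p1_init=1.0, p2_init=100.0):
    """Return a list of (V1, V2) pairs, one per unit time, for a single band."""
    p1, p2 = p1_init, p2_init          # noise-level and speech-level estimates
    v2_prev, p2_prev = False, p2_init  # outputs of the unit time delay
    results = []
    for p_in in powers:
        # First unit: update P1 only when the delayed V2 (Vr1) indicates noise.
        if not v2_prev:
            p1 = alpha * p1 + (1.0 - alpha) * p_in
        th1 = (p1 + p2_prev) / 2.0     # TH1B from P1 and Pr1 (delayed P2)
        v1 = p_in > th1
        # Second unit: update P2 only when V1 (Vr2) indicates speech.
        if v1:
            p2 = alpha * p2 + (1.0 - alpha) * p_in
        th2 = (p2 + p1) / 2.0          # TH2B from P2 and Pr2 (same-time P1)
        v2 = p_in > th2
        results.append((v1, v2))
        v2_prev, p2_prev = v2, p2      # feed F2 back through the delay
    return results
```

Feeding this sketch five low-power frames, five high-power frames, and five low-power frames marks only the middle five as speech on both outputs. Note that the first high-power frame still leaks into the noise estimate, because the feedback from the second unit arrives one unit time late.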

In the second embodiment, as in the first embodiment, at least one of the first speech section determination unit 203 and the second speech section determination unit 303 may perform the hangover operation. When both perform it, the hangover time of the first speech section determination unit 203 and that of the second speech section determination unit 303 may be the same or different. In the second embodiment as well, a preferable setting is a hangover time of 0.1 seconds for the first speech section determination unit 203 and 0.2 seconds for the second speech section determination unit 303.

As with the first embodiment, the second embodiment can also be modified to include the hangover unit 304 shown in FIG. 4. A hangover time of 0.5 seconds is suitable for this hangover unit 304.

(B-3) Effects of the second embodiment
In the second embodiment as well, the first and second parameter calculation units complement each other's parameter updates, so the parameters can be updated stably. This improves both the accuracy of parameter calculation and the accuracy of speech section detection, so noise suppression with high naturalness and intelligibility can be realized.

In addition, according to the second embodiment, the threshold compared with the input power can be updated appropriately even when the power balance between speech and noise is unknown or varies over time. This also improves the accuracy of parameter calculation and speech section detection, and thus contributes to noise suppression with high naturalness and intelligibility.

(C) Third embodiment
Next, a third embodiment of the noise suppression device and program according to the present invention is described with reference to the drawings.

The third embodiment differs from the first and second embodiments in that the second parameter calculation unit estimates the SNR (here, the a posteriori SNR) and provides it to the noise suppression unit.

(C-1) Configuration of the third embodiment
The overall configuration of the noise suppression device of the third embodiment (hereinafter denoted "100C") can also be represented by FIG. 1 described above.

In the third embodiment, however, while the first parameter calculation unit 111 estimates parameters and detects speech sections based on the input power Pin as in the first embodiment, the second parameter calculation unit (hereinafter denoted "112C") estimates the SNR (here, the a posteriori SNR) and detects speech sections based on the estimated SNR. The noise suppression unit (hereinafter denoted "114C") also uses the SNR output from the second parameter calculation unit 112C when performing noise suppression.

The following description therefore focuses on the detailed configuration of the second parameter calculation unit 112C and the function of the noise suppression unit 114C. The first parameter calculation unit 111 has the detailed configuration shown in FIG. 2 described above.

FIG. 7 is a block diagram showing the detailed configuration of the second parameter calculation unit 112C in the third embodiment. Parts identical or corresponding to those in FIG. 3 of the first embodiment are given identical or corresponding reference numerals.

In FIG. 7, the second parameter calculation unit 112C of the third embodiment includes an SNR calculation unit 305 in addition to a second smoothing unit 301C, a second threshold calculation unit 302C, and a second speech section determination unit 303C.

The SNR calculation unit 305 divides the input power Pin (corresponding to the S of SNR) by the second reference smoothed power Pr2, which is the estimate of the noise power (= the first smoothed power P1 one unit time earlier; corresponding to the N of SNR), to obtain the SNR estimate Ri, and supplies the SNR estimate Ri to the second smoothing unit 301C and the second speech section determination unit 303C.

Unlike in the first and second embodiments, the second smoothing unit 301C in the third embodiment smooths the SNR estimate Ri rather than the input power Pin. The second smoothing unit 301C smooths the SNR estimate Ri based on the second speech section reference truth value Vr2 (= the first speech section truth value V1 one unit time earlier) and supplies the resulting SNR smoothed value Rs to the second threshold calculation unit 302C. When the second speech section reference truth value Vr2 is true (a speech section), the second smoothing unit 301C smooths the SNR estimate Ri and updates the SNR smoothed value Rs; when Vr2 is false (a noise section), it holds the SNR smoothed value Rs without updating it. The SNR smoothed value Rs therefore represents the average SNR of speech sections. The smoothing method is not limited in any way; for example, a time constant filter with a time constant of 0.8 seconds is suitable.
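One possible reading of this gated smoothing is sketched below in Python. The description leaves the filter unspecified, so the one-pole (exponential) form, the mapping from time constant to coefficient via exp(-Δt/τ), and the function names are all assumptions made for illustration.

```python
import math


def smoothing_coefficient(time_constant_s: float, unit_time_s: float) -> float:
    """Coefficient of a one-pole filter for the given time constant (assumed form)."""
    return math.exp(-unit_time_s / time_constant_s)


def update_rs(rs: float, ri: float, vr2: bool, alpha: float) -> float:
    """Update the SNR smoothed value Rs in speech sections; hold it in noise sections."""
    if vr2:
        return alpha * rs + (1.0 - alpha) * ri  # speech section: smooth Ri into Rs
    return rs                                    # noise section: keep Rs unchanged
```

With a time constant of 0.8 seconds and an assumed unit time of 10 ms, `smoothing_coefficient(0.8, 0.01)` is close to 0.988, so Rs follows Ri slowly and only while Vr2 is true.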

Because the SNR smoothed value Rs represents the SNR of speech sections, the second threshold calculation unit 302C in the third embodiment multiplies Rs by a second coefficient C2C, a constant greater than 0 and at most 1, to form the second threshold TH2C that is compared with the SNR estimate Ri, and supplies TH2C to the second speech section determination unit 303C. The second coefficient C2C thus sets the threshold TH2C that separates SNR estimates of speech sections from SNR estimates of noise sections. Its value is not limited, but 0.5 can be used, for example.

The second speech section determination unit 303C in the third embodiment compares the SNR estimate Ri with the second threshold TH2C to form the second speech section truth value V2, which indicates whether the current section is a speech section. The second speech section determination unit 303C outputs true as the second speech section truth value V2 if the SNR estimate Ri is greater than the second threshold TH2C, and false otherwise.
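The SNR estimate, the threshold, and the comparison can be combined into a few lines of Python. This is a sketch only: the function name `second_decision` is hypothetical, and the small constant `eps` that guards against division by zero is an added assumption that does not appear in the description.

```python
from typing import Tuple


def second_decision(p_in: float, pr2: float, rs: float,
                    c2c: float = 0.5, eps: float = 1e-12) -> Tuple[float, bool]:
    """Return (Ri, V2) for one band and one unit time."""
    ri = p_in / max(pr2, eps)  # SNR estimate Ri = Pin / Pr2 (eps is an assumption)
    th2c = c2c * rs            # second threshold TH2C = C2C x Rs
    return ri, ri > th2c       # V2 is true only if Ri exceeds TH2C
```

For instance, with Pin = 40, Pr2 = 2 and Rs = 30, the estimate Ri is 20 and the threshold is 15, so the frame is judged to be speech; with Pin = 10 the estimate drops to 5 and the frame is judged to be noise.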

The second parameter calculation unit 112C outputs the SNR estimate Ri and the second speech section truth value V2.

The noise suppression unit 114C of the third embodiment suppresses the noise component of the input spectrum (frequency band signal) based on the first parameter F1, the second parameter F2C, and the input power Pin, and supplies the suppressed spectrum to the waveform restoration unit 103. In the third embodiment, as described above, the second parameter F2C includes the SNR estimate Ri, so the noise suppression unit 114C does not need to recompute the a posteriori SNR required for calculating the suppression gain and uses the supplied SNR estimate Ri as it is. The noise suppression method applied by the noise suppression unit 114C is not limited, but the Wiener filter method and MMSE-STSA are suitable because they can make use of the SNR estimate Ri for each frequency band.
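As one hedged illustration of how a per-band gain could be derived from the a posteriori SNR estimate Ri, the Python sketch below uses a Wiener-type gain. The description does not give the gain formula; the a priori SNR approximation max(Ri − 1, 0), the gain floor, and the function name `wiener_gain` are assumptions chosen for this example and are not taken from the source.

```python
def wiener_gain(ri: float, floor: float = 0.1) -> float:
    """Suppression gain for one band from the a posteriori SNR estimate Ri."""
    xi = max(ri - 1.0, 0.0)  # assumed a priori SNR approximation from Ri
    gain = xi / (1.0 + xi)   # Wiener-type gain
    return max(gain, floor)  # assumed gain floor to avoid over-suppression
```

Under these assumptions, a band with Ri = 5 receives a gain of 0.8, while a band with Ri at or below 1 (no speech energy above the noise estimate) is held at the floor value.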

(C-2) Operation of the third embodiment
Next, the operation of the noise suppression device 100C of the third embodiment is described. The overall operation of the noise suppression device 100C is the same as that of the noise suppression device 100 of the first embodiment, so its description is omitted. The operation of the first parameter calculation unit 111 in the third embodiment is also the same as in the first embodiment, so its description is omitted as well. The following describes the operation of the second parameter calculation unit 112C in the third embodiment, and also the operation of the noise suppression unit 114C, which receives the output of the second parameter calculation unit 112C.

In FIG. 7, the SNR calculation unit 305 receives the input power Pin and the second reference smoothed power Pr2 (the first smoothed power P1 of one unit time earlier), which is an estimate of the noise power. It divides the input power Pin by the second reference smoothed power Pr2 to obtain the SNR estimate Ri, and the SNR estimate Ri is supplied to the second smoothing unit 301C and the second speech section determination unit 303C.

The SNR estimate Ri is smoothed by the second smoothing unit 301C with reference to the second speech section reference truth value Vr2 (= the first speech section truth value V1 of one unit time earlier). That is, when Vr2 is true (a speech section), the SNR estimate Ri is smoothed and the SNR smoothed value Rs is updated; when Vr2 is false (a noise section), the SNR smoothed value Rs is held without being updated. The SNR smoothed value Rs obtained in this way is supplied to the second threshold calculation unit 302C. There, Rs is multiplied by the second coefficient C2C, a constant greater than 0 and at most 1, to form the second threshold TH2C against which the SNR estimate Ri is compared, and TH2C is supplied to the second speech section determination unit 303C.

The second speech section determination unit 303C compares the SNR estimate Ri with the second threshold TH2C. When Ri is greater than TH2C, it outputs a true second speech section truth value V2; when Ri is less than or equal to TH2C, it outputs a false second speech section truth value V2.
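The per-band processing of units 305, 301C, 302C, and 303C described above can be sketched as follows. This is only an illustrative sketch: the class and argument names are hypothetical, the smoothing is assumed to be a first-order recursive average with factor `alpha` (the text does not specify the smoothing rule), and the default values of `c2c`, `alpha`, and the initial `rs` are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SnrBandDetector:
    """Hypothetical sketch of the SNR-based detector for one frequency band."""

    c2c: float = 0.5    # second coefficient C2C, 0 < C2C <= 1 (value assumed)
    alpha: float = 0.9  # recursive smoothing factor (assumed, not in the text)
    rs: float = 1.0     # SNR smoothed value Rs (initial value assumed)

    def step(self, p_in: float, pr2: float, vr2: bool) -> Tuple[float, bool]:
        # SNR calculation unit 305: Ri = Pin / Pr2 (guard against division by zero)
        ri = p_in / max(pr2, 1e-12)
        # Second smoothing unit 301C: update Rs only while Vr2 indicates speech
        if vr2:
            self.rs = self.alpha * self.rs + (1.0 - self.alpha) * ri
        # Second threshold calculation unit 302C: TH2C = C2C * Rs
        th2c = self.c2c * self.rs
        # Second speech section determination unit 303C: V2 is true when Ri > TH2C
        return ri, ri > th2c
```

Calling `step` once per unit time with the current Pin, Pr2, and Vr2 returns the pair (Ri, V2) that makes up the second parameter F2C.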

The second parameter calculation unit 112C outputs the second parameter F2C, which contains the SNR estimate Ri and the second speech section truth value V2. The second parameter F2C is supplied to the noise suppression unit 114C, and the second speech section truth value V2 is supplied to the first parameter calculation unit 111 via the unit time delay unit 113.

In the noise suppression unit 114C, the noise component of the input spectrum (frequency band signal) is suppressed based on the first parameter F1, the second parameter F2C, and the input power Pin, and the suppressed spectrum is supplied to the waveform restoration unit 103. Because the second parameter F2C contains the SNR estimate Ri, the noise suppression unit 114C does not compute the a posteriori SNR required for calculating the suppression gain and instead uses the supplied SNR estimate Ri as is.
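As one illustration of how a suppression gain might be derived directly from the SNR estimate Ri, the sketch below uses a Wiener-type gain. Several points are assumptions and not taken from the text: the a priori SNR is approximated as max(Ri - 1, 0), the gain is G = xi / (1 + xi), and a gain floor (`floor`, a hypothetical parameter) limits the maximum attenuation. The function names are hypothetical.

```python
def wiener_gain(ri: float, floor: float = 0.1) -> float:
    """Wiener-type gain from the a posteriori SNR estimate Ri (sketch)."""
    # Assumed approximation of the a priori SNR from the a posteriori SNR
    xi = max(ri - 1.0, 0.0)
    gain = xi / (1.0 + xi)
    # Assumed gain floor to avoid over-suppression
    return max(gain, floor)


def suppress_bin(x: complex, ri: float, floor: float = 0.1) -> complex:
    """Apply the gain to one complex spectral value of the frequency band."""
    return wiener_gain(ri, floor) * x
```

The suppressed spectral values would then be passed on for waveform restoration; MMSE-STSA would replace `wiener_gain` with a different gain rule driven by the same Ri.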

In the third embodiment, as in the first embodiment, at least one of the first speech section determination unit 203 and the second speech section determination unit 303C may perform a hangover operation. When both perform the hangover operation, the hangover time in the first speech section determination unit 203 and the hangover time in the second speech section determination unit 303C may be the same or different. In the third embodiment as well, a preferred configuration sets the hangover time in the first speech section determination unit 203 to 0.1 seconds and the hangover time in the second speech section determination unit 303C to 0.2 seconds.

As with the first embodiment, the third embodiment can also be modified to include the hangover unit 15 shown in FIG. 4. A hangover time of 0.5 seconds is suitable for this hangover unit 15.
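A hangover operation of the kind referred to here can be sketched as a simple frame counter that keeps the decision true for a fixed time after the raw decision falls. The class name, the frame-based counting, and the frame length argument are assumptions for illustration; the text only gives the hangover times.

```python
class Hangover:
    """Hold a speech decision for a fixed time after the raw decision drops."""

    def __init__(self, hang_seconds: float, frame_seconds: float) -> None:
        # Number of unit times (frames) for which the decision is extended
        self.hang_frames = int(round(hang_seconds / frame_seconds))
        self.remaining = 0

    def step(self, raw: bool) -> bool:
        if raw:
            # Speech detected: reload the hangover counter
            self.remaining = self.hang_frames
            return True
        if self.remaining > 0:
            # Raw decision is noise, but still within the hangover time
            self.remaining -= 1
            return True
        return False
```

For example, with an assumed unit time of 10 ms, `Hangover(0.2, 0.01)` extends each speech decision by 20 frames.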

(C-3) Effects of the Third Embodiment
According to the third embodiment, the estimation and determination based on the power of the input signal by the first parameter calculation unit and the estimation and determination based on the SNR of the input signal by the second parameter calculation unit each use the other's parameters and determination results. This improves the accuracy of parameter calculation and of speech section detection, and achieves noise suppression with high naturalness and intelligibility.

(C-4) Modifications of the Third Embodiment
In the above description, the first parameter calculation unit detects speech sections based on the input power and the second parameter calculation unit detects speech sections based on the SNR. Alternatively, the first parameter calculation unit may detect speech sections based on the SNR and the second parameter calculation unit may detect speech sections based on the input power, or both the first and second parameter calculation units may detect speech sections based on the SNR.

(D) Other Embodiments
Various modifications have been mentioned in the description of each of the above embodiments. Further modifications such as those exemplified below are also possible.

In each of the above embodiments, the band-specific noise suppression means operate independently of one another, but they may instead influence one another statically or dynamically. For example, the first and second parameter calculation units of some band-specific noise suppression means may include only a smoothing unit and operate by taking in speech section truth values from other band-specific noise suppression means that have a threshold calculation unit and a speech section detection unit. As another example, the first speech section truth values from the first parameter calculation units of all band-specific noise suppression means may be integrated by majority vote, logical AND, logical OR, or the like, and supplied to the second parameter calculation units of all band-specific noise suppression means; likewise, the second speech section truth values from the second parameter calculation units of all band-specific noise suppression means may be integrated by majority vote, logical AND, logical OR, or the like, delayed by one unit time, and supplied to the first parameter calculation units of all band-specific noise suppression means.
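The integration of per-band truth values by majority vote, logical AND, or logical OR can be sketched as below. The function name and the `mode` strings are hypothetical, and treating a tie as "not speech" in the majority vote is an assumption, since the text does not say how ties are resolved.

```python
from typing import Sequence


def combine_decisions(votes: Sequence[bool], mode: str = "majority") -> bool:
    """Integrate per-band speech section truth values into one decision."""
    if mode == "and":
        return all(votes)
    if mode == "or":
        return any(votes)
    if mode == "majority":
        # Strict majority; a tie is treated as noise (assumption)
        return 2 * sum(votes) > len(votes)
    raise ValueError(f"unknown mode: {mode}")
```

The combined value would be distributed to every band in place of that band's own truth value.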

In each of the above embodiments, the first parameter calculation unit updates its feature quantity when the detection result of the second parameter calculation unit from one unit time earlier indicates a noise period, and the second parameter calculation unit updates its feature quantity when the detection result of the first parameter calculation unit for the same unit time indicates a speech period. The combination of update periods is not limited to this. For example, the first parameter calculation unit may update its feature quantity in speech periods while the second parameter calculation unit updates in noise periods; both units may update in noise sections; or both units may update in speech sections. Depending on the choice of update periods, the first parameter calculation unit may detect speech sections based on the SNR.
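The default combination of update periods described above can be sketched for one frequency band as two cross-coupled detectors. This is a sketch under stated assumptions: each threshold is taken to be a coefficient times the corresponding smoothed power (modelled on TH2C = C2C x Rs of the third embodiment), the smoothing is a first-order recursive average, and the class name, default coefficients, and initial values are hypothetical.

```python
class CoupledBandVad:
    """Two mutually gated power-based detectors for one band (sketch)."""

    def __init__(self, c1: float = 2.0, c2: float = 0.5, alpha: float = 0.9,
                 p1: float = 1.0, p2: float = 1.0) -> None:
        self.c1, self.c2, self.alpha = c1, c2, alpha  # assumed values
        self.p1 = p1          # first smoothed power P1 (noise-side estimate)
        self.p2 = p2          # second smoothed power P2 (speech-side estimate)
        self.v2_prev = False  # V2 delayed by one unit time

    def _smooth(self, old: float, new: float) -> float:
        return self.alpha * old + (1.0 - self.alpha) * new

    def step(self, p_in: float):
        # First unit: update P1 only if V2 of the previous unit time was noise
        if not self.v2_prev:
            self.p1 = self._smooth(self.p1, p_in)
        v1 = p_in > self.c1 * self.p1
        # Second unit: update P2 only if V1 of the same unit time is speech
        if v1:
            self.p2 = self._smooth(self.p2, p_in)
        v2 = p_in > self.c2 * self.p2
        self.v2_prev = v2  # plays the role of the unit time delay
        return v1, v2
```

Swapping the two gating conditions, or using the same condition in both branches, gives the alternative combinations listed in the paragraph.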

In each of the above embodiments, the speech section detection result of the second parameter calculation unit (the second speech section truth value) is output to the noise suppression unit, but the detection result output to the noise suppression unit is not limited to this. For example, the speech section detection result of the first parameter calculation unit (the first speech section truth value) may be output to the noise suppression unit, or the logical AND or logical OR of the detection results of the first and second parameter calculation units may be output to the noise suppression unit.

In the first and second embodiments, the first parameter calculation unit and the second parameter calculation unit are completely separate components. Alternatively, the main parts of a single parameter calculation unit (smoothing unit, threshold calculation unit, and speech section determination unit) may be applied in a time-division manner within one unit time so that it functions as both the first and the second parameter calculation unit. In that case, an auxiliary component is required, such as a memory that saves the data of the second parameter calculation unit (for example, the second smoothed power P2 and the second coefficient C2) while the unit functions as the first parameter calculation unit, and saves the data of the first parameter calculation unit (for example, the first smoothed power P1 and the first coefficient C1) while the unit functions as the second parameter calculation unit. The wording of the claims is intended to cover the case where a single component is used in this time-division manner.

In each of the above embodiments, the threshold used for speech section detection is determined from the smoothed value of the feature quantity, but the threshold may be determined by other methods. For example, the threshold may be a predetermined multiple of the minimum input power over the most recent predetermined period determined to be a noise section (for example, 3 seconds; if noise sections occur intermittently, a cumulative 3 seconds), or a predetermined multiple of the maximum input power over the most recent predetermined period (for example, 3 seconds) determined to be a speech section. When the SNR is used as the feature quantity, as in the third embodiment, the threshold may be a predetermined multiple of the maximum SNR over the most recent predetermined period (for example, 3 seconds) determined to be a speech section.
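The minimum-based alternative can be sketched with a bounded history that stores only frames judged to be noise, so that intermittent noise sections accumulate toward the window length. The class name, the `window_frames` and `factor` parameters, and returning `None` before any noise frame has been seen are assumptions for illustration.

```python
from collections import deque
from typing import Optional


class MinTrackingThreshold:
    """Threshold = factor x minimum input power over recent noise frames."""

    def __init__(self, window_frames: int, factor: float) -> None:
        # Only noise frames are stored, so the window is cumulative noise time
        self.history = deque(maxlen=window_frames)
        self.factor = factor

    def update(self, p_in: float, is_noise: bool) -> Optional[float]:
        if is_noise:
            self.history.append(p_in)
        if not self.history:
            return None  # no noise frame observed yet (assumed behaviour)
        return self.factor * min(self.history)
```

With an assumed unit time of 10 ms, a 3-second window corresponds to `window_frames=300`. The maximum-based variants follow by storing speech frames and replacing `min` with `max`.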

In each of the above embodiments, the feature quantities obtained by the first parameter calculation unit and/or the second parameter calculation unit are used for noise suppression, but they may also be used for other purposes.

FIG. 8 is a block diagram showing the configuration of an embodiment of the noise estimation device according to the present invention. Parts identical or corresponding to those in FIG. 1 are denoted by identical or corresponding reference signs.

In FIG. 8, the noise estimation device 400 of this embodiment has a frequency analysis unit 101 similar to that of the first embodiment, band-specific noise power estimation units 402-1 to 402-M, and a noise power integration unit 403. Each band-specific noise power estimation unit 402 (402-1 to 402-M) includes a power calculation unit 110, a first parameter calculation unit 111, and a second parameter calculation unit 112, each similar to that of the first embodiment. Each band-specific noise power estimation unit 402 supplies the first smoothed power P1 obtained inside the first parameter calculation unit 111 to the noise power integration unit 403. The noise power integration unit 403 integrates the first smoothed powers P1-1 to P1-M from all the band-specific noise power estimation units 402-1 to 402-M into a noise power spectrum. This integration may be the creation of a vector whose elements are the values of the individual frequency bands, a summation, or the calculation of an average (which may be a weighted average).
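The three integration options named for the noise power integration unit 403 can be sketched as follows. The function name, the `mode` strings, and the optional `weights` argument are hypothetical.

```python
from typing import List, Optional, Sequence, Union


def integrate_noise_power(
    p1: Sequence[float],
    mode: str = "vector",
    weights: Optional[Sequence[float]] = None,
) -> Union[List[float], float]:
    """Integrate per-band first smoothed powers P1-1..P1-M (sketch)."""
    if not p1:
        raise ValueError("at least one band is required")
    if mode == "vector":
        # One vector element per frequency band: a noise power spectrum
        return list(p1)
    if mode == "sum":
        return float(sum(p1))
    if mode == "mean":
        if weights is None:
            return float(sum(p1)) / len(p1)
        if len(weights) != len(p1):
            raise ValueError("weights must match the number of bands")
        # Weighted average over the bands
        return sum(w * p for w, p in zip(weights, p1)) / sum(weights)
    raise ValueError(f"unknown mode: {mode}")
```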

FIG. 9 is a block diagram showing the configuration of an embodiment of the SNR estimation device according to the present invention. Parts identical or corresponding to those in FIG. 1 are denoted by identical or corresponding reference signs.

In FIG. 9, the SNR estimation device 500 of this embodiment has a frequency analysis unit 101 similar to that of the first embodiment, band-specific SNR estimation units 502-1 to 502-M, and an SNR integration unit 503. Each band-specific SNR estimation unit 502 (502-1 to 502-M) includes a power calculation unit 110 and a first parameter calculation unit 111 similar to those of the first embodiment, and a second parameter calculation unit 112C similar to that of the third embodiment. Each band-specific SNR estimation unit 502 supplies the SNR estimate Ri obtained inside the second parameter calculation unit 112C to the SNR integration unit 503. The SNR integration unit 503 integrates the SNR estimates Ri from all the band-specific SNR estimation units 502-1 to 502-M to form the SNR estimate that is output. One way to integrate the SNR estimates Ri is, for example, to take the average of the SNR estimates Ri over all frequency bands. The average may be taken of the SNR estimates Ri themselves, or after converting the SNR estimates Ri to decibels (a logarithmic scale); in the latter case the output may be left in decibels or converted back to the original scale.
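The two averaging options for the SNR integration unit 503 can be sketched as below. The function and argument names are hypothetical, Ri is assumed to be a power ratio (so the decibel conversion is 10 log10), and the small lower bound applied before the logarithm is an added safeguard that is not in the text.

```python
import math
from typing import Sequence


def integrate_snr(ri: Sequence[float], average_in_db: bool = False,
                  output_db: bool = False) -> float:
    """Average per-band SNR estimates Ri on a linear or decibel scale."""
    if not ri:
        raise ValueError("at least one band is required")
    if average_in_db:
        # Convert each Ri to decibels first, then average
        mean_db = sum(10.0 * math.log10(max(r, 1e-12)) for r in ri) / len(ri)
        return mean_db if output_db else 10.0 ** (mean_db / 10.0)
    # Average the SNR estimates on the original (linear) scale
    mean = sum(ri) / len(ri)
    return 10.0 * math.log10(max(mean, 1e-12)) if output_db else mean
```

Averaging in decibels and converting back is equivalent to a geometric mean of the Ri, which is less dominated by a few high-SNR bands than the linear average.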

In each of the above embodiments, the target sound contrasted with noise is speech, but the present invention is not limited to this. For example, the technical idea of the present invention can also be applied when the motor sound of a machine is the target sound contrasted with noise.

DESCRIPTION OF SYMBOLS: 100, 100A, 100B, 100C ... noise suppression device, 101 ... frequency analysis unit, 102-1 to 102-M ... band-specific noise suppression means, 103 ... waveform restoration unit, 110 ... power calculation unit, 111, 111B ... first parameter calculation unit, 112, 112A, 112B, 112C ... second parameter calculation unit, 113 ... unit time delay unit, 114 ... noise suppression unit, 201 ... first smoothing unit, 202, 202B ... first threshold calculation unit, 203 ... first speech section determination unit, 301, 301C ... second smoothing unit, 302, 302B, 302C ... second threshold calculation unit, 303, 303C ... second speech section determination unit, 304 ... hangover unit, 305 ... SNR calculation unit, 400 ... noise estimation device, 402-1 to 402-M ... band-specific noise power estimation unit, 403 ... noise power integration unit, 500 ... SNR estimation device, 502-1 to 502-M ... band-specific SNR estimation unit, 503 ... SNR integration unit.

Claims

A noise suppression device that suppresses a noise component contained in an input signal and emphasizes a target sound component, comprising:
a frequency analysis unit that performs frequency analysis of the input signal to calculate an input spectrum; and
a plurality of band-specific noise suppression means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and suppressing the noise component in the signal of that frequency band,
wherein each band-specific noise suppression means includes:
a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to obtain a detection result of a first target sound section in the input frequency band signal;
a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to obtain a detection result of a second target sound section in the input frequency band signal; and
a noise suppression unit that suppresses the noise component contained in the input frequency band signal based on a first parameter obtained by the first parameter calculation unit and a second parameter obtained by the second parameter calculation unit,
the first parameter calculation unit generates the first threshold using the second parameter that was output by the second parameter calculation unit a predetermined unit time earlier and that includes at least the detection result of the second target sound section, and
the second parameter calculation unit generates the second threshold using the first parameter that was output by the first parameter calculation unit in the same unit time and that includes at least the detection result of the first target sound section.

The noise suppression device according to claim 1, wherein
the first parameter calculation unit includes:
a first smoothing unit that calculates a first smoothed power by smoothing the first input power while controlling execution and stopping of the smoothing based on the detection result of the second target sound section of a predetermined unit time earlier;
a first threshold calculation unit that calculates the first threshold by applying at least the first smoothed power; and
a first target sound section determination unit that compares the first input power, as the first feature quantity, with the first threshold to determine whether the signal is in a target sound section, thereby obtaining the detection result of the first target sound section, and
the second parameter calculation unit includes:
a second smoothing unit that calculates a second smoothed power by smoothing the second input power while controlling execution and stopping of the smoothing based on the detection result of the first target sound section of the same unit time;
a second threshold calculation unit that calculates the second threshold by applying at least the second smoothed power; and
a second target sound section determination unit that compares the second input power, as the second feature quantity, with the second threshold to determine whether the signal is in a target sound section, thereby obtaining the detection result of the second target sound section.

The noise suppression device, wherein the first smoothing unit smooths the first input power when the detection result of the second target sound section of a predetermined unit time earlier is not a target sound section, and stops smoothing the first input power and maintains the first smoothed power when that detection result is a target sound section, and
the second smoothing unit smooths the second input power when the detection result of the first target sound section of the same unit time is a target sound section, and stops smoothing the second input power and maintains the second smoothed power when that detection result is not a target sound section.

The noise suppression device, wherein one of the first smoothing unit and the second smoothing unit performs smoothing when the detection result of the target sound section input to it is a target sound section, and the other performs smoothing when the detection result of the target sound section input to it is not a target sound section,
the first threshold calculation unit calculates the first threshold by applying the first smoothed power and the second smoothed power of a predetermined unit time earlier, and
the second threshold calculation unit calculates the second threshold by applying the first smoothed power and the second smoothed power of the same unit time.

The noise suppression device according to claim 4, wherein the first threshold calculation unit calculates, as the first threshold, an arithmetic mean or a geometric mean of the first smoothed power and the second smoothed power of a predetermined unit time earlier.

The noise suppression device according to claim 4 or 5, wherein the second threshold calculation unit calculates, as the second threshold, an arithmetic mean or a geometric mean of the first smoothed power and the second smoothed power of the same unit time.

The noise suppression device according to claim 1, wherein
the first parameter calculation unit includes:
a first smoothing unit that calculates a first smoothed power by smoothing the first input power while controlling execution and stopping of the smoothing based on the detection result of the second target sound section of a predetermined unit time earlier;
a first threshold calculation unit that calculates the first threshold by applying at least the first smoothed power; and
a first target sound section determination unit that compares the first input power, as the first feature quantity, with the first threshold to determine whether the signal is in a target sound section, thereby obtaining the detection result of the first target sound section, and
the second parameter calculation unit includes:
an SNR calculation unit that calculates an SNR estimate based on the second input power and the first smoothed power of the same unit time;
a second smoothing unit that smooths the SNR estimate to calculate an SNR smoothed value while controlling execution and stopping of the smoothing based on the detection result of the first target sound section of the same unit time;
a second threshold calculation unit that calculates the second threshold by applying at least the SNR smoothed value; and
a second target sound section determination unit that compares the SNR estimate, as the second feature quantity, with the second threshold to determine whether the signal is in a target sound section, thereby obtaining the detection result of the second target sound section.

A noise estimation device that estimates noise power in an input signal, comprising:
a frequency analysis unit that performs frequency analysis of the input signal to calculate an input spectrum;
a plurality of band-specific noise estimation means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and estimating the noise power in the signal of that frequency band; and
band-specific noise integration means that integrates the plurality of per-band noise power estimates obtained by the band-specific noise estimation means to obtain a final noise power estimate,
wherein each band-specific noise estimation means includes:
a first parameter calculation unit that compares a first feature quantity, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold to detect a first target sound section in the input frequency band signal; and
a second parameter calculation unit that compares a second feature quantity, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold to detect a second target sound section in the input frequency band signal,
the first parameter calculation unit includes:
a first smoothing unit that calculates a first smoothed power by smoothing the first input power while controlling execution and stopping of the smoothing based on the detection result of the second target sound section of a predetermined unit time earlier;
a first threshold calculation unit that calculates the first threshold by applying at least the first smoothed power; and
a first target sound section determination unit that compares the first input power, as the first feature quantity, with the first threshold to determine whether the signal is in a target sound section, thereby obtaining the detection result of the first target sound section,
the second parameter calculation unit includes:
a second smoothing unit that calculates a second smoothed power by smoothing the second input power while controlling execution and stopping of the smoothing based on the detection result of the first target sound section of the same unit time;
a second threshold calculation unit that calculates the second threshold by applying at least the second smoothed power; and
a second target sound section determination unit that compares the second input power, as the second feature quantity, with the second threshold to determine whether the signal is in a target sound section, thereby obtaining the detection result of the second target sound section, and
the first smoothing unit or the second smoothing unit performs smoothing when the detection result of the second target sound section of a predetermined unit time earlier, or the detection result of the first target sound section of the same unit time, input to it is not a target sound section, and stops smoothing when that detection result is a target sound section, and the first smoothed power or the second smoothed power thus obtained is used as the noise power estimate for each band.

In an SNR estimation device for estimating an SNR in an input signal,
A frequency analysis unit that performs frequency analysis of the input signal to calculate an input spectrum;
A plurality of band-specific SNR estimation means for estimating the SNR of a signal in the frequency band corresponding to the frequency band of any input spectrum calculated by the frequency analysis unit;
Band-specific SNR integration means for obtaining a final SNR estimate value by integrating a plurality of SNR estimation values for each frequency band obtained by the band-specific SNR estimation means,
Each of the band-specific SNR estimation means is respectively
The first feature amount based on the first input power calculated for the input frequency band signal is compared with the first threshold value generated internally, and the first frequency value in the input frequency band signal is compared . A first parameter calculation unit for detecting a target sound section of
The second feature value based on the second input power calculated for the input frequency band signal is compared with the second threshold value generated internally, and the second frequency value in the input frequency band signal is compared . A second parameter calculation unit for detecting the target sound section of
The first parameter calculation unit includes:
A first smoothing unit that calculates a first smoothed power by smoothing the first input power while controlling execution and stopping of the smoothing based on the detection result of the second target sound section from a predetermined unit time earlier;
A first threshold value calculation unit that calculates the first threshold value by applying at least the first smoothed power; and
A first target sound section determination unit that compares the first input power, as the first feature amount, with the first threshold value to determine whether the section is a target sound section, and thereby obtains the detection result of the first target sound section,
The second parameter calculation unit includes:
An SNR calculation unit that calculates an SNR estimate based on the second input power and the first smoothed power of the same unit time;
A second smoothing unit that calculates a smoothed SNR value by smoothing the SNR estimate while controlling execution and stopping of the smoothing based on the detection result of the first target sound section of the same unit time;
A second threshold value calculation unit that calculates the second threshold value by applying at least the smoothed SNR value; and
A second target sound section determination unit that compares the SNR estimate, as the second feature amount, with the second threshold value to determine whether the section is a target sound section, and thereby obtains the detection result of the second target sound section,
The SNR estimation device, wherein the SNR estimate from the SNR calculation unit is obtained as the SNR estimate for the frequency band of the input frequency band signal.
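The per-band data flow of the SNR estimation claim above can be sketched as follows. This is only an illustration under stated assumptions: the recursive-average smoothing, the multiplicative thresholds `k1` and `k2`, the ratio form of the SNR, the order of operations within a frame, and the class name `BandSnrEstimator` are hypothetical and are not taken from the claim.

```python
from dataclasses import dataclass


@dataclass
class BandSnrEstimator:
    alpha: float = 0.9        # assumed smoothing coefficient
    k1: float = 2.0           # assumed multiplier for the first threshold
    k2: float = 2.0           # assumed multiplier for the second threshold
    noise: float = 1e-6       # first smoothed power
    snr_smooth: float = 1.0   # smoothed SNR value
    prev_vad2: bool = False   # second detection result of the previous frame

    def step(self, p1: float, p2: float) -> float:
        """Process one unit time; p1 and p2 are the first and second input powers."""
        # First smoothing unit: gated by the PREVIOUS frame's second detection.
        if not self.prev_vad2:
            self.noise = self.alpha * self.noise + (1.0 - self.alpha) * p1
        # First threshold and first target sound section determination.
        vad1 = p1 > self.k1 * self.noise
        # SNR calculation from the second input power and first smoothed power.
        snr = p2 / max(self.noise, 1e-12)
        # Second smoothing unit: gated by the SAME frame's first detection.
        if not vad1:
            self.snr_smooth = self.alpha * self.snr_smooth + (1.0 - self.alpha) * snr
        # Second threshold and second target sound section determination.
        self.prev_vad2 = snr > self.k2 * self.snr_smooth
        return snr
```

One instance would be kept per frequency band, with `step` called once per unit time. How the per-band values are then integrated into the final SNR estimate is left open here, as the claim does not specify it.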

A noise suppression program that suppresses a noise component contained in an input signal and emphasizes a target sound component,
The program causing a computer to function as:
A frequency analysis unit that performs frequency analysis of the input signal to calculate an input spectrum; and
A plurality of band-specific noise suppression means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and suppressing the noise component in the signal of that frequency band,
Each of the band-specific noise suppression means includes:
A first parameter calculation unit that compares a first feature amount, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold value to obtain a detection result of a first target sound section in the input frequency band signal;
A second parameter calculation unit that compares a second feature amount, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold value to obtain a detection result of a second target sound section in the input frequency band signal; and
A noise suppression unit that suppresses the noise component included in the input frequency band signal based on the first parameter obtained by the first parameter calculation unit and the second parameter obtained by the second parameter calculation unit,
The first parameter calculation unit generates the first threshold value using the second parameter that was output by the second parameter calculation unit a predetermined unit time earlier and that includes at least the detection result of the second target sound section,
The noise suppression program, wherein the second parameter calculation unit generates the second threshold value using the first parameter that was output by the first parameter calculation unit in the same unit time and that includes at least the detection result of the first target sound section.
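The claim above states only that the noise suppression unit acts on the first and second parameters; it does not fix a gain rule. As one hedged illustration, assuming the parameters supply a per-band noise power estimate, a standard power spectral subtraction gain (a swapped-in textbook technique, not necessarily the one used in this patent) could be computed as below. The function name `suppression_gain` and the `floor` parameter are hypothetical.

```python
import math


def suppression_gain(input_power, noise_power, floor=0.1):
    """Amplitude gain for one band by power spectral subtraction.

    floor is an assumed lower bound on the gain that limits musical noise.
    """
    if input_power <= 0.0:
        # Nothing to scale; fall back to the floor gain.
        return floor
    # Fraction of the band power attributed to the target sound.
    power_gain = 1.0 - noise_power / input_power
    # Clamp negative values, convert to an amplitude gain, apply the floor.
    return max(math.sqrt(max(power_gain, 0.0)), floor)
```

The band signal would be multiplied by this gain in each unit time, so bands dominated by the estimated noise are attenuated down to the floor while bands with strong target sound pass almost unchanged.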

A noise estimation program for estimating noise power in an input signal,
The program causing a computer to function as:
A frequency analysis unit that performs frequency analysis of the input signal to calculate an input spectrum;
A plurality of band-specific noise estimation means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and estimating the noise power in the signal of that frequency band; and
Band-specific noise power integration means for obtaining a final noise power estimate by integrating the plurality of per-band noise power estimates obtained by the respective band-specific noise estimation means,
Each of the band-specific noise estimation means includes:
A first parameter calculation unit that compares a first feature amount, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold value to detect a first target sound section in the input frequency band signal; and
A second parameter calculation unit that compares a second feature amount, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold value to detect a second target sound section in the input frequency band signal,
The first parameter calculation unit includes:
A first smoothing unit that calculates a first smoothed power by smoothing the first input power while controlling execution and stopping of the smoothing based on the detection result of the second target sound section from a predetermined unit time earlier;
A first threshold value calculation unit that calculates the first threshold value by applying at least the first smoothed power; and
A first target sound section determination unit that compares the first input power, as the first feature amount, with the first threshold value to determine whether the section is a target sound section, and thereby obtains the detection result of the first target sound section,
The second parameter calculation unit includes:
A second smoothing unit that calculates a second smoothed power by smoothing the second input power while controlling execution and stopping of the smoothing based on the detection result of the first target sound section of the same unit time;
A second threshold value calculation unit that calculates the second threshold value by applying at least the second smoothed power; and
A second target sound section determination unit that compares the second input power, as the second feature amount, with the second threshold value to determine whether the section is a target sound section, and thereby obtains the detection result of the second target sound section,
The noise estimation program, wherein the first smoothing unit or the second smoothing unit receives the detection result of the second target sound section from a predetermined unit time earlier, or the detection result of the first target sound section of the same unit time, performs smoothing if the result indicates that the section is not a target sound section, and stops smoothing if it is a target sound section, and wherein the first smoothed power or the second smoothed power is obtained as the noise power estimate for each band.
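The frequency analysis unit and the division into frequency bands recited in the claim above can be sketched as follows. The claim does not name a transform or a band layout, so the naive DFT, the equal-width contiguous bands, and the function names `power_spectrum` and `band_powers` are assumptions chosen to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum of a real-valued frame (bins 0 to N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_powers(spectrum, num_bands):
    """Sum bin powers into num_bands contiguous bands (assumed layout)."""
    edges = [i * len(spectrum) // num_bands for i in range(num_bands + 1)]
    return [sum(spectrum[lo:hi]) for lo, hi in zip(edges, edges[1:])]
```

Each element of the list returned by `band_powers` would be fed, frame by frame, to the band-specific estimation means for that band. A practical implementation would normally use a windowed FFT rather than this direct DFT.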

An SNR estimation program for estimating an SNR in an input signal,
The program causing a computer to function as:
A frequency analysis unit that performs frequency analysis of the input signal to calculate an input spectrum;
A plurality of band-specific SNR estimation means, each corresponding to one of the frequency bands of the input spectrum calculated by the frequency analysis unit and estimating the SNR of the signal in that frequency band; and
Band-specific SNR integration means for obtaining a final SNR estimate by integrating the plurality of per-band SNR estimates obtained by the respective band-specific SNR estimation means,
Each of the band-specific SNR estimation means includes:
A first parameter calculation unit that compares a first feature amount, based on a first input power calculated for the input frequency band signal, with an internally generated first threshold value to detect a first target sound section in the input frequency band signal; and
A second parameter calculation unit that compares a second feature amount, based on a second input power calculated for the input frequency band signal, with an internally generated second threshold value to detect a second target sound section in the input frequency band signal,
The first parameter calculation unit includes:
A first smoothing unit that calculates a first smoothed power by smoothing the first input power while controlling execution and stopping of the smoothing based on the detection result of the second target sound section from a predetermined unit time earlier;
A first threshold value calculation unit that calculates the first threshold value by applying at least the first smoothed power; and
A first target sound section determination unit that compares the first input power, as the first feature amount, with the first threshold value to determine whether the section is a target sound section, and thereby obtains the detection result of the first target sound section,
The second parameter calculation unit includes:
An SNR calculation unit that calculates an SNR estimate based on the second input power and the first smoothed power of the same unit time;
A second smoothing unit that calculates a smoothed SNR value by smoothing the SNR estimate while controlling execution and stopping of the smoothing based on the detection result of the first target sound section of the same unit time;
A second threshold value calculation unit that calculates the second threshold value by applying at least the smoothed SNR value; and
A second target sound section determination unit that compares the SNR estimate, as the second feature amount, with the second threshold value to determine whether the section is a target sound section, and thereby obtains the detection result of the second target sound section,
The SNR estimation program, wherein the SNR estimate from the SNR calculation unit is obtained as the SNR estimate for the frequency band of the input frequency band signal.