JP7139822B2 - Noise estimation device, noise estimation program, noise estimation method, and sound collection device

Info

Publication number: JP7139822B2
Application number: JP2018176478A
Authority: JP
Inventors: 大藤枝
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2018-09-20
Filing date: 2018-09-20
Publication date: 2022-09-21
Anticipated expiration: 2038-09-20
Also published as: JP2020046586A

Description

The present invention relates to a noise estimation device, a noise estimation program, a noise estimation method, and a sound collection device. It can be applied, for example, to a device that suppresses noise components superimposed on an input signal by using an estimate of the noise components contained in that signal.

Because noise is present everywhere in natural environments, speech observed in the real world generally contains noise from various sources. Various noise suppression methods have been developed to enhance only the speech in an input signal observed with noise. Most of them combine a method for estimating the noise to be suppressed with a method for computing a filter that suppresses it. Some conventional speech processing devices that suppress noise in an input signal estimate the noise power in the frequency domain.

The simplest conventional noise estimation method averages the input spectrum over sections in which no speech is present. This method, however, requires the speech-free sections to be estimated in advance. Voice activity detection (VAD), a technique for estimating the sections in which speech is present, has therefore been actively developed, but perfect VAD has not yet been achieved. If the speech section is estimated incorrectly during noise estimation, the estimated noise comes to include the target speech, which distorts the enhanced speech and the residual noise. In addition, because this method estimates noise only in noise sections, it cannot follow changes in the noise during a long speech section.
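The averaging approach described above can be sketched as follows. This is a minimal illustration only: the function name, the per-frame list-of-bins layout, and the externally supplied `is_speech` flags (standing in for a VAD result) are assumptions, not part of the cited prior art.

```python
def average_noise_spectrum(power_frames, is_speech):
    """Average per-bin power over the frames flagged as speech-free.

    power_frames: list of frames, each a list of per-bin powers.
    is_speech:    list of booleans from some VAD (assumed to be given).
    """
    noise_frames = [f for f, s in zip(power_frames, is_speech) if not s]
    if not noise_frames:
        return None  # no speech-free frame has been observed yet
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


frames = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0]]
flags = [False, True, False]          # the middle frame is treated as speech
noise = average_noise_spectrum(frames, flags)
```

Because frames flagged as speech are skipped entirely, the estimate cannot move while `is_speech` stays true, which is the tracking limitation noted above.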

Against this background, the techniques described in Non-Patent Document 1, Non-Patent Document 2, and Patent Document 1 are noise estimation methods that continue to estimate noise even during speech sections. All three documents relate to noise suppression methods (also called speech enhancement methods).

The conventional noise estimation method described in Non-Patent Document 1 is based on the finding that peaks of the input power in the time direction indicate the presence of the target speech, while its valleys can be used to estimate the smoothed noise power. Specifically, the minimum of the input power over a predetermined period up to the present is taken as a first estimated noise power. The first estimated noise power is biased, however, and tends to be smaller than the true noise power. This bias is estimated from the expected value of the first estimated noise power, and the resulting bias estimate is used to correct the first estimated noise power, yielding a second estimated noise power (the final estimate).
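As a rough sketch of the first step only (the sliding minimum), the code below tracks the smallest input power within the most recent `window` samples. The function name and the window expressed as a sample count are assumptions, and the bias correction that produces the second estimate is deliberately omitted.

```python
from collections import deque


def min_tracking(powers, window):
    """Return, for each sample, the minimum power over the last `window` samples."""
    recent = deque(maxlen=window)   # oldest value is dropped automatically
    estimates = []
    for p in powers:
        recent.append(p)
        estimates.append(min(recent))
    return estimates


# The noise level steps from 1.0 to 5.0 at index 2.
first_estimate = min_tracking([1.0, 1.0, 5.0, 5.0, 5.0, 5.0], window=3)
```

In this toy input the estimate stays at the old level until the last small value leaves the window and then jumps all at once.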

The conventional noise estimation method described in Patent Document 1 multiplies the input power by an appropriate weighting coefficient, stores the resulting weighted input power for a predetermined period, and takes the average of the stored weighted input powers as the estimated noise power. The weighting coefficient is calculated from the a posteriori SNR (signal-to-noise ratio), which is the current input power divided by the immediately preceding estimated noise power. Specifically, the weighting coefficient is 1 when the a posteriori SNR is at or below a predetermined value G1, is set inversely proportional to the a posteriori SNR when it lies between G1 and G2, and is 0 when the a posteriori SNR is at or above a predetermined value G2. When the weighting coefficient is 0, the weighted input power is not stored.
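The weighting rule above might look like the following sketch. The names, the fixed-length history, and the proportionality constant `g1` (chosen here so that the weight is continuous at G1) are assumptions; the text only states that the weight is inversely proportional to the a posteriori SNR between G1 and G2.

```python
from collections import deque


def weight_from_snr(snr, g1, g2):
    """Weighting coefficient as a function of the a posteriori SNR."""
    if snr <= g1:
        return 1.0
    if snr >= g2:
        return 0.0
    # Inversely proportional to the SNR; the constant g1 is an assumption
    # that makes the weight continuous at snr == g1.
    return g1 / snr


def update_estimate(stored, px, prev_pn, g1, g2):
    """Store the weighted input power (unless the weight is 0) and return the mean.

    prev_pn must be positive; it is the immediately preceding estimate.
    """
    w = weight_from_snr(px / prev_pn, g1, g2)
    if w > 0.0:
        stored.append(w * px)
    return sum(stored) / len(stored) if stored else prev_pn


stored = deque(maxlen=4)   # hypothetical history length
pn = update_estimate(stored, 1.0, 1.0, g1=2.0, g2=8.0)    # quiet frame is stored
pn = update_estimate(stored, 100.0, pn, g1=2.0, g2=8.0)   # loud frame is skipped
```

In the usage lines, the loud frame has an a posteriori SNR above G2, so it is not stored and the estimate stays where the quiet frame left it.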

The conventional noise estimation method described in Non-Patent Document 2 assumes that the complex spectra of both the target speech and the noise follow zero-mean complex normal distributions, and takes the maximum likelihood estimate of the variance of the noise complex spectrum as the estimated noise power. Under this assumption, the complex spectrum of the input speech follows a zero-mean complex normal distribution whose variance is the sum of the variance of the speech complex spectrum and the variance of the noise complex spectrum. A hidden variable indicating whether the current input is degraded speech or noise is introduced, and an online EM (expectation-maximization) algorithm with a forgetting factor is applied to calculate the maximum likelihood estimate for the noise complex spectrum.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2002-204175

Non-Patent Document 1: R. Martin, "Spectral Subtraction Based on Minimum Statistics," in Proceedings of the 7th European Signal Processing Conference, 1994, pp. 1182-1185
Non-Patent Document 2: M. Souden, M. Delcroix, K. Kinoshita, T. Yoshioka, and T. Nakatani, "Noise Power Spectral Density Tracking: A Maximum Likelihood Perspective," IEEE Signal Processing Letters, Vol. 19, No. 8, 2012, pp. 495-498

However, conventional noise estimation methods have the following problems.

With the method of Non-Patent Document 1, when the noise suddenly increases, the estimated noise power rises abruptly only after a delay. Specifically, the estimated noise power remains small for the predetermined period after the noise increases, and then jumps sharply once that period has elapsed.

With the method of Patent Document 1, when the input speech temporarily becomes small, for example because of packet loss in voice transmission or a voice switch used for echo control, the estimated noise power remains small for some time even after the input speech returns to its original level. That is, because the small input has an a posteriori SNR below G1, it continues to be used for noise power estimation, and the estimated noise power decreases. When the input speech then becomes large again, the a posteriori SNR exceeds G2, so the input is no longer used for noise power estimation and the estimated noise power stops being updated.

The method of Non-Patent Document 2 uses an online EM algorithm that involves a trade-off between tracking speed and the stability of the maximum likelihood estimate: a larger forgetting factor improves stability but slows tracking, while a smaller forgetting factor speeds up tracking but reduces stability. In the technique described in Non-Patent Document 2, the forgetting factor must be set to suit the noise level of the observation environment, which limits its practicality.

As described above, conventional noise estimation methods have the following problems: when the input speech suddenly becomes large, the estimated noise power rises abruptly at an inappropriate time; when the input speech temporarily becomes small, the estimated noise power remains small; and the estimation parameters must be tuned to the environment.

In view of these problems, there is a need for a noise estimation device, a noise estimation program, a noise estimation method, and a sound collection device that require few items to be tuned (for example, estimation parameters) and that can estimate the noise power without being affected by sudden or other changes in the input speech.

A first aspect of the present invention is a noise estimation device that estimates a noise component contained in an input signal in which speech and noise are mixed and thereby obtains an estimated noise power, the device comprising: (1) stationary probability calculation means for calculating a stationary probability of the input power of the input signal from the input power and a past value of the estimated noise power; (2) buffer update means that includes an input buffer for buffering past input power data and a probability buffer for buffering stationary probabilities previously calculated by the stationary probability calculation means, and that updates the contents of the input buffer and the contents of the probability buffer based on the input power and the stationary probability calculated by the stationary probability calculation means; and (3) weighted averaging means for calculating the estimated noise power by taking a weighted average of the input powers held in the input buffer, using the stationary probabilities held in the probability buffer as weighting coefficients.
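The weighted averaging in item (3) can be illustrated with the short sketch below. The function name, the plain-list buffers, and the `fallback` value returned when every weight is zero are assumptions made for illustration; the text does not specify how that corner case is handled.

```python
def weighted_noise_power(input_buffer, prob_buffer, fallback=0.0):
    """Weighted average of buffered input powers, weighted by stationary probability."""
    total_weight = sum(prob_buffer)
    if total_weight <= 0.0:
        return fallback  # assumption: caller supplies a value for this case
    weighted_sum = sum(sp * px for sp, px in zip(prob_buffer, input_buffer))
    return weighted_sum / total_weight


# The middle sample has stationary probability 0 and is ignored.
pn = weighted_noise_power([1.0, 9.0, 3.0], [1.0, 0.0, 1.0])
```

Samples judged unlikely to be stationary contribute little or nothing to the estimate, while all buffered samples remain available for later re-weighting.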

A noise estimation program according to a second aspect of the present invention causes a computer mounted in a noise estimation device, which estimates a noise component contained in an input signal in which speech and noise are mixed and thereby obtains an estimated noise power, to function as: (1) stationary probability calculation means for calculating a stationary probability of the input power of the input signal from the input power and a past value of the estimated noise power; (2) buffer update means that includes an input buffer for buffering past input power data and a probability buffer for buffering stationary probabilities previously calculated by the stationary probability calculation means, and that updates the contents of the input buffer and the contents of the probability buffer based on the input power and the stationary probability calculated by the stationary probability calculation means; and (3) weighted averaging means for calculating the estimated noise power by taking a weighted average of the input powers held in the input buffer, using the stationary probabilities held in the probability buffer as weighting coefficients.

A third aspect of the present invention is a noise estimation method performed by a noise estimation device that estimates a noise component contained in an input signal in which speech and noise are mixed and thereby obtains an estimated noise power, wherein: (1) the device has stationary probability calculation means, buffer update means, and weighted averaging means; (2) the stationary probability calculation means calculates a stationary probability of the input power of the input signal from the input power and a past value of the estimated noise power; (3) the buffer update means includes an input buffer for buffering past input power data and a probability buffer for buffering stationary probabilities previously calculated by the stationary probability calculation means, and updates the contents of the input buffer and the contents of the probability buffer based on the input power and the stationary probability calculated by the stationary probability calculation means; and (4) the weighted averaging means calculates the estimated noise power by taking a weighted average of the input powers held in the input buffer, using the stationary probabilities held in the probability buffer as weighting coefficients.

A fourth aspect of the present invention is a sound collection device that collects speech from an input signal in which the speech and noise are mixed, the device comprising: (1) a noise estimation unit that estimates a noise component contained in the input signal; and (2) a sound collection unit that uses the estimation result of the noise estimation unit to extract and collect the speech from the input signal, wherein the noise estimation device of the first aspect of the present invention is applied as the noise estimation unit.

According to the present invention, few items need to be tuned, and the noise power can be estimated without being affected by sudden or other changes in the input speech.

FIG. 1 is a block diagram showing the functional configuration of the noise estimation means according to the first and second embodiments.
FIG. 2 is a block diagram showing the functional configuration of the sound collection device according to the first and second embodiments.
FIG. 3 is a block diagram showing the hardware configuration of the sound collection device according to the first and second embodiments.
FIG. 4 is a block diagram showing the functional configuration of the noise estimation unit (noise estimation device) according to the first and second embodiments.
FIG. 5 is an explanatory diagram (graph) showing an example of the stationary probability calculation function according to the first and second embodiments.
FIG. 6 is an explanatory diagram (graph) showing an example in which the stationary probability calculation function according to the first and second embodiments is designed on a logarithmic scale.

(A) First Embodiment
A first embodiment of the noise estimation device, noise estimation program, noise estimation method, and sound collection device according to the present invention will be described in detail below with reference to the drawings. This embodiment describes an example in which the noise estimation device, noise estimation program, and noise estimation method of the present invention are applied to noise estimation means.

(A-1) Configuration of the First Embodiment
FIG. 2 is a block diagram showing the functional configuration of the sound collection device 1 of this embodiment. The reference signs in parentheses in FIG. 2 are used only in the second embodiment described later.

The sound collection device 1 performs sound collection processing that collects a target sound from the acoustic signal captured by a microphone M.

In the example of this embodiment, the microphone M is assumed to be mounted in the handset of a telephone terminal (not shown). In this case, the acoustic signal captured by the microphone M contains, for example, speech as the target sound (for example, the voice of the near-end speaker) and noise as non-target sound (for example, background noise). The sound collection device 1 is likewise mounted in the telephone terminal (not shown) and removes the non-target sound (for example, noise such as background noise) from the acoustic signal captured by the microphone M to collect the target sound (for example, the voice of the near-end speaker).

Next, the internal configuration of the sound collection device 1 will be described.

In this embodiment, the sound collection device 1 includes a signal input unit 10, a noise suppression unit 20, a signal output unit 30, and a noise estimation unit 40.

The sound collection device 1 may be configured entirely in hardware (for example, a dedicated chip), or part or all of it may be configured as software (a program). For example, the sound collection device 1 may be configured by installing a program (a sound collection program that includes the noise estimation program of the embodiment) on a computer having a processor and memory.

The signal input unit 10 converts the analog acoustic signal supplied from the microphone M into a digital signal and supplies it to a computer 200. Hereinafter, the acoustic signal captured by the microphone M and digitized by the signal input unit 10 is referred to as the input signal x.

The noise estimation unit 40 estimates the noise (non-target sound) contained in the input signal x. Hereinafter, the noise estimated by the noise estimation unit 40 (the estimated noise signal) is referred to as the "estimated noise".

The noise suppression unit 20 uses the estimated noise obtained by the noise estimation unit 40 to output a signal in which the noise component contained in the input signal x has been suppressed (hereinafter referred to as the "noise-suppressed signal").

The signal output unit 30 outputs the sound collection result of the sound collection device 1 (in this embodiment, the noise-suppressed signal output by the noise suppression unit 20).

FIG. 3 is a block diagram showing an example of the hardware configuration of the sound collection device 1. The reference signs in parentheses in FIG. 3 are used in the second embodiment described later.

FIG. 3 shows the configuration used when the sound collection device 1 is implemented with software (a computer).

The sound collection device 1 shown in FIG. 3 has, as hardware components, at least the signal input unit 10 and the computer 200 on which a program (a sound collection program that includes the noise estimation program of the embodiment) is installed.

The signal input unit 10 can be configured using, for example, an A/D converter. If the computer 200 itself is equipped with an A/D converter, the signal input unit 10 need not be provided separately.

The computer 200 applies predetermined processing to the acoustic signal (digital acoustic signal) supplied from the signal input unit 10 and outputs the result. In this embodiment, programs corresponding to at least the noise suppression unit 20, the signal output unit 30, and the noise estimation unit 40 (the sound collection program of this embodiment) are assumed to be installed on the computer 200. The sound collection program of this embodiment includes a noise estimation program corresponding to the noise estimation unit 40.

The computer 200 may be dedicated to the sound collection program, or it may be shared with programs for other functions (for example, a function that outputs the far-end signal (received speech signal) received by the telephone terminal from a speaker, not shown).

The computer 200 shown in FIG. 3 has a processor 201, a primary storage unit 202, and a secondary storage unit 203. The primary storage unit 202 is storage means that serves as the working memory of the processor 201; a fast memory such as DRAM is used, for example. The secondary storage unit 203 is storage means that records various data such as the OS (operating system) and program data (including the data of the sound collection program according to the embodiment); a non-volatile memory such as flash memory or an HDD is used, for example. In the computer 200 of this embodiment, when the processor 201 starts up, it reads the OS and programs (including the sound collection program according to the embodiment) recorded in the secondary storage unit 203, loads them into the primary storage unit 202, and executes them.

The specific configuration of the computer 200 is not limited to that of FIG. 3, and various configurations can be applied. For example, if the primary storage unit 202 is a non-volatile memory (for example, flash memory), the secondary storage unit may be omitted.

Next, the internal configuration of the noise estimation unit 40 will be described with reference to FIG. 4.

FIG. 4 is a block diagram showing the functional configuration of the noise estimation unit 40. The reference signs in parentheses in FIG. 4 are used in the second embodiment described later.

The noise estimation unit 40 has band division means 41, K power calculation means 42 (42-1 to 42-K), and K noise estimation means 43 (43-1 to 43-K).

FIG. 1 is an explanatory diagram showing the internal configuration of each noise estimation means 43 (43-1 to 43-K). In this embodiment, the noise estimation means 43-1 to 43-K all have the internal configuration shown in FIG. 1. The reference signs in parentheses in FIG. 1 are used in the second embodiment described later.

As shown in FIG. 1, each noise estimation means 43 has stationary probability calculation means 101, buffer update means 102, weighted averaging means 103, and storage means 104.

The detailed functions (operations) of the elements constituting the noise estimation unit 40 (including those constituting the noise estimation means 43) are described later.

(A-2) Operation of the First Embodiment
Next, the operation of the sound collection device 1 (noise estimation unit 40) configured as described above will be explained.

The following description focuses on the operation of the noise estimation unit 40, which is the part of the sound collection device 1 that embodies the features of the present invention.

The band division means 41 performs frequency analysis on the input signal x to calculate a frequency spectrum (hereinafter also called the "input spectrum"). The band division means 41 then divides the input spectrum into K frequency bands and supplies each divided part of the input spectrum (hereinafter called a "frequency band signal") to the corresponding power calculation means 42 (42-1 to 42-K).

The frequency analysis method used by the band division means 41 is not limited; for example, a fast Fourier transform (FFT), a wavelet transform, or a filter bank can be applied. In this embodiment, the band division means 41 preferably uses an FFT.
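The band division step can be pictured with the sketch below. It uses a naive DFT from the standard library in place of an FFT to stay self-contained, and it splits the spectrum into K contiguous bands of nearly equal width; the equal-width split, the function names, and the use of only the non-negative-frequency bins are assumptions, since the text does not say how band edges are chosen.

```python
import cmath


def dft(frame):
    """Naive DFT returning bins 0 .. N/2 of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n // 2 + 1)]


def split_bands(spectrum, k_bands):
    """Split a spectrum into k_bands contiguous groups of bins (k_bands <= len)."""
    n = len(spectrum)
    edges = [i * n // k_bands for i in range(k_bands + 1)]
    return [spectrum[edges[i]:edges[i + 1]] for i in range(k_bands)]


# An 8-sample cosine with two cycles per frame puts its energy in bin 2.
frame = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
bands = split_bands(dft(frame), k_bands=5)
```

Each element of `bands` plays the role of one frequency band signal handed to a power calculation means.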

Each power calculation means 42 (42-1 to 42-K) calculates the input power from the frequency band signal supplied to it and passes the result to the corresponding noise estimation means 43 (43-1 to 43-K).

Various methods can be applied to calculate the power in each power calculation means 42. For example, the power calculation means 42 may obtain the absolute value of each frequency component making up the input frequency band signal and calculate either the sum of the squares of those absolute values or the sum of the absolute values as the input power. Hereinafter, the power output by each power calculation means 42 (42-1 to 42-K) (also called the "input power") is denoted PX (PX_1 to PX_K).
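The two power definitions mentioned above can be written as the following sketch, which assumes a frequency band signal is given as a list of complex bins. The function name and the `squared` flag are illustrative only.

```python
def band_power(band, squared=True):
    """Input power of one band: sum of squared magnitudes, or sum of magnitudes."""
    if squared:
        return sum(abs(c) ** 2 for c in band)
    return sum(abs(c) for c in band)


px_squared = band_power([3 + 4j, 0j])                 # |3+4j|^2 + 0
px_abs = band_power([3 + 4j, 1j], squared=False)      # |3+4j| + |1j|
```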

Each noise estimation means 43 (43-1 to 43-K) estimates the power of the noise component contained in the input power PX (PX_1 to PX_K) supplied from the corresponding power calculation means 42 (42-1 to 42-K) and outputs the result (hereinafter also called the "estimated noise power"). Hereinafter, the estimated noise power output by each noise estimation means 43 (43-1 to 43-K) is denoted PN (PN_1 to PN_K). Because the estimated noise powers PN_1 to PN_K represent the power of each component of the estimated noise, the set of estimated noise powers PN_1 to PN_K can be said to represent the estimated noise signal. In this embodiment, the PN (PN_1 to PN_K) output by the noise estimation means 43 (43-1 to 43-K) are supplied to the noise suppression unit 20.

Next, the operation inside each noise estimation means 43 (43-1 to 43-K) will be described.

The noise estimation means 43-1 to 43-K differ only in the frequency band of the input power PX they process; the processing itself is the same. The following therefore describes the operation of a noise estimation means 43, which stands for any of the noise estimation means 43-1 to 43-K.

In the noise estimation means 43, the input power PX is supplied to the stationary probability calculation means 101 and the buffer update means 102.

First, the operation of the stationary probability calculation means 101 will be described.

The stationary probability calculation means 101 calculates a stationary probability (hereinafter denoted "SP") for the input power PX based on the input power PX and the estimated noise power from D samples in the past (hereinafter denoted "PND"), which is described later. The stationary probability calculation means 101 then passes the resulting stationary probability SP to the buffer update means 102. The specific value of D is described later.

The stationary probability SP is calculated based on the a posteriori SNR (hereinafter, the value of this a posteriori SNR is also denoted "G"). The a posteriori SNR (G) can be calculated by equation (1). The stationary probability SP can then be calculated from G by the stationary probability calculation function described later.
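Equation (1) itself does not survive in this text. Because the description later equates G0 = 1.0 with PX = PND, the a posteriori SNR is presumably the ratio of the current input power to the delayed noise estimate. The following is a reconstruction under that assumption, not a copy of the original equation:

```latex
G = \frac{\mathrm{PX}}{\mathrm{PND}} \tag{1}
```

Under this reading, G > 1 means the current input power exceeds the delayed noise estimate, and G < 1 means it falls below it.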

The stationary probability calculation function of this embodiment gives the stationary probability SP as a function of the a posteriori SNR (G). It is assumed that a stationary probability calculation function designed in advance is set in the stationary probability calculation means 101.

Next, the design of the stationary probability calculation function set in the stationary probability calculation means 101 will be described.

In this embodiment, the domain (G) of the stationary probability calculation function is the real numbers of 0.0 or more, and its range (SP) is the real numbers from 0.0 to 1.0 inclusive. The shape of the function (that is, the shape of the curve when G is plotted on the horizontal axis and SP on the vertical axis) is mountain-shaped, with its peak where G equals a predetermined value (hereinafter denoted "G0"), that is, at G = G0. In other words, the function is designed so that SP is monotonically non-decreasing over 0.0 ≤ G ≤ G0 and monotonically non-increasing over G0 ≤ G.

G0 is not particularly limited and can take any value within the domain of G. However, because the stationary probability calculation function outputs a higher probability the more stationary the input is, G0 = 1.0 (that is, PX = PND) is most preferable.

FIGS. 5 and 6 are explanatory diagrams (graphs) showing examples of the stationary probability calculation function.

The stationary probability calculation function shown in FIG. 5 is defined as a smooth function of G.

In the stationary probability calculation function used in this embodiment, the value of G at which SP = 1.0 (G0) is not particularly limited. In the example of this embodiment, G0 = 1.0. In addition, it is preferable that the function have the property that SP converges asymptotically (smoothly) toward 0.0 as G increases toward +∞ (SP → 0.0 as G → +∞).

FIG. 6 is an explanatory diagram (graph) showing the function when the stationary probability calculation function is designed on a logarithmic scale.

FIG. 6(a) is a graph of the stationary probability calculation function with both the vertical and horizontal axes on a logarithmic scale. FIG. 6(b) shows the function designed on a logarithmic scale as in FIG. 6(a), displayed on a linear scale. As shown in FIGS. 6(a) and 6(b), designing the function with both axes on a logarithmic scale makes the treatment of PX < PND and PX > PND fair (symmetric), and also makes it easy to define the function over the entire domain of G.
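The text fixes only the qualitative shape of the stationary probability calculation function (a peak of 1.0 at G0, decay toward 0.0 on both sides, symmetry on a logarithmic scale). As a minimal sketch, one function with those properties is a Gaussian in log(G). The functional form, the name `stationary_probability`, and the width parameter `sigma` are assumptions made here for illustration and do not come from the patent.

```python
import math


def stationary_probability(g: float, sigma: float = 1.0) -> float:
    """Mountain-shaped SP(G) designed on a log scale, peaking at G0 = 1.0.

    Assumed form: log(SP) is a downward parabola in log(G), so G = 2 and
    G = 1/2 receive the same probability. `sigma` (hypothetical) sets the width.
    """
    if g <= 0.0:
        return 0.0  # limiting value of the curve as G approaches 0
    log_g = math.log(g)
    return math.exp(-(log_g * log_g) / (2.0 * sigma * sigma))


peak = stationary_probability(1.0)    # PX equal to PND
above = stationary_probability(2.0)   # PX twice PND
below = stationary_probability(0.5)   # PX half of PND
```

With this choice, `above` and `below` are equal up to rounding, which is the "fair" treatment of PX < PND and PX > PND that the logarithmic design is meant to provide.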

Next, the operation (function) of the buffer update means 102 will be described.

The buffer update means 102 has a buffer that holds the supplied input power PX samples (hereinafter called the "input buffer BX") and a buffer that holds the supplied stationary probability SP samples (hereinafter called the "probability buffer BP"), and supplies the contents of the input buffer BX and the probability buffer BP to the weighted averaging means 103. That is, the buffer update means 102 updates the input buffer BX and the probability buffer BP based on the input power PX and the stationary probability SP, and gives the resulting contents of both buffers to the weighted averaging means 103.

The input buffer BX holds T samples of the input power PX from the past up to the present (T is any integer of 1 or more).

The buffer length T may be any value, but depending on the application of the noise estimation means 43 it may be set to a length corresponding to 100 milliseconds to several seconds (that is, the number of samples corresponding to that duration). If the buffer length T of the buffer update means 102 is too short (for example, equivalent to 100 milliseconds), the estimate follows changes in the noise more quickly, but the noise estimation accuracy may degrade depending on the content of the input power PX (input signal x). For example, if T is too short and the input power PX (input signal x) contains slowly spoken speech, that speech component may be mistakenly estimated as noise. Conversely, if T is too long (for example, equivalent to 8 seconds), the stationary component of the input signal can be estimated accurately, but the estimate cannot immediately follow a sudden change in the noise environment (for example, stepping from a quiet room into noisy outdoor surroundings, or air conditioners being switched on or off all at once). It is therefore preferable to set the buffer length T of the buffer update means 102 to a length that is neither too short nor too long (for example, equivalent to 200 milliseconds to 1 second). For example, a suitable buffer length T may be determined in advance through experiments or the like, taking the above characteristics of the noise estimation means 43 into account, and set in the noise estimation means 43.
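Since T is expressed in samples of the input power PX, converting a target duration into T requires the interval at which PX samples arrive. The sketch below assumes that one PX sample is produced per analysis frame and that frames are 10 ms apart; both the helper name and the frame shift are illustrative assumptions, not values given in the text.

```python
def buffer_length_in_samples(duration_s: float, frame_shift_s: float) -> int:
    """Number of buffered power samples T that covers `duration_s` seconds."""
    return max(1, round(duration_s / frame_shift_s))  # T must be at least 1


FRAME_SHIFT_S = 0.010  # assumed interval between PX samples (hypothetical)

T_SHORT = buffer_length_in_samples(0.2, FRAME_SHIFT_S)  # lower end of the preferred range
T_LONG = buffer_length_in_samples(1.0, FRAME_SHIFT_S)   # upper end of the preferred range
```

Under the 10 ms assumption, the preferred range of 200 milliseconds to 1 second corresponds to T between 20 and 100 samples.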

The probability buffer BP holds T samples of the stationary probability SP from the past up to the present. That is, the buffer length T of the probability buffer BP is the same as that of the input buffer BX.

The buffer update means 102 updates the buffers (BX and BP) in FIFO (first in, first out) order. That is, the oldest input power PX is deleted from the input buffer BX and the newly supplied input power PX is stored in it. Likewise, the oldest stationary probability SP is deleted from the probability buffer BP and the newly supplied stationary probability SP is stored in it.
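The FIFO update of the two equal-length buffers can be sketched with Python's `collections.deque`, whose `maxlen` argument discards the oldest element automatically when a new one is appended to a full buffer. The buffer length of 4 and the sample values are illustrative only.

```python
from collections import deque

T = 4  # illustrative buffer length

bx = deque(maxlen=T)  # input buffer BX (input power samples)
bp = deque(maxlen=T)  # probability buffer BP (stationary probabilities)


def update_buffers(px: float, sp: float) -> None:
    """Store the newest (PX, SP) pair; a full deque drops its oldest entry."""
    bx.append(px)
    bp.append(sp)


# Five pairs pushed into buffers of length four: the first pair falls out.
for px, sp in [(1.0, 0.9), (1.2, 0.8), (9.0, 0.01), (1.1, 0.9), (0.9, 0.95)]:
    update_buffers(px, sp)
```

Because both buffers are always updated together, the i-th entry of BX and the i-th entry of BP refer to the same time instant.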

The weighted averaging means 103 calculates the estimated noise power PN based on the samples held in the input buffer BX and the probability buffer BP, outputs the obtained estimated noise power PN, and also gives it to the storage means 104 to be stored.

The weighted averaging means 103 calculates the estimated noise power PN by taking the weighted average of the input powers PX held in the input buffer BX, using the stationary probabilities SP held in the probability buffer BP as weighting factors. That is, letting the i-th values (i is an integer from 1 to T) of the input buffer BX and the probability buffer BP be the input power PX_i and the stationary probability SP_i, respectively, the weighted averaging means 103 can calculate the estimated noise power PN using equation (2).
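Equation (2) is also missing from this text. The description (a weighted average of the buffered input powers with the buffered stationary probabilities as weights) corresponds to the standard normalized weighted mean, reconstructed here on that assumption:

```latex
\mathrm{PN} = \frac{\sum_{i=1}^{T} \mathrm{SP}_i \, \mathrm{PX}_i}{\sum_{i=1}^{T} \mathrm{SP}_i} \tag{2}
```

The division by the sum of the weights is what makes the result an average: scaling every SP_i by the same constant leaves PN unchanged.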

Calculating the estimated noise power PN by weighted averaging in the noise estimation means 43 has the following advantages. An input power PX with a high stationary probability SP is given a large weight and therefore has a large influence on the calculation of the estimated noise power PN. An input power PX with a low stationary probability SP is given a small weight, so its influence on the calculation is small and it is almost ignored. If, however, the noise estimation means 43 ignored input powers PX with a small stationary probability SP completely (for example, Patent Document 1 ignores the input when the a posteriori SNR exceeds a predetermined value), the estimate could fail to follow when the input signal returns to its previous level after being temporarily reduced by packet loss or a voice switch. The noise estimation means 43 of this embodiment therefore takes input powers PX with a small stationary probability SP into account with a small influence, which allows stable estimation to continue.

The storage means 104 holds the samples of the estimated noise power PN for a period of D samples and gives the sample from D samples earlier (the past estimated noise power PND) to the stationary probability calculation means 101. The number of delay samples D is preferably D = 1, but D > 1 is also acceptable.
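Putting the four means together for one frequency band gives a loop of roughly the following form. This is a sketch under several assumptions that the text does not specify: the a posteriori SNR is taken as PX / PND, the stationary probability uses a Gaussian-in-log(G) shape with a hypothetical width `sigma`, the delayed estimate is seeded with a hypothetical `initial_pn`, and the previous estimate is held if every weight is zero. The class and method names are illustrative.

```python
import math
from collections import deque


class BandNoiseEstimator:
    """Sketch of one noise estimation means 43 for a single frequency band."""

    def __init__(self, T: int, D: int = 1, initial_pn: float = 1.0, sigma: float = 1.0):
        self.bx = deque(maxlen=T)  # input buffer BX
        self.bp = deque(maxlen=T)  # probability buffer BP
        # Storage means 104: the last D estimates, oldest first (seed is an assumption).
        self.pn_history = deque([initial_pn] * D, maxlen=D)
        self.sigma = sigma

    def _stationary_probability(self, g: float) -> float:
        # Assumed mountain-shaped function with its peak at G0 = 1.0.
        if g <= 0.0:
            return 0.0
        return math.exp(-(math.log(g) ** 2) / (2.0 * self.sigma ** 2))

    def step(self, px: float) -> float:
        pnd = self.pn_history[0]                      # estimate from D samples earlier
        g = px / pnd if pnd > 0.0 else float("inf")   # a posteriori SNR (assumed PX / PND)
        sp = self._stationary_probability(g)
        self.bx.append(px)                            # FIFO update of both buffers
        self.bp.append(sp)
        total = sum(self.bp)
        if total > 0.0:
            pn = sum(w * x for w, x in zip(self.bp, self.bx)) / total
        else:
            pn = pnd                                  # assumption: hold the old estimate
        self.pn_history.append(pn)
        return pn


est = BandNoiseEstimator(T=50, D=1, initial_pn=1.0)
for _ in range(100):
    pn_quiet = est.step(1.0)     # stationary input at the noise level
pn_burst = est.step(100.0)       # a single loud frame
```

In this run the single loud frame receives a very small weight, so `pn_burst` moves only slightly above `pn_quiet`, which illustrates the "small influence, not ignored" behavior described above.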

(A-3) Effects of the First Embodiment
According to the first embodiment, the following effects can be obtained.

The noise estimation unit 40 mounted in the sound collecting device 1 of the first embodiment estimates the noise power using a weighted average whose weighting factors are stationary probabilities calculated with a mountain-shaped stationary probability calculation function. For example, even when the noise contained in the input signal x suddenly becomes large, the use of this weighted average means that the estimate follows the change little by little, driven by the small noise power values that appear infrequently within the stochastically varying noise power, so the estimated noise power does not increase abruptly. In other words, the noise estimation unit 40 provides a noise estimation method in which the estimated noise power does not change abruptly even if the noise does, and in which the estimate does not stay stuck at either a small or a large value, so that estimation remains stable.

In addition, because the stationary probability calculation function is mountain-shaped, the noise estimation unit 40 mounted in the sound collecting device 1 of the first embodiment regards a sudden decrease of the input speech contained in the input signal x as non-stationary. It therefore does not over-track a temporarily reduced input, and the problem of the estimated noise power remaining small does not arise.

Furthermore, in the noise estimation unit 40 mounted in the sound collecting device 1 of the first embodiment, the only parameters that need to be set are the buffer length T and the number of delay samples D, so noise estimation that does not require tuning of estimation parameters can be realized.

As described above, the noise estimation unit 40 mounted in the sound collecting device 1 of the first embodiment can estimate the noise power without tuning estimation parameters and without being swayed by sudden or other changes in the input speech.

(B) Second Embodiment
A second embodiment of the noise estimation device, noise estimation program, noise estimation method, and sound collecting device according to the present invention is described in detail below with reference to the drawings. This embodiment describes an example in which the noise estimation device, noise estimation program, and noise estimation method of the present invention are applied to the noise estimation means.

(B-1) Configuration of the Second Embodiment
The configuration of the sound collecting device 1A (including the noise estimation unit 40A) of the second embodiment can also be shown using FIGS. 1 to 4 described above. The reference signs in parentheses in FIGS. 1 to 4 are used only in the second embodiment.

The following describes how the second embodiment differs from the first embodiment.

As shown in FIG. 2, the sound collecting device 1A of the second embodiment differs from the first embodiment in that the noise estimation unit 40 is replaced with the noise estimation unit 40A.

As shown in FIG. 3, the noise estimation unit 40A of the second embodiment differs from the first embodiment in that the noise estimation means 43 (43-1 to 43-K) are replaced with noise estimation means 43A (43A-1 to 43A-K).

Furthermore, as shown in FIG. 1, the noise estimation means 43A (43A-1 to 43A-K) of the second embodiment differ from the first embodiment in that the buffer update means 102 is replaced with buffer update means 102A.

Next, the differences between the buffer update means 102A of the second embodiment and the first embodiment are described.

The buffer update means 102 of the first embodiment always updates the input buffer BX and the probability buffer BP. With this behavior, however, when a large input (a state in which the power of the input signal x is large) or a small input (a state in which the power of the input signal x is small) continues for longer than the period corresponding to the buffer length T, every stationary probability SP stored in the probability buffer BP is small, and so each one ends up with a large influence despite its small absolute value. As a result, the possibility that the first embodiment abruptly follows a large or small input cannot be ruled out.
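The effect described here follows from the normalization in the weighted average: only the relative sizes of the weights matter. A small numeric sketch, with illustrative values and a hypothetical helper `weighted_average`, contrasts a buffer containing one loud frame with a buffer that has been completely filled by loud frames.

```python
def weighted_average(bx, bp):
    """Weighted mean of the buffered powers bx with the probabilities bp as weights."""
    return sum(w * x for w, x in zip(bp, bx)) / sum(bp)


# One loud frame among stationary frames: its tiny weight keeps it nearly irrelevant.
mixed = weighted_average([1.0, 1.0, 1.0, 100.0], [0.9, 0.9, 0.9, 1e-4])

# Loud input lasting longer than the buffer: all weights are tiny, but after
# normalization each one counts fully, so the estimate jumps to the loud level.
saturated = weighted_average([100.0] * 4, [1e-4] * 4)
```

Here `mixed` stays close to 1.0 while `saturated` is approximately 100.0, even though every weight in the second buffer is as small as the weight of the loud frame in the first.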

The buffer update means 102A of the second embodiment therefore differs from the first embodiment in that it does not update the input buffer BX and the probability buffer BP when the stationary probability SP is very small (smaller than a predetermined value). This allows the noise estimation unit 40A (noise estimation means 43A) of the second embodiment to estimate the noise power even more stably than the first embodiment.

(B-2) Operation of the Second Embodiment
Next, the operation of the sound collecting device 1A (noise estimation unit 40A) configured as described above is described.

The following description of the second embodiment focuses on the differences from the first embodiment. As noted above, the two embodiments differ mainly in that the buffer update means 102 is replaced with the buffer update means 102A, so the description centers on the operation of the buffer update means 102A.

The buffer update means 102A updates the input buffer BX and the probability buffer BP based on the input power PX and the stationary probability SP, and gives the resulting contents of both buffers to the weighted averaging means 103.

The buffer update means 102A updates the buffers (the input buffer BX and the probability buffer BP) in FIFO (first in, first out) order only when the stationary probability SP given by the stationary probability calculation means 101 is larger than a predetermined value (hereinafter denoted "SP0"). The update operation itself is the same as that of the buffer update means 102 of the first embodiment.

If SP0 is too large, the advantages of weighted averaging described for the weighted averaging means 103 of the first embodiment are lost, so SP0 is preferably a small value. For example, a value in the range of 0.001 to 0.05 is suitable for SP0.
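The gated update of the second embodiment can be sketched as a thin wrapper around the FIFO update: the pair is stored only when SP exceeds SP0, and otherwise both buffers are left as they are. The function name, the threshold value 0.01 (chosen from within the stated range), and the sample values are illustrative assumptions.

```python
from collections import deque

SP0 = 0.01  # illustrative threshold inside the 0.001 to 0.05 range given in the text


def update_buffers_gated(bx: deque, bp: deque, px: float, sp: float, sp0: float = SP0) -> bool:
    """Append (px, sp) in FIFO order only when sp > sp0; return True if stored."""
    if sp <= sp0:
        return False  # buffers are left untouched for this sample
    bx.append(px)
    bp.append(sp)
    return True


bx = deque([1.0, 1.1], maxlen=2)  # input buffer BX
bp = deque([0.9, 0.8], maxlen=2)  # probability buffer BP

skipped = update_buffers_gated(bx, bp, px=100.0, sp=1e-4)  # very non-stationary: not stored
stored = update_buffers_gated(bx, bp, px=1.05, sp=0.95)    # stationary: stored, oldest dropped
```

Because low-probability samples never enter the buffers, a long run of very loud or very quiet input cannot fill the probability buffer with uniformly tiny weights.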

(B-3) Effects of the Second Embodiment
According to the second embodiment, the following effects can be obtained in addition to the effects of the first embodiment.

The noise estimation unit 40A (buffer update means 102A) mounted in the sound collecting device 1A of the second embodiment differs from the first embodiment in that it does not update the input buffer BX and the probability buffer BP when the stationary probability SP is very small (smaller than a predetermined value). As a result, the noise estimation unit 40A can estimate the noise power stably even when the power of the input signal x remains large or small for a long time.

(C) Other Embodiments
The present invention is not limited to the embodiments described above; modified embodiments such as those illustrated below are also possible.

(C-1) In each of the above embodiments, a single microphone is assumed in order to simplify the description, but a plurality of microphones may be arranged, or a microphone array using a plurality of microphones M may be arranged. In that case, the noise estimation units 40 and 40A perform noise estimation on each of the input signals from the plurality of microphones.

(C-2) In each of the above embodiments, examples in which the noise estimation units 40 and 40A are mounted in the sound collecting devices 1 and 1A have been described, but the device in which the noise estimation units 40 and 40A are mounted is not limited. The noise estimation units 40 and 40A may also be configured as standalone devices.

Accordingly, in each of the above embodiments, the output destination and output method (for example, data format and output interface) of the estimated noise power PN (PN_1 to PN_K) obtained by the noise estimation units 40 and 40A are not limited; the output may use a method and a destination suited to the application of the noise estimation units 40 and 40A. For example, the noise estimation units 40 and 40A may output via an interface of the computer 200 (for example, a signal line on a circuit or a serial interface), or via a wired or wireless communication interface (for example, a wired/wireless LAN interface or various serial interfaces).

(C-3) In the sound collecting device 1, the signal input unit 10, the noise suppression unit 20, and the noise estimation unit 40 or 40A, when the noise suppression unit 20 performs its processing in the frequency domain, band division means is needed inside the noise suppression unit 20; however, the configuration of the band division means 41 in the noise estimation unit 40, or the output of the band division means 41, may be shared. For example, band division means may be included in the signal input unit 10, and its output supplied to the noise suppression unit 20 and the noise estimation unit 40 as the output of the signal input unit 10.

Reference Signs List: 1: sound collecting device; M: microphone; 10: signal input unit; 20: noise suppression processing unit; 30: signal output unit; 40: noise estimation unit; 41: band division means; 42, 42-1 to 42-K: power calculation means; 43, 43-1 to 43-K: noise estimation means; 101: stationary probability calculation means; 102: buffer update means; 103: weighted averaging means; 104: storage means; 200: computer; 201: processor; 202: primary storage unit; 203: secondary storage unit.

Claims

1. A noise estimation device that obtains an estimated noise power by estimating a noise component contained in an input signal in which speech and noise are mixed, the device comprising:
stationary probability calculation means for calculating a stationary probability of the input power from the input power of the input signal and a past estimated noise power;
buffer update means that has an input buffer for buffering past input power data and a probability buffer for buffering stationary probabilities calculated in the past by the stationary probability calculation means, and that updates the content of the input buffer and the content of the probability buffer based on the stationary probability; and
weighted averaging means for calculating the estimated noise power by taking a weighted average of the input powers held in the input buffer, using the stationary probabilities held in the probability buffer as weighting factors.

2. The noise estimation device according to claim 1, wherein the stationary probability calculation means calculates the stationary probability using a stationary probability calculation function of the a posteriori SNR based on the input power and the past estimated noise power.

3. The noise estimation device according to claim 1, wherein the buffer update means updates the input buffer and the probability buffer only when the stationary probability given by the stationary probability calculation means is greater than a predetermined value.

4. A noise estimation program that causes a computer mounted in a noise estimation device, which obtains an estimated noise power by estimating a noise component contained in an input signal in which speech and noise are mixed, to function as:
stationary probability calculation means for calculating a stationary probability of the input power from the input power of the input signal and a past estimated noise power;
buffer update means that has an input buffer for buffering past input power data and a probability buffer for buffering stationary probabilities calculated in the past by the stationary probability calculation means, and that updates the content of the input buffer and the content of the probability buffer based on the stationary probability; and
weighted averaging means for calculating the estimated noise power by taking a weighted average of the input powers held in the input buffer, using the stationary probabilities held in the probability buffer as weighting factors.

5. A noise estimation method performed by a noise estimation device that obtains an estimated noise power by estimating a noise component contained in an input signal in which speech and noise are mixed,
the noise estimation device having stationary probability calculation means, buffer update means, and weighted averaging means, wherein:
the stationary probability calculation means calculates a stationary probability of the input power from the input power of the input signal and a past estimated noise power;
the buffer update means has an input buffer for buffering past input power data and a probability buffer for buffering stationary probabilities calculated in the past by the stationary probability calculation means, and updates the content of the input buffer and the content of the probability buffer based on the stationary probability calculated by the stationary probability calculation means; and
the weighted averaging means calculates the estimated noise power by taking a weighted average of the input powers held in the input buffer, using the stationary probabilities held in the probability buffer as weighting factors.

6. A sound collecting device that collects speech from an input signal in which speech and noise are mixed, the device comprising:
a noise estimation unit that estimates a noise component contained in the input signal; and
a sound collecting unit that extracts and collects the speech from the input signal using the estimation result of the noise estimation unit,
wherein the noise estimation device according to any one of claims 1 to 3 is applied as the noise estimation unit.