JP2014115377A - Sound processing device

Info

Publication number: JP2014115377A
Application number: JP2012268102A
Authority: JP
Inventors: Kazunobu Kondo; 多伸近藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2012-12-07
Filing date: 2012-12-07
Publication date: 2014-06-26
Anticipated expiration: 2032-12-07
Also published as: JP6064566B2

Abstract

PROBLEM TO BE SOLVED: To estimate the noise component of a sound signal with high accuracy even when the noise component fluctuates. SOLUTION: An index computation section 42 computes, for each unit period, a confidence level CN(k,m) that indicates how likely that unit period of the sound signal is to correspond to the noise component. The confidence level is computed from the disparity between a first index value P1(k,m), which follows time changes of the power |X(k,m)|² of each observation component X(k,m) of a sound signal containing a target sound component and a noise component, and a second index value P2(k,m), which follows the time changes of the power |X(k,m)|² with lower followability than the first index value P1(k,m). An estimation processing section 44 generates an estimated noise component N(k,m) from the observation component X(k,m) using the per-unit-period confidence level CN(k,m) computed by the index computation section 42.

Description

The present invention relates to a technique for processing an acoustic signal that contains a noise component, and is particularly suited to noise suppression, in which the noise component is estimated and suppressed from the acoustic signal.

Noise suppression techniques that estimate the noise component contained in an acoustic signal and suppress it from the signal have been proposed in the past, and various methods are used to estimate the noise component. For example, Non-Patent Document 1 discloses a technique (the minimum statistics method) that estimates, as the noise component, the section of the acoustic signal in which the power is at its minimum.

Rainer Martin, "Spectral Subtraction Based on Minimum Statistics", Proc. EUSIPCO 94, p. 1182-1185

However, to estimate the noise component with high accuracy using the technique of Non-Patent Document 1, the minimum power must be searched for over a long section of the acoustic signal, on the order of several seconds. Consequently, when the characteristics of the noise component change, it is difficult to estimate the changed noise component quickly and accurately. In view of these circumstances, an object of the present invention is to estimate the noise component of an acoustic signal with high accuracy even when the noise component fluctuates.

To solve the above problem, a sound processing device of the present invention comprises index calculation means and estimation processing means. The index calculation means calculates, for each unit period, a reliability (for example, reliability CS(k,m) or reliability CN(k,m)), which is an index of the likelihood that each unit period of an acoustic signal containing a target sound component and a noise component corresponds to one of the target sound component and the noise component. The reliability is calculated according to the difference between a first index value (for example, first index value P1(k,m)) that follows the time change of the acoustic signal and a second index value (for example, second index value P2(k,m)) that follows the time change of the acoustic signal with lower followability than the first index value. The estimation processing means generates an estimated noise component from the acoustic signal using the reliability of each unit period calculated by the index calculation means. In this configuration, the estimated noise component is generated using a per-unit-period reliability that reflects the difference between the first and second index values, which follow the time change of the acoustic signal with different followability. The noise component of the acoustic signal can therefore be estimated with high accuracy even when the noise component fluctuates.

In a preferred aspect of the present invention, the index calculation means calculates, for each unit period, a basic index (for example, reliability CS(k,m)), which is an index of the likelihood that each unit period of the acoustic signal corresponds to the target sound component, according to the difference between the first index value and the second index value, and sets the reliability to a predetermined value (for example, zero) for unit periods in which the basic index exceeds a first threshold (for example, threshold TH1). Because the reliability is set to the predetermined value when the basic index exceeds the first threshold, this aspect has the advantage that the estimated noise component can be generated with high accuracy even when the target sound component is sufficiently dominant within a unit period. In another aspect, the index calculation means sets the reliability to a predetermined value (for example, zero) for unit periods in which the intensity of the acoustic signal falls below a second threshold (for example, threshold TH2). Because the reliability is set to the predetermined value when the intensity of the acoustic signal falls below the second threshold, this aspect has the advantage that the estimated noise component can be generated with high accuracy even when, for example, the intensity of the acoustic signal drops momentarily.

In a preferred aspect of the present invention, the estimation processing means generates the estimated noise component by applying a smoothing coefficient (for example, smoothing coefficient ω(k,m)) that depends on the reliability calculated by the index calculation means to an exponential moving average of the intensity of each unit period of the acoustic signal. Because the estimated noise component is calculated as an exponential moving average of the per-unit-period intensity, this aspect reduces the total number of unit periods whose intensity must be held to calculate the estimated noise component (that is, the storage capacity needed to hold the intensity of the acoustic signal), compared with the technique of Non-Patent Document 1, which searches for the minimum power over a long section of several seconds.

In a preferred aspect of the present invention, the estimation processing means generates a first estimated noise component (for example, first estimated noise component Q1(k,m)) by applying a smoothing coefficient (for example, smoothing coefficient ω1(k,m)) that depends on the reliability calculated by the index calculation means and on a first coefficient (for example, first coefficient A1) to an exponential moving average of the intensity of each unit period of the acoustic signal. It likewise generates a second estimated noise component (for example, second estimated noise component Q2(k,m)) by applying a smoothing coefficient (for example, smoothing coefficient ω2(k,m)) that depends on the reliability and on a second coefficient different from the first coefficient (for example, second coefficient A2) to an exponential moving average of the intensity of each unit period of the acoustic signal, and then generates the estimated noise component according to the first and second estimated noise components. Because the estimated noise component is generated from first and second estimated noise components obtained by exponentially averaging the intensity of the acoustic signal with different smoothing coefficients, the benefit of generating the estimated noise component with high accuracy is especially pronounced.

A sound processing device according to a preferred aspect of the present invention comprises noise suppression means that suppresses, from the acoustic signal, the estimated noise component generated by the estimation processing means. Because the present invention can generate the estimated noise component with high accuracy, the noise component of the acoustic signal can also be suppressed with high accuracy.

The sound processing device according to each of the above aspects can be realized by hardware (electronic circuitry) dedicated to acoustic signal processing, such as a DSP (Digital Signal Processor), or by cooperation between a program and a general-purpose arithmetic processing device such as a CPU (Central Processing Unit). A program according to the present invention causes a computer to execute an index calculation process and an estimation process. The index calculation process calculates, for each unit period, a reliability that is an index of the likelihood that each unit period of an acoustic signal containing a target sound component and a noise component corresponds to one of the target sound component and the noise component, according to the difference between a first index value that follows the time change of the acoustic signal and a second index value that follows the time change of the acoustic signal with lower followability than the first index value. The estimation process generates an estimated noise component from the acoustic signal using the reliability of each unit period calculated in the index calculation process. This program achieves the same operation and effects as the sound processing device according to the present invention.

The program of the present invention is provided stored on a computer-readable recording medium and is installed on a computer. The recording medium is, for example, a non-transitory recording medium. An optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, may be used. The program of the present invention may also be provided by distribution over a communication network and installed on a computer.

The present invention may also be specified as a sound processing device (index calculation device) that calculates a reliability, which is an index of the likelihood that each unit period of an acoustic signal corresponds to one of a target sound component and a noise component. That is, a sound processing device according to another aspect of the present invention comprises index calculation means that calculates this reliability for each unit period according to the difference between a first index value that follows the time change of an acoustic signal containing a target sound component and a noise component and a second index value that follows the time change of the acoustic signal with lower followability than the first index value. With this configuration, the reliability of each unit period of the acoustic signal can be calculated with high accuracy even when the noise component fluctuates.

FIG. 1 is a block diagram of a sound processing device according to a first embodiment of the present invention.
FIG. 2 is a block diagram of the noise estimation unit.
FIG. 3 is an explanatory diagram of the effect of the first embodiment.
FIG. 4 is an explanatory diagram of the effect of the first embodiment.

<First Embodiment>
FIG. 1 is a block diagram of a sound processing device 100 according to the first embodiment of the present invention. As shown in FIG. 1, a signal supply device 12 and a sound emitting device 14 are connected to the sound processing device 100. The signal supply device 12 supplies the sound processing device 100 with a time-domain acoustic signal x(t) representing the waveform of a mixture of a target sound component and a noise component (t: time). The target sound component is an acoustic component such as speech or musical sound. The noise component is an additive-noise acoustic component, typified by environmental sounds such as the operating sound of air-conditioning equipment or the bustle of a crowd. The signal supply device 12 may be a sound collection device that picks up ambient sound and generates the acoustic signal x(t), a playback device that reads the acoustic signal x(t) from a portable or built-in recording medium and supplies it to the sound processing device 100, or a communication device that receives the acoustic signal x(t) from a communication network and supplies it to the sound processing device 100.

The sound processing device 100 is a signal processing device (noise suppression device) that generates an acoustic signal y(t) in which the noise component of the acoustic signal x(t) supplied by the signal supply device 12 is suppressed (the target sound component is emphasized). The sound emitting device 14 (for example, a loudspeaker or headphones) emits sound waves corresponding to the acoustic signal y(t) generated by the sound processing device 100. The D/A converter that converts the acoustic signal y(t) from digital to analog is omitted from the figure for convenience.

As shown in FIG. 1, the sound processing device 100 is realized by a computer system comprising an arithmetic processing device 22 and a storage device 24. The storage device 24 stores the program executed by the arithmetic processing device 22 and the various data it uses. Any known recording medium, such as a semiconductor or magnetic recording medium, or a combination of several types of recording media, may be used as the storage device 24. A configuration in which the acoustic signal x(t) is stored in the storage device 24 (so that the signal supply device 12 is omitted) is also suitable.

By executing the program stored in the storage device 24, the arithmetic processing device 22 realizes a plurality of functions for generating the acoustic signal y(t) from the acoustic signal x(t): a frequency analysis unit 32, a noise estimation unit 34, a noise suppression unit 36, and a waveform generation unit 38. A configuration in which the functions of the arithmetic processing device 22 are distributed over several devices, or in which a dedicated electronic circuit (for example, a DSP) realizes some of the functions, may also be adopted.

The frequency analysis unit 32 sequentially generates, for each unit period (frame) on the time axis, the components X(k,m) of the acoustic signal x(t) (hereinafter "observation components") corresponding to each of a plurality of frequencies on the frequency axis. The symbol k denotes an arbitrary frequency (frequency bin) on the frequency axis, and the symbol m denotes an arbitrary unit period on the time axis. Any known frequency analysis, such as the short-time Fourier transform, may be used to calculate the observation components (frequency spectrum) X(k,m). A bank of band-pass filters with different passbands (a filter bank) may also be used as the frequency analysis unit 32.
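As a rough illustration of the frequency analysis described above, the following Python sketch computes observation components X(k,m) with a short-time Fourier transform. The function name `stft_frames`, the Hann window, the frame length of 512 samples, and the hop of 256 samples are assumptions made for this example; the description only requires some known frequency analysis.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Return X[k, m]: one-sided spectrum of frame m at frequency bin k.

    frame_len, hop and the Hann window are illustrative choices,
    not values taken from the patent text.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    X = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
    for m in range(n_frames):
        # Cut out unit period m, window it, and transform it.
        frame = x[m * hop : m * hop + frame_len] * window
        X[:, m] = np.fft.rfft(frame)
    return X
```

Each column of the returned array corresponds to one unit period m, and each row to one frequency bin k.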

The noise estimation unit 34 sequentially estimates, for each unit period, the noise component N(k,m) contained in the acoustic signal x(t) (observation components X(k,m)), hereinafter the "estimated noise component". In the first embodiment the estimated noise component N(k,m) is a power (power spectrum). The specific configuration and operation of the noise estimation unit 34 are described later.

The noise suppression unit 36 suppresses the estimated noise component N(k,m) estimated by the noise estimation unit 34 from the acoustic signal x(t) (observation components X(k,m)), and thereby sequentially generates, for each unit period, the per-frequency components Y(k,m) of the acoustic signal y(t) (hereinafter "noise suppression components"). Specifically, the noise suppression unit 36 of the first embodiment calculates the noise suppression component Y(k,m) by the calculation of equation (1) below, which subtracts the estimated noise component N(k,m) from the power (power spectrum) |X(k,m)|² of the observation component X(k,m) in the frequency domain (spectral subtraction).

The symbol max{ } in equation (1) denotes the operation of selecting the largest value inside the braces. The symbol j denotes the imaginary unit, and the symbol θ(k,m) denotes the phase angle (phase spectrum) of the acoustic signal x(t). As equation (1) shows, the amplitude (amplitude spectrum) |Y(k,m)| of the noise suppression component Y(k,m) is the square root of the result of subtracting the estimated noise component N(k,m) from the power |X(k,m)|² of the observation component X(k,m), with a predetermined constant (flooring coefficient) β as the lower limit.
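Equation (1) itself is not reproduced in this text. Going by the description of its symbols, a consistent reading is Y(k,m) = sqrt(max{|X(k,m)|² − N(k,m), β}) · exp(jθ(k,m)). The Python sketch below implements that reading; the function name `spectral_subtract` and the default value of `beta` are assumptions for illustration.

```python
import numpy as np

def spectral_subtract(X, N, beta=1e-3):
    """Spectral subtraction with a power floor.

    X    : complex observation components X(k, m)
    N    : estimated noise power N(k, m), same shape as X
    beta : flooring constant (illustrative default, not from the patent)
    """
    power = np.abs(X) ** 2
    # Subtract the noise power, but never go below the floor beta.
    amplitude = np.sqrt(np.maximum(power - N, beta))
    # Reuse the phase of the observed signal.
    phase = np.angle(X)
    return amplitude * np.exp(1j * phase)
```

Where the estimated noise power exceeds the observed power, the output amplitude falls back to sqrt(beta) rather than becoming undefined.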

The waveform generation unit 38 in FIG. 1 generates the time-domain acoustic signal y(t) from the noise suppression components Y(k,m) that the noise suppression unit 36 generates sequentially for each unit period. Specifically, the waveform generation unit 38 converts the noise suppression components Y(k,m) of each unit period to the time domain by an inverse short-time Fourier transform and joins successive unit periods together to generate the acoustic signal y(t). The acoustic signal y(t) generated by the waveform generation unit 38 is supplied to the sound emitting device 14 and emitted as sound waves.
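The resynthesis step can be sketched as an inverse FFT per unit period followed by overlap-add. This is only one common way to join successive unit periods; the function name `overlap_add` and the frame parameters are hypothetical, and the sketch assumes the analysis window sums to one across overlapping frames (for example, a periodic Hann window at 50% overlap).

```python
import numpy as np

def overlap_add(Y, frame_len=512, hop=256):
    """Rebuild a time-domain signal from one-sided spectra Y[k, m].

    Assumes the analysis window already satisfies the
    constant-overlap-add condition for this hop size.
    """
    n_frames = Y.shape[1]
    y = np.zeros(frame_len + hop * (n_frames - 1))
    for m in range(n_frames):
        # Back to the time domain, then add into the output at the frame offset.
        frame = np.fft.irfft(Y[:, m], n=frame_len)
        y[m * hop : m * hop + frame_len] += frame
    return y
```

With such a window, samples covered by two overlapping frames are reconstructed exactly when Y is left unmodified; only the first and last half-frames are attenuated.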

FIG. 2 is a block diagram of the noise estimation unit 34. As shown in FIG. 2, the noise estimation unit 34 of the first embodiment comprises an index calculation unit 42 and an estimation processing unit 44. The index calculation unit 42 calculates a reliability CN(k,m) for each frequency in each unit period of the acoustic signal x(t). The reliability CN(k,m) is an index of the likelihood that the observation component X(k,m) of the acoustic signal x(t) corresponds to the noise component (the degree to which the noise component is dominant over the target sound component within X(k,m)), and takes a variable value in the range from 0 to 1 inclusive. As shown in FIG. 2, the index calculation unit 42 of the first embodiment comprises a smoothing processing unit 422 and a calculation processing unit 424.

The smoothing processing unit 422 sequentially calculates, for each unit period, a first index value P1(k,m) and a second index value P2(k,m) that follow the time change of the acoustic signal x(t) (specifically, the time change of the power |X(k,m)|² of the observation component X(k,m)). Specifically, as expressed by equations (2A) and (2B) below, the smoothing processing unit 422 calculates the first index value P1(k,m) and the second index value P2(k,m) as exponential moving averages (smoothing) of the power |X(k,m)|² of the observation component X(k,m).
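Equations (2A) and (2B) are not reproduced in this text. A form consistent with the surrounding description (an exponential moving average of the power in which a smoothing coefficient of 1 returns the instantaneous power unchanged) would be the following; the exact notation of the original publication may differ.

```latex
\begin{align}
P_1(k,m) &= \alpha_1\,|X(k,m)|^2 + (1-\alpha_1)\,P_1(k,m-1) \tag{2A}\\
P_2(k,m) &= \alpha_2\,|X(k,m)|^2 + (1-\alpha_2)\,P_2(k,m-1) \tag{2B}
\end{align}
```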

The symbol α1 in equation (2A) and the symbol α2 in equation (2B) denote the smoothing coefficients (forgetting coefficients) of the exponential moving averages, and are set in the range from 0 to 1 inclusive. As equations (2A) and (2B) show, the first index value P1(k,m) and the second index value P2(k,m) correspond to envelopes (power envelopes) on the time axis of the power |X(k,m)|² of the observation component X(k,m).

The smoothing coefficient α2 of equation (2B) is smaller than the smoothing coefficient α1 of equation (2A) (0 ≤ α2 < α1 ≤ 1). The second index value P2(k,m) therefore follows the time change of the power |X(k,m)|² of the observation component X(k,m) with lower followability than the first index value P1(k,m). In other words, the time constant τ2 of the smoothing in equation (2B) is larger than the time constant τ1 of the smoothing in equation (2A) (τ2 > τ1). The smoothing coefficient α1 may also be set to 1, in which case the power |X(k,m)|² of the observation component X(k,m) is used directly as the first index value P1(k,m).
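A minimal Python sketch of the two power envelopes follows, assuming the exponential-moving-average form P(k,m) = α·|X(k,m)|² + (1 − α)·P(k,m−1). The function name `dual_envelopes`, the default coefficients, and the choice to initialize both envelopes with the power of the first unit period are assumptions, not details from the text.

```python
import numpy as np

def dual_envelopes(power, alpha1=0.5, alpha2=0.05):
    """Fast (P1) and slow (P2) envelopes of power[k, m] = |X(k, m)|^2."""
    if not 0.0 <= alpha2 < alpha1 <= 1.0:
        raise ValueError("require 0 <= alpha2 < alpha1 <= 1")
    P1 = np.empty_like(power, dtype=float)
    P2 = np.empty_like(power, dtype=float)
    # Assumed initialization: start both envelopes at the first frame's power.
    P1[:, 0] = power[:, 0]
    P2[:, 0] = power[:, 0]
    for m in range(1, power.shape[1]):
        P1[:, m] = alpha1 * power[:, m] + (1.0 - alpha1) * P1[:, m - 1]
        P2[:, m] = alpha2 * power[:, m] + (1.0 - alpha2) * P2[:, m - 1]
    return P1, P2
```

After a sudden rise in power, P1 climbs quickly while P2 lags behind, which is the difference the index calculation unit 42 exploits.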

The calculation processing unit 424 in FIG. 2 calculates the reliability CN(k,m) (the likelihood that the observation component X(k,m) corresponds to the noise component) using the first index value P1(k,m) and the second index value P2(k,m) calculated by the smoothing processing unit 422. The calculation processing unit 424 of the first embodiment first calculates a reliability (basic index) CS(k,m), which indicates the likelihood that the observation component X(k,m) corresponds to the target sound component (the degree to which the target sound component is dominant over the noise component within X(k,m)), according to the difference between the first index value P1(k,m) and the second index value P2(k,m), and then calculates the reliability CN(k,m) by a calculation that uses the reliability CS(k,m).

Specifically, the calculation processing unit 424 calculates the reliability CS(k,m) by the calculation of equation (3) below.

That is, the calculation processing unit 424 calculates, as the reliability CS(k,m), the ratio of the first index value P1(k,m) to the second index value P2(k,m) (P1(k,m)/P2(k,m)), limited to a range not exceeding a predetermined value (1 in the example of the first embodiment), so that 0 ≤ CS(k,m) ≤ 1. A configuration in which the difference between the first index value P1(k,m) and the second index value P2(k,m) is calculated as the reliability CS(k,m) may also be adopted.
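Equation (3) is not reproduced in this text. Based only on the verbal description (the ratio P1/P2 limited to values not exceeding 1), one plausible form is the clipped ratio below. This is an assumption; the original equation may differ in detail.

```latex
\begin{equation}
C_S(k,m) = \min\left\{ \frac{P_1(k,m)}{P_2(k,m)},\ 1 \right\} \tag{3}
\end{equation}
```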

Target sound components such as speech and musical sound broadly tend to vary more over time than noise components do. Because the first index value P1(k,m) follows the time change of the observation component X(k,m) with higher followability than the second index value P2(k,m), under this tendency the first index value P1(k,m) exceeds the second index value P2(k,m) when the target sound component is dominant over the noise component within the observation component X(k,m), and the reliability CS(k,m) of the observation component X(k,m) increases as a result. The reliability CS(k,m) can therefore be used as an index of the likelihood that the observation component X(k,m) corresponds to the target sound component.

As expressed by equation (4) below, the calculation processing unit 424 in FIG. 2 calculates the reliability CN(k,m) by subtracting the reliability CS(k,m) calculated by equation (3) from a predetermined value (1 in the example of the first embodiment). That is, the reliability CN(k,m) is calculated so that it decreases as the reliability CS(k,m) increases.

As described above, the more dominant the target sound component is within the observation component X(k,m), the larger the reliability CS(k,m) becomes within the range from 0 to 1 (and the more dominant the noise component is, the smaller CS(k,m) becomes). Consequently, the more dominant the noise component is over the target sound component within the observation component X(k,m) (the higher the likelihood that X(k,m) corresponds to the noise component), the larger the reliability CN(k,m) becomes. The reliability CN(k,m) can therefore be used as an index of the likelihood that the observation component X(k,m) corresponds to the noise component. In summary, the index calculation unit 42 in FIG. 2 sequentially calculates, for each unit period, the reliability CN(k,m) of each frequency according to the difference between the first index value P1(k,m) and the second index value P2(k,m) (specifically, the ratio between them).

The estimation processing unit 44 in FIG. 2 sequentially generates, for each unit period, the estimated noise component N(k,m) of the acoustic signal x(t) using the reliability CN(k,m) calculated by the index calculation unit 42.
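The two reliabilities can be sketched in Python as follows. The relation CN = 1 − CS follows the description of equation (4); the clipped-ratio form used for CS is an assumption based on the verbal description of equation (3), and the function name `confidences` and the guard value `eps` are hypothetical.

```python
import numpy as np

def confidences(P1, P2, eps=1e-12):
    """Return (CS, CN) from the fast envelope P1 and slow envelope P2.

    CS: likelihood that the bin is target sound, assumed min(P1 / P2, 1).
    CN: likelihood that the bin is noise, 1 - CS.
    eps guards against division by zero (an added safeguard).
    """
    CS = np.minimum(P1 / np.maximum(P2, eps), 1.0)
    CN = 1.0 - CS
    return CS, CN
```

Under this reading, CN is nonzero only in unit periods where the fast envelope has dropped below the slow one.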

Specifically, as expressed by equation (5) below, the estimation processing unit 44 calculates the exponential moving average of the power |X(k,m)|² of the observation component X(k,m) of each unit period as the estimated noise component N(k,m). In other words, the estimated noise component N(k,m) is a weighted sum of the power |X(k,m)|² of the observation component X(k,m) in the current unit period and the estimated noise component N(k,m-1) of the past (immediately preceding) unit period.
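Equation (5) is not reproduced in this text. A form consistent with the description (a weighted sum of the current power and the previous estimate, with the smoothing coefficient ω(k,m) as the weight on the current power) would be:

```latex
\begin{equation}
N(k,m) = \omega(k,m)\,|X(k,m)|^2 + \{1-\omega(k,m)\}\,N(k,m-1) \tag{5}
\end{equation}
```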

The smoothing coefficient (weight) ω(k,m) in Equation (5) is set according to the reliability CN(k,m) calculated by the index calculation unit 42. Specifically, the smoothing coefficient ω(k,m) of the first embodiment is the product of the reliability CN(k,m) and a predetermined coefficient A, as expressed by Equation (6) below.

The coefficient A is set to an appropriate value in the range of 0 to 1 inclusive. Specifically, the coefficient A is set variably according to the stationarity (the degree of temporal variation) of the noise component assumed as the suppression target. For example, when a noise component with relatively large temporal variation is assumed, the coefficient A is set to a value of about 0.2 to 0.5, and when a noise component with small temporal variation is assumed, the coefficient A is set to a value of about 0.02. The coefficient A can also be set variably according to an instruction from the user.
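The update described for Equations (5) and (6) can be sketched as follows for a single frequency bin. This is a minimal illustration, not the patent's implementation: the equation images are not reproduced here, so the form N(k,m) = ω·|X(k,m)|² + (1 − ω)·N(k,m-1) is assumed from the verbal description (a weighted sum in which a larger ω increases the influence of the current power), and the function and parameter names are hypothetical.

```python
def smoothing_coefficient(cn, a):
    """Equation (6) as described in the text: omega(k,m) = A * CN(k,m)."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("coefficient A must lie in [0, 1]")
    return a * cn


def update_noise_estimate(n_prev, power, cn, a=0.2):
    """One exponential-moving-average step for one frequency bin.

    Assumed form of Equation (5):
        N(k,m) = omega * |X(k,m)|^2 + (1 - omega) * N(k,m-1)
    n_prev : previous estimate N(k,m-1)
    power  : current power |X(k,m)|^2
    cn     : noise reliability CN(k,m) in [0, 1]
    a      : coefficient A (about 0.2 to 0.5 for fluctuating noise,
             about 0.02 for nearly stationary noise, per the text)
    """
    omega = smoothing_coefficient(cn, a)
    return omega * power + (1.0 - omega) * n_prev


# A frame judged to be pure noise (CN = 1) pulls the estimate toward its power;
# a frame judged to be pure target sound (CN = 0) leaves the estimate unchanged.
noisy_frame = update_noise_estimate(1.0, 4.0, cn=1.0, a=0.5)
speech_frame = update_noise_estimate(1.0, 4.0, cn=0.0, a=0.5)
```

Because ω never exceeds A, the coefficient A acts as an upper bound on how quickly the estimate can follow the observed power.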

The estimation processing unit 44 updates the estimated noise component N(k,m) in each unit period according to the reliability CN(k,m) by executing the calculation of Equation (5) using the reliability CN(k,m) that the index calculation unit 42 calculates for each unit period. The initial value N(k,0) of the estimated noise component N(k,m) is set, for example, to the average (simple average) of the power |X(k,m)|² of the observed component X(k,m) over M0 unit periods within a noise section, as expressed by Equation (7) below.

The noise section is a section of the acoustic signal x(t) in which the target sound component is estimated to be absent. For example, a section of predetermined length from the start of noise suppression (for example, the 500-millisecond section from the start of the acoustic signal x(t)) is suitable as the noise section.
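The initialization described for Equation (7) amounts to a per-bin simple average over the noise section. The sketch below assumes the noise section is the first M0 frames of the signal and that powers are stored as a list of frames, each a list of per-bin values; the data layout and names are illustrative assumptions, not taken from the patent.

```python
def initial_noise_estimate(power_frames, m0):
    """Simple average of |X(k,m)|^2 over the first m0 frames, per frequency bin.

    power_frames[m][k] is the power of frame m at frequency bin k.
    Returns a list holding the initial value N(k,0) for each bin k.
    """
    if m0 <= 0 or m0 > len(power_frames):
        raise ValueError("m0 must be between 1 and the number of frames")
    num_bins = len(power_frames[0])
    return [
        sum(power_frames[m][k] for m in range(m0)) / m0
        for k in range(num_bins)
    ]


# Two noise-only frames followed by a louder frame that is excluded (m0 = 2).
frames = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
n0 = initial_noise_estimate(frames, m0=2)
```

With a 500-millisecond noise section, m0 would be the number of unit periods that fit in 500 milliseconds, which depends on the frame shift used by the frequency analysis.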

As understood from Equations (5) and (6), the larger the reliability CN(k,m) (and hence the larger the smoothing coefficient ω(k,m)), the greater the influence of the power |X(k,m)|² of the observed component X(k,m) in the current unit period (the degree to which the observed component X(k,m) is reflected in the estimated noise component N(k,m)). That is, the estimated noise component N(k,m) is updated in each unit period so that observed components X(k,m) in which the noise component is dominant are preferentially reflected in it (their weight, or mixing ratio, increases). Therefore, according to the first embodiment, the noise component can be estimated with high accuracy even when the noise component of the acoustic signal x(t) fluctuates over time. The noise suppression unit 36 in FIG. 1 suppresses the estimated noise component N(k,m) of each unit period, estimated by the noise estimation unit 34 through the above procedure, from the observed component X(k,m) of that unit period (Equation (1)). The first embodiment therefore has the advantage that an acoustic signal y(t) in which the noise component is suppressed with high accuracy (that is, in which the target sound component is emphasized with high accuracy) can be generated.

Region F31 of FIG. 3 shows a spectrogram of an acoustic signal x(t) containing only a noise component. In the spectrograms of FIG. 3 and of FIG. 4, described later, a lower display gradation (closer to black) indicates a lower intensity. FIG. 3 assumes a case in which one type of noise component starts at the origin of the time axis and another type of noise component is added to the acoustic signal x(t) at a time point T0, about 3 seconds after the origin. That is, the acoustic characteristics of the noise component of the acoustic signal x(t) change at the time point T0. Region F32 of FIG. 3 shows a spectrogram of the estimated noise component N(k,m) estimated from the acoustic signal x(t) of region F31 by the method of the first embodiment. Region F33 of FIG. 3 shows a spectrogram of the estimated noise component obtained when the average of the power |X(k,m)|² of the observed component X(k,m) within the noise section (the 500-millisecond section from the origin of the time axis) of the acoustic signal x(t) of region F31 is used as the noise estimate (hereinafter the "comparative example"). In the comparative example, the change in the noise component at the time point T0 is not reflected in the estimate. In contrast, FIG. 3 clearly shows that the first embodiment estimates the noise component of the acoustic signal x(t) with high accuracy and, when the noise component changes, quickly estimates the changed noise component.

Region F41 of FIG. 4 shows a spectrogram of an acoustic signal x(t) in which the noise component illustrated in region F31 of FIG. 3 is superimposed on a target sound component (speech). Region F42 of FIG. 4 shows a spectrogram of the acoustic signal y(t) obtained by suppressing, from the acoustic signal x(t), the estimated noise component N(k,m) generated by the method of the first embodiment (region F32 of FIG. 3). Region F43 of FIG. 4 shows the spectrogram obtained after suppressing the estimated noise component of region F33 of FIG. 3 from the acoustic signal x(t). FIG. 4 confirms that, whereas in the comparative example the noise component remains after suppression, particularly after the time point T0, the first embodiment suppresses the noise component well both before and after the time point T0.

Furthermore, in the first embodiment, the estimated noise component N(k,m) is calculated by applying the smoothing coefficient ω(k,m), which depends on the reliability CN(k,m), to the exponential moving average of the power |X(k,m)|² of the observed component X(k,m). Therefore, compared with the technique of Non-Patent Document 1, which searches for the minimum power over a long section of the acoustic signal on the order of several seconds, the first embodiment has the advantage of reducing the total number of unit periods for which the power |X(k,m)|² must be held in order to calculate the estimated noise component N(k,m) (that is, the storage capacity required to hold the power |X(k,m)|² of the observed component X(k,m)).

<Second Embodiment>
A second embodiment of the present invention is described below. In each of the embodiments illustrated below, elements whose operation and function are the same as in the first embodiment are denoted by the reference signs used in the description of the first embodiment, and their detailed description is omitted as appropriate.

The index calculation unit 42 (calculation processing unit 424) of the second embodiment calculates the reliability CN(k,m) from the reliability CS(k,m) by executing the calculation of Equation (8) below in place of Equation (4) above.

As understood from Equation (8), when the reliability CS(k,m) exceeds a threshold TH1 (CS(k,m) > TH1), the index calculation unit 42 sets the reliability CN(k,m) to zero. The threshold TH1 is suitably set to a value of the reliability CS(k,m) at which the target sound component in the observed component X(k,m) can be judged sufficiently dominant (for example, a positive number close to 1, on the order of 0.6 to 0.9).

In the first embodiment, when the target sound component is so dominant in the observed component X(k,m) that the reliability CS(k,m) exceeds the threshold TH1 (that is, when there is extremely little noise component), the reliability CN(k,m) calculated by Equation (4) is a sufficiently small value but is still positive (CN(k,m) > 0). Because the power (energy) of the target sound component is sufficiently high compared with that of the noise component, the target sound component of the observed component X(k,m) is reflected in the estimated noise component N(k,m) even when the reliability CN(k,m) is set to a sufficiently small value. The accuracy of the noise estimate may therefore decrease. In the second embodiment, by contrast, when the target sound component is so dominant in the observed component X(k,m) that the reliability CS(k,m) exceeds the threshold TH1, the reliability CN(k,m) is set to zero, and as a result the observed component X(k,m) is not reflected in the estimated noise component N(k,m). This gives the advantage that the estimated noise component N(k,m) can be estimated with high accuracy even when the target sound component is sufficiently dominant in the observed component X(k,m).

Also, as understood from Equation (8), the index calculation unit 42 likewise sets the reliability CN(k,m) to zero when the power |X(k,m)|² of the observed component X(k,m) in the current unit period falls below a threshold TH2 (|X(k,m)|² < TH2). The threshold TH2 is set to a value of the power |X(k,m)|² that can be judged sufficiently small. Specifically, the threshold TH2 is set to the product of a predetermined coefficient λ and the estimated noise component N(k,m-1) of the immediately preceding unit period, and is therefore a variable value (TH2 = λ·N(k,m-1)). The coefficient λ is set to a predetermined positive number (for example, a value of 0.1 or less).

A noise component can be regarded as a stationary acoustic component when observed over the long term, but when attention is paid to intervals as short as a unit period, it can be regarded as an acoustic component that fluctuates rapidly. Therefore, even when the noise component can be regarded as stationary in the long term, the power |X(k,m)|² of the observed component X(k,m) may drop momentarily over a small number of unit periods. If such a momentary drop in the power |X(k,m)|² is reflected in the estimated noise component N(k,m), the long-term estimated noise component N(k,m) may deviate from the actual noise component. In the second embodiment, when the power |X(k,m)|² of the observed component X(k,m) drops below the threshold TH2, the reliability CN(k,m) is set to zero, and as a result the observed component X(k,m) is not reflected in the estimated noise component N(k,m). That is, a momentary drop in the power |X(k,m)|² of the observed component X(k,m) does not affect the estimated noise component N(k,m). Therefore, compared with the first embodiment, in which momentary drops in the power |X(k,m)|² are also reflected in the estimated noise component N(k,m), the effect of estimating the estimated noise component N(k,m) with high accuracy is especially pronounced.

As understood from Equation (8), when neither of the above conditions holds, that is, when the reliability CS(k,m) is at or below the threshold TH1 (CS(k,m) ≤ TH1) and the power |X(k,m)|² of the observed component X(k,m) is not below the threshold TH2 (|X(k,m)|² ≥ TH2), the reliability CN(k,m) is set to a value corresponding to the reliability CS(k,m) in the same manner as in the first embodiment.
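The gating described for Equation (8) can be sketched as below for one frequency bin. The sketch treats every case not zeroed by one of the two thresholds as the first-embodiment case CN = 1 − CS; the default values of `th1` and `lam` are picked from the ranges mentioned in the text, and the function name is hypothetical.

```python
def gated_noise_reliability(cs, power, n_prev, th1=0.8, lam=0.1):
    """Noise reliability CN(k,m) with the two gates of the second embodiment.

    cs     : target-sound reliability CS(k,m) in [0, 1]
    power  : current power |X(k,m)|^2
    n_prev : previous noise estimate N(k,m-1)
    th1    : threshold TH1 on CS (the text suggests about 0.6 to 0.9)
    lam    : coefficient lambda, so that TH2 = lam * N(k,m-1)
    """
    th2 = lam * n_prev  # threshold TH2 tracks the running noise estimate
    if cs > th1:
        return 0.0      # target sound clearly dominant: do not update
    if power < th2:
        return 0.0      # momentary power drop: do not update
    return 1.0 - cs     # otherwise, as in the first embodiment


speech_dominant = gated_noise_reliability(0.9, 1.0, 1.0)     # gated by TH1
momentary_dip = gated_noise_reliability(0.25, 0.05, 1.0)     # gated by TH2
ordinary_frame = gated_noise_reliability(0.25, 1.0, 1.0)     # 1 - CS
```

A reliability of zero makes the smoothing coefficient zero, so the estimate simply carries over N(k,m-1) for that unit period.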

<Third Embodiment>
The estimation processing unit 44 of the third embodiment sequentially calculates a first estimated noise component Q1(k,m) and a second estimated noise component Q2(k,m) for each unit period. The first estimated noise component Q1(k,m) is the exponential moving average of the power |X(k,m)|² of the observed component X(k,m) using a smoothing coefficient ω1(k,m), as expressed by Equation (9A) below. Similarly, the second estimated noise component Q2(k,m) is the exponential moving average of the power |X(k,m)|² of the observed component X(k,m) using a smoothing coefficient ω2(k,m), as expressed by Equation (9B) below.

The smoothing coefficient ω1(k,m) in Equation (9A) is set according to the reliability CN(k,m) calculated by the index calculation unit 42 (calculation processing unit 424) and a predetermined first coefficient A1, and the smoothing coefficient ω2(k,m) in Equation (9B) is set according to the reliability CN(k,m) and a predetermined second coefficient A2. For example, the smoothing coefficient ω1(k,m) is the product of the reliability CN(k,m) and the first coefficient A1, as in Equation (10A) below, and the smoothing coefficient ω2(k,m) is the product of the reliability CN(k,m) and the second coefficient A2, as in Equation (10B) below. The first coefficient A1 and the second coefficient A2 are set to different values in the range of 0 to 1 inclusive (0 ≤ A1 < A2 ≤ 1).

Because the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) are generated according to the reliability CN(k,m) as described above, each corresponds, like the estimated noise component N(k,m) of the first embodiment, to an estimate of the noise component of the acoustic signal x(t). However, the first coefficient A1 is smaller than the second coefficient A2 (A1 < A2). That is, the smoothing coefficient ω1(k,m) is smaller than the smoothing coefficient ω2(k,m) (ω1(k,m) < ω2(k,m)). The first estimated noise component Q1(k,m) therefore follows temporal changes in the noise component of the acoustic signal x(t) less closely than the second estimated noise component Q2(k,m) does.

The estimation processing unit 44 of the third embodiment sequentially generates, for each unit period, an estimated noise component N(k,m) according to the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m). Specifically, as expressed by Equation (11) below, the estimation processing unit 44 selects the smaller of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) as the estimated noise component N(k,m).
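The two-tracker scheme of Equations (9A) to (11) can be sketched for one frequency bin as follows. The exponential-moving-average form is assumed from the verbal description (the same assumed form as for the single estimate of the first embodiment), the default values of `a1` and `a2` are arbitrary choices satisfying 0 ≤ A1 < A2 ≤ 1, and the names are hypothetical.

```python
def update_dual_estimates(q1_prev, q2_prev, power, cn, a1=0.02, a2=0.2):
    """One update of the slow tracker Q1, the fast tracker Q2, and N = min(Q1, Q2).

    q1_prev, q2_prev : previous values Q1(k,m-1), Q2(k,m-1)
    power            : current power |X(k,m)|^2
    cn               : noise reliability CN(k,m) in [0, 1]
    a1, a2           : first and second coefficients with 0 <= a1 < a2 <= 1
    """
    if not 0.0 <= a1 < a2 <= 1.0:
        raise ValueError("coefficients must satisfy 0 <= a1 < a2 <= 1")
    w1 = a1 * cn  # Equation (10A) as described
    w2 = a2 * cn  # Equation (10B) as described
    q1 = w1 * power + (1.0 - w1) * q1_prev  # assumed form of Equation (9A)
    q2 = w2 * power + (1.0 - w2) * q2_prev  # assumed form of Equation (9B)
    return q1, q2, min(q1, q2)              # Equation (11): take the smaller


# When the observed power rises, the slow tracker stays lower and is selected.
q1, q2, n = update_dual_estimates(1.0, 1.0, 3.0, cn=1.0, a1=0.25, a2=0.5)
```

Taking the minimum means the estimate rises only as fast as the slow tracker but falls as fast as the fast tracker, which biases it toward low-power frames.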

The third embodiment achieves the same effects as the first embodiment. In addition, in the third embodiment, the estimated noise component N(k,m) is calculated according to the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m), which are moving averages of the power |X(k,m)|² computed with different smoothing coefficients (ω1(k,m), ω2(k,m)). As is also assumed in the minimum statistics method of Non-Patent Document 1, the noise component tends to have lower power than the target sound component. The third embodiment, which selects the smaller of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) as the estimated noise component N(k,m), therefore has the advantage that the noise component of the acoustic signal x(t) can be estimated with higher accuracy than in the first embodiment.

<Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are illustrated below. Two or more modifications arbitrarily selected from the following examples may be combined as appropriate.

(1) Any method may be used to calculate the estimated noise component N(k,m) according to the reliability CN(k,m); the calculation is not limited to the methods illustrated in the above embodiments. Other methods of calculating the estimated noise component N(k,m) are illustrated below.

[Example 1] Any method may be used to calculate the smoothing coefficient ω(k,m) applied to the calculation of the estimated noise component N(k,m); the calculation is not limited to that of Equation (6) above. For example, a configuration may be adopted in which the smoothing coefficient ω(k,m) is calculated according to the ρ-th power (ρ > 1) of the reliability CN(k,m), as in Equation (12A) below. It is also possible to calculate, as the smoothing coefficient ω(k,m), a power of the coefficient A whose exponent is the reciprocal of the reliability CN(k,m), as in Equation (12B) below.

Whichever of Equations (12A) and (12B) is adopted, the smoothing coefficient ω(k,m) is set, as in the above embodiments, to a larger value (that is, a value that increases the degree to which the observed component X(k,m) is reflected in the estimated noise component N(k,m)) as the reliability CN(k,m) becomes larger. As with the smoothing coefficient ω(k,m) of the first and second embodiments, any method may be used to calculate the smoothing coefficients ω1(k,m) and ω2(k,m) of the third embodiment, and calculations similar to Equations (12A) and (12B) may be adopted.
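The two alternatives of Example 1 can be sketched as follows. The text states only that Equation (12A) depends on the ρ-th power of CN(k,m), so the product form A·CN^ρ used here is an assumption; Equation (12B) is taken as A raised to the power 1/CN, as described. Returning zero when CN is zero is an added guard against division by zero, not something the text specifies, and the function names are hypothetical.

```python
def omega_power_law(cn, a, rho=2.0):
    """Assumed form of Equation (12A): omega = A * CN**rho, with rho > 1."""
    if rho <= 1.0:
        raise ValueError("rho must be greater than 1")
    return a * cn ** rho


def omega_reciprocal_exponent(cn, a):
    """Equation (12B) as described: omega = A ** (1 / CN).

    For 0 < A < 1 the result grows with CN and tends to 0 as CN tends to 0,
    so 0.0 is returned for CN <= 0 (an added guard for the division).
    """
    if cn <= 0.0:
        return 0.0
    return a ** (1.0 / cn)


# Both mappings are increasing in CN, like the product A * CN of Equation (6).
w_a = omega_power_law(0.5, 0.5, rho=2.0)
w_b = omega_reciprocal_exponent(0.5, 0.5)
```

Compared with the plain product A·CN, both forms shrink the smoothing coefficient more strongly for intermediate reliabilities, so ambiguous frames contribute less to the estimate.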

[Example 2] Equation (11) illustrated in the third embodiment may be replaced with Equation (13) below.

That is, in a unit period in which the smaller of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) (min{Q1(k,m), Q2(k,m)}) exceeds a threshold σ1, the estimated noise component N(k,m) is set to the threshold σ1. In other words, the estimated noise component N(k,m) is set with the threshold σ1 as its upper limit. The threshold σ1 is set, for example, to the power |X(k,m)|² of the observed component X(k,m). In the configuration using Equation (13), the estimated noise component N(k,m) is limited to at most the threshold σ1, which makes it possible to reduce musical noise caused by noise suppression in the frequency domain.

[Example 3] In the third embodiment, the smaller of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) is selected as the estimated noise component N(k,m) (Equation (11)). Alternatively, as expressed by Equation (14) below, the larger of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) may be selected as the estimated noise component N(k,m).

Compared with the configuration that calculates the estimated noise component N(k,m) by Equation (11), the configuration using Equation (14) has the advantage that the estimated noise component N(k,m) can be calculated so as to follow temporal changes in the noise component of the acoustic signal x(t) more quickly.

Equation (14) may also be replaced with Equation (15) below.

That is, in a unit period in which the larger of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) (max{Q1(k,m), Q2(k,m)}) exceeds a threshold σ2, the estimated noise component N(k,m) is set to the threshold σ2. In other words, the estimated noise component N(k,m) is set with the threshold σ2 as its upper limit. The threshold σ2 is set, for example, to the power |X(k,m)|² of the observed component X(k,m). In the configuration using Equation (15), the estimated noise component N(k,m) is limited to at most the threshold σ2, so that even when the larger of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) becomes extremely large because of an estimation error or the like, the estimated noise component N(k,m) can be kept within an appropriate range.
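The selection rules of Equations (11) and (13) to (15) differ only in whether the smaller or the larger of Q1 and Q2 is taken and whether an upper limit is applied, so they can be sketched with a single hypothetical helper. The parameter names are illustrative; passing the current power |X(k,m)|² as `cap` corresponds to the example setting of σ1 and σ2 given in the text.

```python
def select_noise_estimate(q1, q2, use_max=False, cap=None):
    """Choose N(k,m) from the two trackers for one frequency bin.

    use_max=False, cap=None : Equation (11), min{Q1, Q2}
    use_max=False, cap=s1   : Equation (13), min{Q1, Q2} limited to sigma1
    use_max=True,  cap=None : Equation (14), max{Q1, Q2}
    use_max=True,  cap=s2   : Equation (15), max{Q1, Q2} limited to sigma2
    """
    chosen = max(q1, q2) if use_max else min(q1, q2)
    if cap is not None:
        chosen = min(chosen, cap)  # the threshold acts as an upper limit
    return chosen


power = 2.5  # current |X(k,m)|^2, used as the cap as in the text's example
n_min = select_noise_estimate(2.0, 3.0)
n_max_capped = select_noise_estimate(2.0, 3.0, use_max=True, cap=power)
```

With the cap set to the current power, the noise estimate never exceeds the observation it will be subtracted from.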

[Example 4] In the third embodiment and in Examples 2 and 3 of this modification, one of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) is selected. It is also possible to calculate the estimated noise component N(k,m) taking both the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) into account. For example, a configuration in which a weighted sum of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) is calculated as the estimated noise component N(k,m) is suitably adopted. The weights of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) are set variably, for example according to an instruction from the user.

[Example 5] The estimated noise component N(k,m) may also be calculated by Equation (16) below, which uses an estimated a priori SNR ξ(m).

That is, the estimated noise component N(k,m) is calculated as a weighted average of the smaller of the first estimated noise component Q1(k,m) and the second estimated noise component Q2(k,m) (min{Q1(k,m), Q2(k,m)}) and the power |X(k,m)|² of the observed component X(k,m), with weights that depend on the estimated a priori SNR ξ(m). The coefficient η is a variable for controlling the contribution of the estimated a priori SNR ξ(m).

The estimated a priori SNR ξ(m) is calculated, for example, by Equation (17A) below, which uses an estimated a posteriori SNR γ(k,m), and the estimated a posteriori SNR γ(k,m) in Equation (17A) is calculated, for example, by Equation (17B) below. The symbol max{ ,0} in Equation (17A) denotes an operation that restricts the result to non-negative values. The operation in Equation (17A) of subtracting 1 from the estimated a posteriori SNR γ(k,m) (its average over the K frequencies) corresponds to the calculation of the estimated a priori SNR.

The symbol ε in Equation (17A) is set to a predetermined value. Because the logarithm of zero is negative infinity, setting ε to zero could make the calculation of Equation (17A) (log₁₀ ε) unstable. The value ε is therefore set to a very small positive number (ε > 0).

[Example 6] In each of the above embodiments, the exponential moving average of the power |X(k,m)|² of the observed component X(k,m), using the smoothing coefficient ω(k,m) that depends on the reliability CN(k,m), is calculated as the estimated noise component N(k,m). Alternatively, as expressed by Equation (18) below, the estimated noise component N(k,m) may be calculated as a weighted moving average of the power |X(k,m)|² of the observed component X(k,m).

As understood from Equation (18), the estimated noise component N(k,m) is calculated as a weighted average (weighted sum) of the power |X(k,m)|² of the observed component X(k,m) in each of the M unit periods ending with the m-th unit period, where the weight of each unit period is the ratio of its reliability CN(k,m) to the sum of the reliabilities CN(k,m) over the M unit periods (CN(k,m)/ΣCN(k,m-n)).
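The weighted moving average described for Equation (18) can be sketched for one frequency bin as below. The sketch assumes the caller keeps the last M powers and reliabilities in two equal-length lists; returning `None` when every reliability in the window is zero is an added convention for the undefined case, which the text does not address, and the function name is hypothetical.

```python
def weighted_moving_average_noise(powers, reliabilities):
    """Reliability-weighted average of |X(k,m-n)|^2 over the last M frames.

    powers[i]        : power |X|^2 of the i-th frame in the window
    reliabilities[i] : noise reliability CN of the same frame
    Each frame's weight is its CN divided by the sum of CN over the window.
    """
    if not powers or len(powers) != len(reliabilities):
        raise ValueError("powers and reliabilities must be equal-length and non-empty")
    total = sum(reliabilities)
    if total <= 0.0:
        return None  # no frame in the window was judged noise-like
    return sum(c * p for c, p in zip(reliabilities, powers)) / total


# A frame with zero reliability (for example, a speech frame) is ignored entirely.
n = weighted_moving_average_noise([1.0, 9.0], [1.0, 0.0])
```

Unlike the exponential moving average, this form needs the powers and reliabilities of all M unit periods in the window to be stored.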

As understood from the above examples, the estimation processing unit 44 is broadly defined as an element (estimation processing means) that generates the estimated noise component N(k,m) from the acoustic signal x(t) (the observed component X(k,m)) using the reliability CN(k,m) of each unit period, and the specific method of generating the estimated noise component N(k,m) from the reliability CN(k,m) and the acoustic signal x(t) is not limited.

(2) In each of the above embodiments, the reliability CN(k,m), which is an index of the probability that the observed component X(k,m) corresponds to the noise component, is applied to the calculation of the estimated noise component N(k,m). It is also possible to apply the reliability CS(k,m), which is an index of the probability that the observed component X(k,m) corresponds to the target sound component, to the calculation of the estimated noise component N(k,m). For example, Equations (5) and (6) above may be replaced with Equations (5A) and (6A) below. In that case, the calculation of the reliability CN(k,m) is omitted.

In the configuration using equations (5A) and (6A), the smaller the reliability CS(k,m) (that is, the more the noise component dominates the target sound component within the observed component X(k,m)), the smaller the smoothing coefficient ω(k,m) becomes, and the greater the influence of the power |X(k,m)|² of the observed component X(k,m) in the current unit period. In other words, the relationship between the dominance of the noise component within the observed component X(k,m) and the degree to which the observed component X(k,m) is reflected in the estimated noise component N(k,m) is controlled in the same way as in the first embodiment. As understood from the above description, the index calculation unit 42 in each of the above-described embodiments is broadly defined as an element (index calculation means) that calculates, for each unit period, a reliability (CS(k,m) or CN(k,m)) that is an index of the accuracy with which the unit period of the acoustic signal x(t) corresponds to one of the target sound component and the noise component.
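The behavior described above can be illustrated with a minimal Python sketch. Equations (5A) and (6A) are not reproduced in this text, so the sketch does not implement them. It assumes that CS(k,m) lies in the range 0 to 1, that the smoothing coefficient is a linear function of CS(k,m) between two hypothetical bounds `omega_min` and `omega_max`, and that the exponential moving average has the form N(k,m) = ω·N(k,m-1) + (1-ω)·|X(k,m)|². All names and default values are illustrative assumptions.

```python
def smoothing_coefficient(cs, omega_min=0.5, omega_max=0.99):
    """Hypothetical mapping: smaller CS (noise dominant) gives smaller omega."""
    cs = min(max(cs, 0.0), 1.0)  # clamp to the assumed range [0, 1]
    return omega_min + (omega_max - omega_min) * cs


def update_noise_estimate(prev_noise, power, cs):
    """One exponential-moving-average step for a single frequency bin.

    prev_noise -- previous estimate N(k,m-1)
    power      -- current power |X(k,m)|^2
    cs         -- reliability CS(k,m) that the period is target sound
    """
    omega = smoothing_coefficient(cs)
    # A small omega lets the current power influence the estimate strongly.
    return omega * prev_noise + (1.0 - omega) * power
```

With these assumed defaults, a unit period judged to be pure noise (CS = 0) moves the estimate halfway toward the current power, while a period judged to be pure target sound (CS = 1) leaves the estimate almost unchanged.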

(3) In each of the above-described embodiments, exponential moving averages of the power |X(k,m)|² of the observed component X(k,m) are calculated as the first index value P1(k,m) and the second index value P2(k,m) (equations (2A) and (2B)), but the method of calculating the first index value P1(k,m) and the second index value P2(k,m) may be changed as appropriate. For example, the first index value P1(k,m) and the second index value P2(k,m) can be calculated as simple moving averages of the power |X(k,m)|² of the observed component X(k,m), as in the following equations (19A) and (19B).

As understood from equation (19A), the first index value P1(k,m) is the average of the power |X(k,m)|² of the observed component X(k,m) over the M1 unit periods (M1 is a natural number of 1 or more) ending with the m-th unit period. Likewise, as understood from equation (19B), the second index value P2(k,m) is the average of the power |X(k,m)|² of the observed component X(k,m) over the M2 unit periods ending with the m-th unit period. The averaging length M2 in equation (19B) exceeds the averaging length M1 in equation (19A) (M2 > M1). Therefore, the smoothing time constant τ2 of equation (19B) exceeds the smoothing time constant τ1 of equation (19A) (τ2 > τ1). That is, as in the first embodiment, the second index value P2(k,m) follows the time change of the power |X(k,m)|² of the observed component X(k,m) with lower followability than the first index value P1(k,m). Although equations (19A) and (19B) exemplify simple moving averages, weighted moving averages of the power |X(k,m)|² over a plurality of unit periods may also be calculated as the first index value P1(k,m) and the second index value P2(k,m).
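The two simple moving averages can be sketched in Python as follows, for a single frequency k. Equations (19A) and (19B) are not reproduced in this text, so the sketch follows only the verbal description: each index value is the mean of the most recent M1 or M2 power values. The class name and the handling of the start-up phase (averaging over however many values are available) are assumptions made for illustration.

```python
from collections import deque


class MovingAverage:
    """Simple moving average of power over the most recent `length` unit periods."""

    def __init__(self, length):
        if length < 1:
            raise ValueError("length must be a natural number of 1 or more")
        # deque(maxlen=...) discards the oldest value automatically.
        self._window = deque(maxlen=length)

    def update(self, power):
        """Add the power |X(k,m)|^2 of the newest unit period and return the mean."""
        self._window.append(power)
        # Assumption: before the window is full, average over what is available.
        return sum(self._window) / len(self._window)
```

For example, creating `p1 = MovingAverage(2)` and `p2 = MovingAverage(4)` and feeding both the same power sequence makes `p2` respond to a sudden power change more slowly than `p1`, which is the difference in followability that the reliability calculation relies on.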

As understood from the above description, the first index value P1(k,m) is broadly defined as a value that follows the time change of the acoustic signal x(t) (the time change of the power |X(k,m)|² of the observed component X(k,m)), and the second index value P2(k,m) is broadly defined as a value that follows the time change of the acoustic signal x(t) with lower followability than the first index value P1(k,m).

(4) In each of the above-described embodiments, the noise component is suppressed by subtracting the estimated noise component N(k,m) from the observed component X(k,m) in the frequency domain, but the process of suppressing the noise component in the acoustic signal x(t) using the estimated noise component N(k,m) may be changed as appropriate. For example, the present invention can also be applied to multiplicative noise suppression, in which the noise component is suppressed by multiplying the observed component X(k,m) by an adjustment value (spectral gain) that depends on the estimated noise component N(k,m). Specifically, the estimated noise component N(k,m) calculated in each of the above embodiments can be applied to any noise suppression technique, typified by noise suppression using a Wiener filter, noise suppression using the MMSE-STSA method or the MMSE-LSA method, and speech enhancement using MAP estimation.
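As an illustration of multiplicative suppression, the following Python sketch applies a simple Wiener-type spectral gain, G = max(1 - N/|X|², floor), to one complex observed component. This is one common textbook form and is not taken from the patent, which does not specify the gain rule. The function names and the gain floor value are assumptions.

```python
def spectral_gain(power, noise, floor=0.05):
    """Gain for one frequency bin from its power |X|^2 and noise estimate N.

    `floor` is a hypothetical lower bound that keeps the gain positive when
    the noise estimate exceeds the observed power.
    """
    if power <= 0.0:
        return floor
    return max(1.0 - noise / power, floor)


def suppress(x, noise, floor=0.05):
    """Multiply the complex observed component X(k,m) by the spectral gain."""
    return spectral_gain(abs(x) ** 2, noise, floor) * x
```

Because the gain is real-valued, the phase of X(k,m) is preserved and only its magnitude is reduced.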

(5) In each of the above-described embodiments, the calculation and suppression of the estimated noise component N(k,m) are executed in parallel for each of a plurality of observed components X(k,m) obtained by dividing the acoustic signal x(t) on the frequency axis, but calculating and suppressing the estimated noise component N(k,m) for each frequency is not essential to the present invention. For example, it is possible to adopt a configuration in which the calculation and suppression of the estimated noise component N(k,m) are executed for only some of the plurality of observed components X(k,m) obtained by dividing the acoustic signal x(t) on the frequency axis, or a configuration in which the calculation and suppression of an estimated noise component N(m) are executed without dividing the acoustic signal x(t) into a plurality of observed components X(k,m) (that is, collectively over the entire band to be processed).

(6) The generation and suppression of the estimated noise component N(k,m) exemplified in the above embodiments can be applied to other known acoustic processing techniques. For example, an acoustic signal y(t) can be generated for each channel by executing the generation and suppression of the estimated noise component N(k,m), in the same manner as in the above embodiments, on the acoustic signal x(t) of each channel generated by a plurality of sound collection devices (a microphone array). A sound collection beam (a region of high sound collection sensitivity) can then be formed in a known sound source direction by a directivity control process that uses the acoustic signal y(t) of each channel (for example, delay-and-sum or null-steering beamforming).

(7) The power |X(k,m)|² of the acoustic signal x(t) (observed component X(k,m)) used in the various calculations exemplified in the above embodiments may be replaced with the amplitude |X(k,m)| or with a power of the amplitude |X(k,m)| (for example, the fourth power or the 1/2 power). Any power of the amplitude |X(k,m)| of the acoustic signal x(t) is broadly defined as the intensity of the acoustic signal x(t). The power |X(k,m)|² and the amplitude |X(k,m)| of the acoustic signal x(t) are typical examples of this intensity.

(8) In each of the above-described embodiments, the acoustic processing device 100 (noise suppression device) executes both the generation and the suppression of the estimated noise component N(k,m), but the present invention can also be applied as an acoustic processing device (noise estimation device) that calculates the estimated noise component N(k,m) from the acoustic signal x(t). That is, the noise suppression unit 36 in FIG. 1 can be omitted.

The present invention can also be applied as an acoustic processing device (index calculation device) that calculates the reliability CS(k,m), which is an index of the accuracy with which each unit period of the acoustic signal x(t) corresponds to the target sound component, or the reliability CN(k,m), which is an index of the accuracy with which each unit period of the acoustic signal x(t) corresponds to the noise component. A typical use of the reliability CS(k,m) and the reliability CN(k,m) is the estimation of the noise component exemplified in the above embodiments (calculation of the estimated noise component N(k,m)), but the reliability CS(k,m) or the reliability CN(k,m) can also be used for purposes other than noise estimation. Specifically, a configuration is conceivable in which the acoustic signal x(t) is divided on the time axis into noise sections and target sound sections according to the reliability CS(k,m) or the reliability CN(k,m). For example, a configuration that classifies as a noise section each unit period in which the average of the reliability CN(k,m) over a plurality of frequencies exceeds a predetermined threshold, or a configuration that classifies as a target sound section each unit period in which the average of the reliability CS(k,m) over a plurality of frequencies exceeds a predetermined threshold, is preferably employed. As understood from the above description, the generation of the estimated noise component N(k,m) by the estimation processing unit 44 can be omitted.
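The section classification described above can be sketched in Python as follows. The sketch implements only the first variant (averaging the reliability CN(k,m) over frequencies and comparing it with a threshold). The function name, the threshold value, and the choice to label every remaining unit period as a target sound section are assumptions made for illustration.

```python
def classify_unit_period(cn_by_frequency, threshold=0.5):
    """Label one unit period as "noise" or "target".

    cn_by_frequency -- reliabilities CN(k,m) of one unit period m for the
                       frequencies k that take part in the decision
    threshold       -- hypothetical decision threshold
    """
    if not cn_by_frequency:
        raise ValueError("at least one reliability value is required")

    mean_cn = sum(cn_by_frequency) / len(cn_by_frequency)
    # Assumption: a unit period that is not a noise section is a target section.
    return "noise" if mean_cn > threshold else "target"
```

Applying the function to each unit period in turn yields a label sequence along the time axis, similar in purpose to a voice activity decision.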

(9) The acoustic processing device 100 (or noise suppression device) can also be realized by a server device that communicates with a terminal device such as a mobile phone. For example, the acoustic processing device 100 generates the acoustic signal y(t) from the acoustic signal x(t) received from the terminal device and transmits the acoustic signal y(t) to the terminal device. In a configuration in which the acoustic processing device 100 receives each observed component X(k,m) of the acoustic signal x(t) from the terminal device (a configuration in which the terminal device includes the frequency analysis unit 32), the frequency analysis unit 32 is omitted from the acoustic processing device 100. In a configuration in which each noise-suppressed component Y(k,m) is transmitted from the acoustic processing device 100 to the terminal device (a configuration in which the terminal device includes the waveform generation unit 38), the waveform generation unit 38 is omitted. Further, in a configuration in which the acoustic processing device 100 generates the estimated noise component N(k,m) from the acoustic signal x(t) received from the terminal device and transmits it to the terminal device (for example, a configuration in which the terminal device includes the noise suppression unit 36), the noise suppression unit 36 can be omitted.

DESCRIPTION OF SYMBOLS 100 ... acoustic processing device, 12 ... signal supply device, 14 ... sound emission device, 22 ... arithmetic processing device, 24 ... storage device, 32 ... frequency analysis unit, 34 ... noise estimation unit, 36 ... noise suppression unit, 38 ... waveform generation unit, 42 ... index calculation unit, 422 ... smoothing processing unit, 424 ... calculation processing unit, 44 ... estimation processing unit.

Claims

An acoustic processing apparatus comprising: index calculation means for calculating, for each unit period of an acoustic signal including a target sound component and a noise component, a reliability that is an index of the accuracy with which the unit period corresponds to one of the target sound component and the noise component, in accordance with a difference between a first index value that follows the time change of the acoustic signal and a second index value that follows the time change of the acoustic signal with lower followability than the first index value; and
estimation processing means for generating an estimated noise component from the acoustic signal using the reliability of each unit period calculated by the index calculation means.

The acoustic processing apparatus according to claim 1, wherein the index calculation means calculates, for each unit period, a basic index that is an index of the accuracy with which the unit period of the acoustic signal corresponds to the target sound component, in accordance with the difference between the first index value and the second index value, sets the reliability to a predetermined value for a unit period in which the basic index exceeds a first threshold or a unit period in which the intensity of the acoustic signal is below a second threshold, and calculates the reliability in accordance with the basic index for the other unit periods.

The acoustic processing apparatus according to claim 1 or claim 2, wherein the estimation processing means generates the estimated noise component by applying a smoothing coefficient that depends on the reliability calculated by the index calculation means to an exponential moving average of the intensity of each unit period of the acoustic signal.

The acoustic processing apparatus according to any one of claims 1 to 3, wherein the estimation processing means generates a first estimated noise component by applying, to an exponential moving average of the intensity of each unit period of the acoustic signal, a smoothing coefficient that depends on the reliability calculated by the index calculation means and on a first coefficient, generates a second estimated noise component by applying, to an exponential moving average of the intensity of each unit period of the acoustic signal, a smoothing coefficient that depends on the reliability calculated by the index calculation means and on a second coefficient different from the first coefficient, and generates the estimated noise component in accordance with the first estimated noise component and the second estimated noise component.

The acoustic processing apparatus according to claim 1, further comprising noise suppression means for suppressing, in the acoustic signal, the estimated noise component generated by the estimation processing means.