JP5457999B2

JP5457999B2 - Noise suppressor, method and program thereof

Info

Publication number: JP5457999B2
Application number: JP2010273702A
Authority: JP
Inventors: Masakiyo Fujimoto; Shinji Watanabe; Tomohiro Nakatani
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-12-08
Filing date: 2010-12-08
Publication date: 2014-04-02
Anticipated expiration: 2030-12-08
Also published as: JP2012123185A

Description

The present invention relates to a noise suppression device, and to a method and a program for it. The device takes an acoustic signal in which a noise signal is superimposed on a target speech signal, suppresses the noise signal, and extracts the target signal.

When automatic speech recognition is used in real environments, the noise must be removed from the acoustic signal so that only the desired target signal is extracted. Here, noise means every signal other than the target speech signal to be processed. Improving noise suppression performance is therefore a problem that needs to be solved urgently.

Non-Patent Document 1 discloses a method with the following steps:
1. A probability model of the input signal is generated from probability models of the speech signal and the noise signal, both estimated in advance.
2. The difference between this model and the statistics of the entire input signal is expressed by a Taylor series expansion.
3. That difference is estimated with the EM algorithm to optimize the probability model of the input signal.
4. Noise is then suppressed using the parameters of the optimized input-signal model and of the speech-signal model.

Non-Patent Document 2 discloses a noise suppression method with a voice activity detection (VAD) function, which works as follows:
- The noise signal is estimated by a parallel nonlinear Kalman filter.
- Voice activity detection and noise suppression share the same probability model, so the two exchange information closely.
- An optimal noise suppression filter is designed according to the detection result.

Non-Patent Document 1: P. J. Moreno, B. Raj, and R. M. Stern, "A vector Taylor series approach for environment-independent speech recognition," in Proceedings of ICASSP '96, vol. II, pp. 733-736, May 1996.
Non-Patent Document 2: Masakiyo Fujimoto, Kentaro Ishizuka, and Tomohiro Nakatani, "Study of Integration of Statistical Model-Based Voice Activity Detection and Noise Suppression," in Proceedings of Interspeech '08, pp. 2008-2011, Sept. 2008.

The technique of Non-Patent Document 1 optimizes the probability model of the input signal with the EM algorithm, using the entire recorded input signal. It performs noise suppression on the assumption that the noise in the input acoustic signal is stationary. However, most noise in real environments is non-stationary: its statistical characteristics change over time. The method therefore cannot follow the temporal variation of the noise, and sufficient noise suppression performance is not obtained.

Non-Patent Document 2 discloses a method that sequentially estimates non-stationary noise with a parallel nonlinear Kalman filter. However, it does not take into account the latent components (parameters) of the noise. As a result, even components that are unsuitable for sequential estimation by the parallel nonlinear Kalman filter are estimated sequentially. The noise estimation error then grows, and sufficient noise suppression performance may not be obtained.

The present invention was made in view of these points. Its object is to provide a noise suppression device, and a method and a program for it. The device estimates and suppresses noise with high accuracy by decomposing the noise signal into a stationary component (bias component) and a non-stationary component (residual component).

The noise suppression device of the present invention comprises four units. Each unit receives the inputs listed below. Where a unit uses GMMs, these are the parameters of a silence GMM (Gaussian mixture model) and a clean speech GMM.
- Acoustic feature extraction unit. Input: an acoustic signal in which a noise signal is superimposed on the target speech signal. It divides the signal into frames of fixed duration and extracts a complex spectrum and a log mel spectrum from each frame as acoustic features.
- Noise bias component estimation unit. Inputs: the log mel spectrum and the GMM parameters. It computes an optimal estimate of the bias component, which is the centroid of the noise signal in the acoustic feature space.
- Noise residual component estimation unit. Inputs: the log mel spectrum, the bias component, and the GMM parameters. It computes an optimal estimate of the residual component, which is the difference between the noise signal and the bias component.
- Noise suppression unit. Inputs: the log mel spectrum, the complex spectrum, the bias component, the residual component, and the GMM parameters. It outputs the acoustic signal with the noise signal suppressed.

The noise suppression device of the present invention decomposes the noise superimposed on the acoustic signal into two parts: a bias component that does not vary over time and a residual component that does. It then applies to each component an estimation method suited to it. Because the noise is estimated with high accuracy, noise suppression performance improves.

FIG. 1 is a conceptual diagram of the two-dimensional feature space of a noise signal.
FIG. 2 shows an example functional configuration of the noise suppression device 100 of the present invention.
FIG. 3 shows the operation flow of the noise suppression device 100.
FIG. 4 shows an example functional configuration of the noise bias component estimation unit 11.
FIG. 5 shows the operation flow of the noise bias component estimation unit 11.
FIG. 6 shows an example functional configuration of the noise residual component estimation unit 12.
FIG. 7 shows the operation flow of the noise residual component estimation unit 12.
FIG. 8 shows an example functional configuration of the noise suppression unit 14.
FIG. 9 shows an example functional configuration of the noise suppression filter estimation unit 140.
FIG. 10 shows the operation flow of the noise suppression filter estimation unit 140.
FIG. 11 shows an example functional configuration of the noise suppression filter application unit 141.
FIG. 12 shows the operation flow of the noise suppression filter application unit 141.
FIG. 13 shows time-domain speech waveforms. (a) is the acoustic signal o_τ, in which airport-lobby noise is superimposed on the target speech signal. (b) is the noise-suppressed speech ^s_τ obtained by feeding o_τ to the noise suppression device of the present invention.

Embodiments of the present invention are described below with reference to the drawings. Identical components in different drawings carry the same reference numerals, and their description is not repeated.

Notation: symbols such as "^" and "~" should properly be placed directly above the character that follows them. Because of the limitations of plain-text notation, they are written immediately before that character instead. In the equations they appear in their proper positions. Unless otherwise stated, every variable is a column vector.

Before the embodiments are described, the basic idea of the invention is explained.

[Basic idea of the present invention]
The noise suppression device of the present invention treats a noise signal as the sum of two components: a time-invariant stationary component (bias component) and a time-varying non-stationary component (residual component).

In FIG. 1, the horizontal axis is the first dimension of the acoustic feature and the vertical axis is the second. Only a two-dimensional feature space is drawn, for ease of illustration. If the noise signal consists of a bias component and a residual component, then:
- the bias component μ_N can be regarded as the centroid of the noise N_t in the acoustic feature space;
- the residual component ~N_t can be regarded as the difference between the noise N_t and the bias component μ_N.

From this viewpoint, let N_t be the acoustic feature of the noise in frame t (for example, a 24-dimensional log mel spectrum vector). As in Equation (1), N_t can then be decomposed into the time-invariant bias component μ_N and the residual component ~N_t.

In the present invention, the temporal variation of the residual component is represented by the autoregressive model of Equation (2), which includes a prediction error U_t.

Here, F is a matrix whose diagonal elements are the autoregressive coefficients. The prediction error U_t is multidimensional white noise with a zero mean vector and a diagonal covariance matrix Σ_U. Each diagonal element of Σ_U is assumed to take a very small value (for example, 0.001).
Substituting Equation (2) into Equation (1) expresses the log mel spectrum vector N_t by the biased autoregressive model of Equation (3).

The present invention estimates the noise using the biased autoregressive model of Equation (3), and suppresses noise with that estimate.
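The sketch below makes the model concrete by drawing a noise trajectory from Equation (3). It assumes:
- a first-order model with diagonal F;
- the small prediction-error variance mentioned above.

The function name and the chosen values are illustrative; they do not come from the patent. Because |f| < 1, the time average of N_t settles at μ_N. This matches the interpretation of the bias component as the centroid in FIG. 1.

```python
import numpy as np


def simulate_biased_ar_noise(mu_n, f_diag, sigma_u_diag, num_frames, seed=0):
    """Draw N_0, ..., N_{T-1} from the biased AR(1) model of Eq. (3).

    mu_n:         bias component mu_N, shape (L,)
    f_diag:       diagonal of the AR matrix F, shape (L,), each |f| < 1
    sigma_u_diag: diagonal of Sigma_U (prediction-error variances), shape (L,)
    """
    rng = np.random.default_rng(seed)
    residual = np.zeros_like(mu_n, dtype=float)  # residual before frame 0: zero (assumption)
    frames = np.empty((num_frames, mu_n.shape[0]))
    for t in range(num_frames):
        u_t = rng.normal(0.0, np.sqrt(sigma_u_diag))  # U_t ~ N(0, Sigma_U), independent per dimension
        residual = f_diag * residual + u_t            # Eq. (2) with diagonal F
        frames[t] = mu_n + residual                   # Eq. (1)
    return frames


# Example: 3-dimensional noise with f = 0.9 and Sigma_U = 0.001 I.
mu = np.array([2.0, 1.5, 1.0])
noise = simulate_biased_ar_noise(mu, np.full(3, 0.9), np.full(3, 0.001), 20000)
print(noise.mean(axis=0))  # close to mu
```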

FIG. 2 shows an example functional configuration of the noise suppression device 100 of the present invention, and FIG. 3 shows its operation flow. The noise suppression device 100 comprises:
- an acoustic feature extraction unit 10;
- a noise bias component estimation unit 11;
- a noise residual component estimation unit 12;
- a GMM storage unit 13, which holds a silence GMM 130 and a clean speech GMM 131;
- a noise suppression unit 14.

All units other than the GMM storage unit 13 are implemented in software. A predetermined program is loaded into a computer comprising, for example, a ROM, a RAM, and a CPU, and the CPU executes the program.

The noise suppression device 100 takes as input the acoustic signal o_τ, in which a noise signal is superimposed on the target speech signal. It cuts the signal into frames of fixed duration, shifting the start point by a fixed interval along the time axis, and performs noise suppression frame by frame. The acoustic signal o_τ has been discretized by an A/D converter (not shown), and the subscript τ denotes the sample index of the discrete signal. For a sampling frequency of 16 kHz, for example, one frame is set to 20 ms, that is, Frame = 320 samples (320 × 1/16 kHz).

The acoustic feature extraction unit 10 extracts the complex spectrum Spc_t and the log mel spectrum O_t of each frame as acoustic features (step S10).

The noise bias component estimation unit 11 receives the log mel spectrum O_t and the parameters of the silence GMM 130 and the clean speech GMM 131. It computes an optimal estimate of the bias component μ_N, the centroid of the noise signal in the acoustic feature space (step S11).

The noise residual component estimation unit 12 receives the log mel spectrum O_t, the bias component μ_N, and the parameters of the silence GMM 130 and the clean speech GMM 131. It computes optimal estimates of the residual component ~N_t (the difference between the noise signal and μ_N) and of its squared-error covariance matrix ~Σ_{N,t} (step S12).

The noise suppression unit 14 receives:
- the log mel spectrum O_t and the complex spectrum Spc_t;
- the bias component μ_N, the residual component ~N_t, and the squared-error covariance matrix ~Σ_{N,t};
- the parameters of the silence GMM 130 and the clean speech GMM 131.

It outputs the acoustic signal ^s_τ with the noise signal suppressed (step S14).

In this way, the noise suppression device 100 decomposes the noise superimposed on the acoustic signal into a time-invariant bias component and a time-varying residual component ~N_t. It applies to each component an estimation method suited to it, which improves noise suppression performance. The operation of each functional unit is described in detail below.

The acoustic feature extraction unit 10 cuts the acoustic signal into frames o_{t,n}, shifting the start point by, for example, Shift = 160 samples. Each frame is multiplied by a window function w_n, for example the Hamming window of Equation (4).

Here t is the frame index and n is the sample index within the frame. An M-point fast Fourier transform is applied to each windowed frame o_{t,n}. M is a power of two no smaller than the frame length (for example, M = 512). The result is the complex spectrum Spc_t = {Spc_{t,0}, …, Spc_{t,m}, …, Spc_{t,M−1}}, where m is the frequency bin index.
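A minimal NumPy sketch of this framing and transform follows. It assumes:
- the textbook Hamming window w_n = 0.54 − 0.46 cos(2πn/(Frame − 1)) for Equation (4), since the patent's exact window expression is not reproduced here;
- the example values Frame = 320, Shift = 160, and M = 512.

```python
import numpy as np

FRAME = 320  # 20 ms at 16 kHz
SHIFT = 160  # 10 ms frame shift
NFFT = 512   # M: power of two, no smaller than FRAME


def complex_spectra(signal, frame=FRAME, shift=SHIFT, nfft=NFFT):
    """Return Spc with shape (T, nfft): the M-point FFT of each windowed frame."""
    if len(signal) < frame:
        raise ValueError("signal is shorter than one frame")
    n = np.arange(frame)
    window = 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (frame - 1))  # Hamming window w_n
    num_frames = 1 + (len(signal) - frame) // shift
    spc = np.empty((num_frames, nfft), dtype=complex)
    for t in range(num_frames):
        o_tn = signal[t * shift : t * shift + frame] * window  # windowed frame o_{t,n}
        spc[t] = np.fft.fft(o_tn, n=nfft)                      # zero-padded to M points
    return spc


x = np.random.default_rng(0).standard_normal(16000)  # 1 s of placeholder audio
spc = complex_spectra(x)
print(spc.shape)  # (99, 512)
```

One second of 16 kHz audio yields 1 + (16000 − 320) // 160 = 99 frames. The window expression is identical to NumPy's `np.hamming`.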

Next, mel filter bank analysis and logarithmic compression are applied to the magnitudes of the complex spectrum Spc_{t,m}. This yields the vector O_t = {O_{t,0}, …, O_{t,l}, …, O_{t,L−1}}, whose elements form an L-dimensional log mel spectrum (for example, L = 24); l is the element index.
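The text does not specify the mel filter bank further. The sketch below assumes a common construction; every item in this list is an assumption:
- L = 24 triangular filters, equally spaced on the mel scale mel(f) = 2595 log10(1 + f/700) between 0 Hz and the Nyquist frequency;
- the filters are applied to the magnitudes of the non-negative frequency bins;
- a natural logarithm follows, with a small floor to avoid log(0).

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters=24, nfft=512, fs=16000):
    """Triangular mel filters over 0..fs/2, shape (num_filters, nfft // 2 + 1)."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), num_filters + 2))
    bin_freqs = np.arange(nfft // 2 + 1) * fs / nfft
    fbank = np.zeros((num_filters, nfft // 2 + 1))
    for band in range(num_filters):
        lo, center, hi = edges[band], edges[band + 1], edges[band + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fbank[band] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def log_mel(spc, fbank, floor=1e-10):
    """O_t for every frame: log of the mel-filtered magnitudes |Spc_{t,m}|."""
    magnitude = np.abs(spc[:, : fbank.shape[1]])  # non-negative frequency bins only
    return np.log(np.maximum(magnitude @ fbank.T, floor))


fbank = mel_filterbank()
spc = np.fft.fft(np.random.default_rng(1).standard_normal((5, 512)), axis=1)
O = log_mel(spc, fbank)
print(O.shape)  # (5, 24)
```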

The acoustic feature extraction unit 10 sends its outputs as follows:
- the complex spectrum Spc_t goes to the noise suppression unit 14;
- the log mel spectrum O_t goes to the noise bias component estimation unit 11, the noise residual component estimation unit 12, and the noise suppression unit 14.

[Noise bias component estimation unit]
FIG. 4 shows an example functional configuration of the noise bias component estimation unit 11, and FIG. 5 shows its operation flow. The noise bias component estimation unit 11 comprises:
- bias component initial value estimation means 110;
- probability model generation means 111;
- expected value calculation processing means 112;
- parameter update processing means 113;
- convergence determination processing means 114.

The bias component initial value estimation means 110 receives the log mel spectrum O_t. It estimates the bias component initial value ^μ_N^(i=0), obtained by averaging O_t over a predetermined number of frames, together with its diagonal covariance matrix Σ_N (step S110).

After the iteration index i is initialized (step S110a), the initial value ^μ_N^(i=0) is computed by Equation (5) (step S110b).

Here A is the number of frames used for the initial estimate (for example, A = 10), and i is the iteration index. The diagonal covariance matrix Σ_N of the bias component is estimated by Equation (6) (step S110b).
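Equations (5) and (6) are not reproduced here. This sketch assumes a plausible reading: the sample mean and the per-dimension sample variance of the first A log mel frames. That reading presumes the signal starts with a noise-only stretch.

```python
import numpy as np


def initial_bias(log_mel_frames, a=10):
    """Initial bias ^mu_N^(0) and diagonal of Sigma_N from the first A frames.

    log_mel_frames: (T, L) array of O_t. Assumes the first A frames contain noise only.
    """
    head = log_mel_frames[:a]
    mu_n0 = head.mean(axis=0)                          # cf. Eq. (5)
    sigma_n_diag = ((head - mu_n0) ** 2).mean(axis=0)  # cf. Eq. (6), diagonal elements only
    return mu_n0, sigma_n_diag


O = np.arange(40, dtype=float).reshape(20, 2)
mu0, var0 = initial_bias(O)  # mean [9, 10], variance [33, 33] over the first 10 frames
```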

The diagonal covariance matrix Σ_N does not depend on the iteration index i.
The probability model generation means 111 builds a probability model of the log mel spectrum O_t in the form of a GMM (step S111), as shown in Equation (7). It uses:
- the bias component initial value ^μ_N^(i=0) (on later iterations, the current estimate ^μ_N^(i));
- Σ_N;
- the parameters of the silence GMM 130 and the clean speech GMM 131.

b_j^Bias(i)(O_t) is the probability model of the log mel spectrum O_t generated by the probability model generation means 111. The index j identifies the source GMM:
- j = 0 denotes the model generated from the parameters of the silence GMM 130;
- j = 1 denotes the model generated from the parameters of the clean speech GMM 131.

The function N(·) is the probability density function of the normal distribution, given by Equation (8).

The symbols in Equations (7) and (8) are:
- k: the index of a normal distribution in the GMM;
- K: the total number of normal distributions (for example, K = 256);
- w_{j,k}: a mixture weight of the silence GMM 130 or the clean speech GMM 131;
- μ_{O,j,k}^(i) and Σ_{O,j,k}^(i): the mean vector and diagonal covariance matrix of the probability model of O_t, generated from the bias component ^μ_N^(i) and the parameters of the silence GMM 130 or the clean speech GMM 131.

The mean vector μ_{O,j,k}^(i) and the diagonal covariance matrix Σ_{O,j,k}^(i) are given by the following equations.

The symbols in these equations are:
- μ_{S,j,k} and Σ_{S,j,k}: the mean vector and diagonal covariance matrix of the silence GMM 130 or the clean speech GMM 131;
- log(·) and exp(·): functions applied element-wise to vectors;
- "1": the column vector whose elements are all 1;
- I: the identity matrix;
- H_{j,k}^(i): the Jacobian matrix of the function h(·).
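Equations (9) to (11) are not reproduced here. The ingredients listed above (element-wise log and exp, the all-ones vector, the identity matrix, and the Jacobian of h(·)) match the first-order vector Taylor series (VTS) model of Non-Patent Document 1. The sketch below implements that standard form for diagonal covariances. It is an assumption about what the patent's equations express, not a copy of them:
- μ_O = μ_S + log(1 + exp(μ_N − μ_S));
- Σ_O = H Σ_S Hᵀ + (I − H) Σ_N (I − H)ᵀ;
- H = diag(1/(1 + exp(μ_N − μ_S))).

```python
import numpy as np


def vts_compose(mu_s, var_s, mu_n, var_n):
    """First-order VTS model of one noisy log-mel Gaussian (diagonal covariances).

    mu_s, var_s: mean / diagonal variance of one silence or clean-speech Gaussian
    mu_n, var_n: noise mean (e.g. ^mu_N^(i)) / diagonal noise variance (e.g. Sigma_N)
    Returns (mu_o, var_o, h_diag), where h_diag is the diagonal of H.
    """
    mu_o = np.logaddexp(mu_s, mu_n)                      # = mu_S + log(1 + exp(mu_N - mu_S))
    h_diag = 0.5 * (1.0 - np.tanh(0.5 * (mu_n - mu_s)))  # = 1 / (1 + exp(mu_N - mu_S))
    var_o = h_diag**2 * var_s + (1.0 - h_diag) ** 2 * var_n
    return mu_o, var_o, h_diag


mu_o, var_o, h = vts_compose(np.array([0.0, 5.0]), np.ones(2),
                             np.array([0.0, -5.0]), np.full(2, 0.5))
# Dimension 0: speech and noise equally strong, so mu_o = log 2 and H = 0.5.
# Dimension 1: speech dominates, so mu_o is just above 5 and H is close to 1.
```

`np.logaddexp` computes log(exp(a) + exp(b)) without overflow. The tanh form of H avoids overflow for the same reason.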

The expected value calculation processing means 112 computes the expected value of the cost function Q(·) of the probability model of the log mel spectrum O_t (step S112). This is done for the iterative estimation performed over every predetermined number of frames, using Equation (12). This computation corresponds to the E-step of the EM algorithm.

The symbols in Equation (12) are:
- O_{0:T−1} = {O_0, …, O_t, …, O_{T−1}};
- T: the total number of frames of the log mel spectrum O_t;
- P_{t,j}^(i) and P_{t,j,k}^(i): the posterior probabilities of GMM type j and of normal distribution k in frame t, given respectively by the following equations.

In particular, P_{t,j=0}^(i) is defined as the speech absence probability and P_{t,j=1}^(i) as the speech presence probability.
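The posterior equations are not reproduced here either. The sketch below computes them in the usual way from diagonal-covariance GMM likelihoods, working in the log domain for numerical stability. It assumes equal prior probabilities for the silence model (j = 0) and the clean speech model (j = 1), which the text does not state. The function names are hypothetical.

```python
import numpy as np


def log_gauss_diag(o, means, variances):
    """log N(o; mean_k, diag(var_k)) for every component k; means/variances: (K, L)."""
    diff = o - means
    return -0.5 * np.sum(np.log(2.0 * np.pi * variances) + diff**2 / variances, axis=1)


def posteriors(o, weights, means, variances):
    """P_{t,j} (speech absence/presence) and P_{t,j,k} for one frame o = O_t.

    weights: (2, K); means, variances: (2, K, L); j = 0 silence, j = 1 clean speech.
    Equal priors for j = 0 and j = 1 are assumed.
    """
    log_joint = np.stack([np.log(weights[j]) + log_gauss_diag(o, means[j], variances[j])
                          for j in range(2)])            # shape (2, K)
    log_b = np.logaddexp.reduce(log_joint, axis=1)      # log b_j(O_t)
    p_jk = np.exp(log_joint - log_b[:, None])           # P_{t,j,k}; each row sums to 1
    p_j = np.exp(log_b - np.logaddexp.reduce(log_b))    # P_{t,j}
    return p_j, p_jk


w = np.full((2, 2), 0.5)
m = np.array([[[0.0], [1.0]], [[5.0], [6.0]]])  # silence near 0-1, speech near 5-6
v = np.ones((2, 2, 1))
p_j, p_jk = posteriors(np.array([0.5]), w, m, v)
```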

The parameter update processing means 113 updates the bias component ^μ_N^(i), using Newton's method to maximize the expected value of the cost function Q(·) (step S113). This update corresponds to the M-step of the EM algorithm.

The bias component is updated by finding the ^μ_N^(i) that maximizes the cost function Q(·) of Equation (12). Normally this is done by setting the partial derivative of Q(·) with respect to ^μ_N^(i) to zero. However, Q(·) in Equation (12) is a nonlinear function of the bias component, so an analytical solution for ^μ_N^(i) is difficult to obtain.
The parameter update processing means 113 therefore optimizes ^μ_N^(i) by Newton's method, as in the following equation.

Here ∇Q^(i) and ∇²Q^(i) are the gradient vector and the Hessian matrix of the cost function Q(·) at the i-th iteration.
The convergence determination processing means 114 repeats three operations until the bias component ^μ_N^(i) converges (step S114):
- the probability model generation means 111;
- the expected value calculation processing means 112;
- the parameter update processing means 113.

An example of the convergence condition is given by the following equation, with η = 0.0001.

If the convergence condition of Equation (16) is satisfied (Yes in step S114a), μ_N = ^μ_N^(i) is set and the processing of the noise bias component estimation unit 11 ends. Otherwise, the iteration index i is incremented (step S114b), and processing repeats from the probability model generation step S111 onward.
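The Newton update of Equation (15) and the stopping rule of Equation (16) can be illustrated on a generic objective. In the patent, each Newton step alternates with regenerating the model (S111) and recomputing the posteriors (S112). The sketch isolates only the update μ ← μ − (∇²Q)⁻¹∇Q and a stopping rule. The relative-change criterion |Q_new − Q_old| ≤ η|Q_old| is an assumed form, since Equation (16) is not reproduced here.

```python
import numpy as np


def newton_maximize(q, grad, hess, mu0, eta=1e-4, max_iter=50):
    """Maximize q by Newton steps mu <- mu - hess(mu)^-1 grad(mu), cf. Eq. (15).

    Stops when |q_new - q_old| <= eta * |q_old| (an assumed form of Eq. (16)).
    """
    mu = np.asarray(mu0, dtype=float)
    q_old = q(mu)
    for _ in range(max_iter):
        mu = mu - np.linalg.solve(hess(mu), grad(mu))
        q_new = q(mu)
        if abs(q_new - q_old) <= eta * abs(q_old):
            break
        q_old = q_new
    return mu


# Concave quadratic test objective with its maximum at c.
A = np.diag([2.0, 3.0])
c = np.array([1.0, 2.0])


def q(m):
    return -(m - c) @ A @ (m - c)


def grad(m):
    return -2.0 * A @ (m - c)


def hess(m):
    return -2.0 * A


mu_hat = newton_maximize(q, grad, hess, np.zeros(2))
```

On a quadratic, a single Newton step lands exactly on the maximum. On the nonlinear Q(·) of Equation (12), several EM iterations are needed.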

[Noise residual component estimation unit]
FIG. 6 shows an example functional configuration of the noise residual component estimation unit 12, and FIG. 7 shows its operation flow. The noise residual component estimation unit 12 comprises:
- residual component initial value estimation means 120;
- residual component prediction processing means 121;
- residual component estimation processing means 122;
- probability model generation processing means 123;
- weighted average processing means 124;
- expected value calculation processing means 125;
- parameter update processing means 126;
- convergence determination processing means 127.

The residual component initial value estimation means 120 estimates the initial value of the residual component (step S120). It averages, over a predetermined number of frames, the difference between the log mel spectrum O_t and the bias component μ_N output by the noise bias component estimation unit 11. This initial value does not depend on the iteration index i. It is estimated by the following equation and used as the initial value in every iteration.

The residual component initial value estimation means 120 also sets the initial value of the autoregressive matrix F as follows. For each element, the autoregressive model is, for example, of order 1.

The residual component prediction processing means 121 predicts the residual component of the current frame with the autoregressive model (step S121). It multiplies the residual component estimate of the previous frame by the autoregressive matrix. The prediction for the current frame is given by the following equations.

In Equations (20) and (21), ~N_{t|t−1}^(i) and ~Σ_{N,t|t−1}^(i) are the predicted values of the residual component ~N_t and of its covariance, at the i-th iteration and frame t. For t = 0, the prediction uses the initial values, as in Equations (22) and (23).
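Under the autoregressive model of Equation (2), the prediction step is the standard Kalman filter time update. This sketch assumes that Equations (20) and (21) take that standard form, with diagonal F and Σ_U:
- ~N_{t|t−1} = F ~N_{t−1};
- ~Σ_{N,t|t−1} = F ~Σ_{N,t−1} Fᵀ + Σ_U.

```python
import numpy as np


def predict_residual(resid_prev, cov_prev, f_diag, sigma_u_diag):
    """Kalman time update for ~N_t = F ~N_{t-1} + U_t (cf. Eqs. (20)-(21)).

    resid_prev: previous residual estimate, shape (L,)
    cov_prev:   its covariance, shape (L, L)
    f_diag:     diagonal of F; sigma_u_diag: diagonal of Sigma_U
    """
    F = np.diag(f_diag)
    resid_pred = F @ resid_prev                             # ~N_{t|t-1}
    cov_pred = F @ cov_prev @ F.T + np.diag(sigma_u_diag)   # ~Sigma_{N,t|t-1}
    return resid_pred, cov_pred


r, c = predict_residual(np.array([1.0, -2.0]), np.eye(2),
                        np.array([0.5, 0.9]), np.array([0.001, 0.001]))
```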

The residual component estimation processing means 122 receives:
- the log mel spectrum O_t;
- the bias component μ_N output by the noise bias component estimation unit 11;
- the predicted residual values ~N_{t|t−1}^(i) and ~Σ_{N,t|t−1}^(i) from the residual component prediction processing means 121;
- the parameters μ_{S,j,k} and Σ_{S,j,k} of the silence GMM 130 and the clean speech GMM 131.

It computes residual component estimate candidates, one per normal distribution across the GMMs (step S122).
The estimate for each GMM is computed by the following equations.

In the above equations, ~N_{t,j,k}^(i) and ~Σ_{N,t,j,k}^(i) are the candidate estimates of the residual component ~N_t and of its covariance, at the i-th iteration and frame t.
The probability model generation processing means 123 generates the GMM parameters ~μ_{O,t,j,k}^(i) and ~Σ_{O,t,j,k}^(i) of the log mel spectrum in the current frame t (step S123). Its inputs are:
- the residual estimate candidates ~N_{t,j,k}^(i) and ~Σ_{N,t,j,k}^(i) computed by the residual component estimation processing means 122;
- the bias component μ_N output by the noise bias component estimation unit 11;
- the parameters μ_{S,j,k} and Σ_{S,j,k} of the silence GMM 130 and the clean speech GMM 131.

The GMM parameters of the log mel spectrum O_t in frame t are generated as shown in the following equations.

The weighted average processing means 124 receives the log mel spectrum O_t and the GMM parameters of the log mel spectrum in the current frame (step S124). It then:
- computes the speech absence and presence probabilities and the posterior probabilities;
- computes the residual component estimate as the weighted average of the residual estimate candidates.

The weighted average of Equation (31) gives the residual component estimate for the i-th iteration and frame t.
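Equation (31) is not reproduced here. This sketch assumes it weights each candidate by the product of the GMM-level and component-level posteriors, P_{t,j}^(i) P_{t,j,k}^(i); under that assumption the merge step is a single weighted sum. Averaging the covariance candidates with the same weights is a further assumption of this sketch.

```python
import numpy as np


def merge_candidates(p_j, p_jk, resid_cands, cov_cands):
    """Weighted average of the per-Gaussian residual candidates (cf. Eq. (31)).

    p_j: (2,), p_jk: (2, K), resid_cands: (2, K, L), cov_cands: (2, K, L, L)
    """
    weights = p_j[:, None] * p_jk                        # P_{t,j} * P_{t,j,k}; sums to 1
    resid = np.einsum('jk,jkl->l', weights, resid_cands)
    cov = np.einsum('jk,jklm->lm', weights, cov_cands)   # same weights for covariances (assumption)
    return resid, cov


p_j = np.array([0.25, 0.75])
p_jk = np.array([[1.0, 0.0], [0.5, 0.5]])
resid_cands = np.array([[[1.0], [9.0]], [[2.0], [4.0]]])
r, c = merge_candidates(p_j, p_jk, resid_cands, np.ones((2, 2, 1, 1)))
```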

The expected value calculation processing means 125 uses the probability model of the parallel nonlinear Kalman filter to calculate the expected value of the cost function Q(·) of the log mel spectrum probability model in the iterative estimation over each predetermined number of frames (step S125). This calculation corresponds to the E-step of the EM algorithm.
The probability model of the parallel nonlinear Kalman filter at frame t and its likelihood b_j^{MNKF}(O_t) are constructed as shown in Expression (35).

That is, the expected value of the cost function Q(·) of the probability model of the parallel nonlinear Kalman filter is obtained from the following equation.

In Expression (36), the probability model of the parallel nonlinear Kalman filter changes at every frame t. To improve computational efficiency, the expected value of the cost function Q(·) is therefore calculated recursively as shown below.

After the expected value of the cost function Q(·) has been calculated at frame t, processing moves on to the next frame t+1 (step S125b). When t ≥ T, residual component estimation by the parallel nonlinear Kalman filter for the i-th iteration ends (Yes in step S125c).

The parameter update processing means 126 updates the autoregressive matrix ^F^(i) so as to maximize the expected value of the cost function Q(·) (step S126). The matrix ^F^(i) that maximizes the expected value of Q(·) is obtained by setting the partial derivative of Q(·) with respect to ^F^(i) to zero; that is, ^F^(i) is given by the following equation.
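The update equation is not reproduced here. For a linear-Gaussian first-order autoregressive model, setting the derivative of the Gaussian log-likelihood with respect to F to zero gives the familiar closed form F = S10 S00^(-1), built from lag-one and lag-zero second-moment sums. The sketch below assumes that form and substitutes point estimates of ~N_t for the expectations, which reduces it to a least-squares AR(1) fit; the patent's expression may include covariance terms from the Kalman filter that are omitted here. `update_ar_matrix` and its `ridge` term are hypothetical.

```python
import numpy as np

def update_ar_matrix(x, ridge=1e-8):
    """Closed-form F maximizing a Gaussian AR(1) objective.

    x: shape (T, L) time series of residual estimates ~N_t (one row per frame).
    Setting dQ/dF = 0 gives F = S10 @ inv(S00) with
      S10 = sum_t x_t x_{t-1}^T,  S00 = sum_t x_{t-1} x_{t-1}^T.
    A tiny ridge term keeps S00 invertible (an assumption of this sketch).
    """
    prev, curr = x[:-1], x[1:]
    s10 = curr.T @ prev
    s00 = prev.T @ prev + ridge * np.eye(x.shape[1])
    return np.linalg.solve(s00.T, s10.T).T   # F = S10 S00^{-1} without an explicit inverse
```

Solving the linear system rather than forming the inverse is the numerically safer way to apply S00^(-1).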

The convergence determination processing means 127 repeats the operations of the residual component prediction processing means 121, the residual component estimation processing means 122, the probability model generation processing means 123, the weighted average processing means 124, the expected value calculation processing means 125, and the parameter update processing means 126 until the autoregressive matrix ^F^(i) converges (No in step S127a).
An example of the convergence condition is shown in the following equation, with η = 0.0001.

When the convergence condition of Expression (39) is satisfied, F = ^F^(i) is set and the processing of the parameter update processing means 126 ends (Yes in step S127a). Otherwise, the iteration index i is incremented, t is reset to 0 (step S127b), and the processing from the residual component prediction step S121 onward is repeated.
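Expression (39) is not reproduced here. The sketch below assumes a relative-change test, ||^F^(i) - ^F^(i-1)|| / ||^F^(i-1)|| < η with η = 0.0001, and wraps one full pass of steps S121 to S126 in a hypothetical callable `em_step`. Both the norm-based criterion and the function names are illustrative assumptions.

```python
import numpy as np

def has_converged(f_new, f_old, eta=1e-4):
    """Assumed stopping rule: relative Frobenius-norm change of F below eta."""
    denom = max(np.linalg.norm(f_old), 1e-12)
    return np.linalg.norm(f_new - f_old) / denom < eta

def iterate_until_converged(em_step, f_init, eta=1e-4, max_iter=100):
    """Repeat one full pass of steps S121-S126 (em_step) until F converges.

    em_step: hypothetical callable mapping the current F to the updated F;
             it is expected to process frames t = 0..T-1 internally.
    Returns the final F and the number of passes performed.
    """
    f = f_init
    for i in range(1, max_iter + 1):
        f_next = em_step(f)
        if has_converged(f_next, f, eta):
            return f_next, i
        f = f_next
    return f, max_iter
```

The `max_iter` cap is a safeguard added for the sketch; the text itself only specifies iterating until convergence.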

[Noise suppression unit]
FIG. 8 shows an example functional configuration of the noise suppression unit 14. The noise suppression unit 14 comprises a noise suppression filter estimation unit 140 and a noise suppression filter application unit 141. The noise suppression filter estimation unit 140 receives the log mel spectrum O_t, the bias component μ_N, the residual components ~N_t and ~Σ_{N,t}, and the parameters w_{j,k}, μ_{S,j,k}, and Σ_{S,j,k} of the silence GMM 130 and the clean speech GMM 131, and estimates the noise suppression filter W_{t,m}^{Lin}.

The noise suppression filter application unit 141 receives the complex spectrum Spc_t and the noise suppression filter W_{t,m}^{Lin}, and outputs the noise-suppressed signal ^s_τ. The operations of the noise suppression filter estimation unit 140 and the noise suppression filter application unit 141 are described in detail below.

[Noise suppression filter estimation unit]
FIG. 9 shows an example functional configuration of the noise suppression filter estimation unit 140, and FIG. 10 shows its operation flow. The noise suppression filter estimation unit 140 comprises probability model generation processing means 1400, probability calculation processing means 1401, noise suppression filter estimation processing means 1402, and noise suppression filter conversion processing means 1403.

The probability model generation processing means 1400 receives the bias component μ_N output by the noise bias component estimation unit 11, the residual components ~N_t and ~Σ_{N,t} output by the noise residual component estimation unit 12, and the parameters μ_{S,j,k} and Σ_{S,j,k} of the silence GMM 130 and the clean speech GMM 131, and generates the GMM parameters for frame t of the log mel spectrum O_t as follows (step S1400).

The probability calculation processing means 1401 receives the log mel spectrum O_t, the GMM parameters output by the probability model generation processing means 1400, and the parameters w_{j,k} of the silence GMM 130 and the clean speech GMM 131, and calculates the speech absence/presence probability P_{t,j} and the posterior probability P_{t,j,k}.
The speech absence/presence probability P_{t,j} is calculated by Expression (43), and the posterior probability P_{t,j,k} by Expression (44) (step S1401).
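Expressions (43) and (44) are not reproduced here. A minimal sketch, assuming diagonal-covariance Gaussians and equal prior probability for the silence and speech models, computes P_{t,j} as the normalized GMM likelihoods and P_{t,j,k} as the usual within-GMM component posteriors, working in the log domain to avoid underflow. The equal-prior assumption and the function names are hypothetical.

```python
import numpy as np

def log_gauss_diag(o, mu, var):
    """log N(o; mu, diag(var)) for each Gaussian; mu and var have shape (..., L)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (o - mu) ** 2 / var, axis=-1)

def gmm_posteriors(o, weights, mu, var):
    """Speech absence/presence probabilities P_{t,j} and posteriors P_{t,j,k}.

    o:       shape (L,)       log mel spectrum O_t
    weights: shape (J, K)     mixture weights w_{j,k} (J=2: silence, speech)
    mu, var: shape (J, K, L)  GMM parameters generated for frame t
    """
    log_comp = np.log(weights) + log_gauss_diag(o, mu, var)    # (J, K)
    log_model = np.logaddexp.reduce(log_comp, axis=1)           # log p_j(O_t), shape (J,)
    p_jk = np.exp(log_comp - log_model[:, None])                 # each row sums to 1
    p_j = np.exp(log_model - np.logaddexp.reduce(log_model))    # sums to 1 over j
    return p_j, p_jk
```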

The noise suppression filter estimation processing means 1402 receives the bias component μ_N, the residual components ~N_t and ~Σ_{N,t}, the posterior probability P_{t,j,k}, and the speech absence/presence probability P_{t,j}, and estimates the noise suppression filter W_{t,l}^{Mel} on the mel frequency axis by the following equation (step S1402). The equation is written per vector element.

The noise suppression filter conversion processing means 1403 converts the noise suppression filter W_{t,l}^{Mel} on the mel frequency axis into the noise suppression filter W_{t,m}^{Lin} on the linear frequency axis by cubic spline interpolation (step S1403).
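The mel-channel positions and FFT size used for this conversion are not specified in the text. The sketch below assumes L = 24 channel centres equally spaced on the mel scale 2595*log10(1 + f/700) between 0 Hz and 8 kHz, a 512-point FFT (257 linear bins), a natural cubic spline, and clamping of frequencies outside the outermost centres to the edge values. All of these choices, and both function names, are assumptions.

```python
import numpy as np

def natural_cubic_spline(x, y, xq):
    """Evaluate a natural cubic spline through (x, y) at xq (x strictly increasing).

    Queries outside [x[0], x[-1]] are clamped to the end values (an assumption).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    h = np.diff(x)
    # Tridiagonal system for the second derivatives m, with m[0] = m[-1] = 0 (natural ends).
    a = np.zeros((n, n))
    rhs = np.zeros(n)
    a[0, 0] = a[-1, -1] = 1.0
    for i in range(1, n - 1):
        a[i, i - 1], a[i, i], a[i, i + 1] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    m = np.linalg.solve(a, rhs)
    xq = np.clip(np.asarray(xq, float), x[0], x[-1])
    i = np.clip(np.searchsorted(x, xq) - 1, 0, n - 2)   # interval index for each query
    hi, dl, dr = h[i], xq - x[i], x[i + 1] - xq
    return ((m[i] * dr ** 3 + m[i + 1] * dl ** 3) / (6.0 * hi)
            + (y[i] / hi - m[i] * hi / 6.0) * dr
            + (y[i + 1] / hi - m[i + 1] * hi / 6.0) * dl)

def mel_filter_to_linear(w_mel, fs=16000, n_fft=512):
    """Map L mel-channel gains W_{t,l}^Mel to n_fft//2 + 1 linear-frequency gains W_{t,m}^Lin."""
    n_ch = len(w_mel)
    mel_max = 2595.0 * np.log10(1.0 + (fs / 2) / 700.0)
    centres_mel = np.linspace(0.0, mel_max, n_ch + 2)[1:-1]    # exclude 0 Hz and fs/2
    centres_hz = 700.0 * (10.0 ** (centres_mel / 2595.0) - 1.0)
    bins_hz = np.arange(n_fft // 2 + 1) * fs / n_fft
    return natural_cubic_spline(centres_hz, w_mel, bins_hz)
```

A spline through a constant or linear set of gains reproduces it exactly, which is a convenient sanity check on the interpolation.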

[Noise suppression filter application unit]
FIG. 11 shows an example functional configuration of the noise suppression filter application unit 141, and FIG. 12 shows its operation flow. The noise suppression filter application unit 141 comprises filtering processing means 1410, inverse fast Fourier transform processing means 1411, and waveform connection processing means 1412.
The filtering processing means 1410 multiplies the complex spectrum Spc_t by the noise suppression filter W_{t,m}^{Lin} and outputs the noise-suppressed complex spectrum ^S_{t,m} (Expression (46)) (step S1410). Expression (46) is written per vector element.

The inverse fast Fourier transform processing means 1411 applies the inverse fast Fourier transform to the complex spectrum ^S_{t,m} to obtain the noise-suppressed speech ^s_{t,n} at frame t (step S1411).
The waveform connection processing means 1412 concatenates the noise-suppressed speech ^s_{t,n} of each frame while removing the window function w_n, as shown in the following equation, to obtain the continuous noise-suppressed speech ^s_τ (step S1412).
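Expressions (46) and (47) are not reproduced here. Assuming the frames were Hamming-windowed and zero-padded to a 512-point FFT, a minimal sketch of steps S1410 to S1412 multiplies each spectrum by W_{t,m}^{Lin}, applies the inverse FFT, and overlap-adds the frames while dividing by the overlap-added window, so that a unit gain reconstructs the input exactly. The window-compensation form and the name `suppress_and_resynthesize` are assumptions.

```python
import numpy as np

def suppress_and_resynthesize(spc, w_lin, frame_len=320, shift=160):
    """Apply W_{t,m}^Lin to complex spectra Spc_t and rebuild the waveform.

    spc:   shape (T, n_fft//2 + 1) one-sided spectra of Hamming-windowed frames
    w_lin: same shape, real-valued suppression gains
    The window is removed by dividing the overlap-added frames by the
    overlap-added window (an assumed form of the connection equation).
    """
    n_fft = 2 * (spc.shape[1] - 1)
    window = np.hamming(frame_len)
    frames = np.fft.irfft(spc * w_lin, n=n_fft)[:, :frame_len]   # per-frame ^s_{t,n}
    out_len = (len(frames) - 1) * shift + frame_len
    num, den = np.zeros(out_len), np.zeros(out_len)
    for t, frame in enumerate(frames):
        num[t * shift:t * shift + frame_len] += frame
        den[t * shift:t * shift + frame_len] += window
    return num / np.maximum(den, 1e-8)                           # continuous ^s_tau
```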

[Results of evaluation experiment]
To confirm the effect of the present invention, an experiment was conducted to evaluate the noise suppression performance of the noise suppression device of the present invention. The experimental conditions are described first.

As evaluation data, 100 sentences uttered by 23 male speakers from the IPA (Information-technology Promotion Agency, Japan) -98-TestSet were used. Noise recorded separately in an airport lobby, on a station platform, and on a street was added to these speech data by computer at S/N ratios of 0 dB, 5 dB, and 10 dB. That is, nine sets of evaluation data (three noise types × three S/N ratios) were created.

Each speech signal is a monaural signal sampled at 16 kHz with 16-bit quantization. The acoustic feature extraction unit 10 was applied to these signals with a frame length of 20 ms (1 frame = 320 samples) and a frame shift of 10 ms.
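At 16 kHz, a 20 ms frame is 0.020 × 16000 = 320 samples and a 10 ms shift is 160 samples. The sketch below frames a signal accordingly, using the Hamming window that this embodiment uses as the window function w_n. `frame_signal` is a hypothetical helper; the complex spectrum Spc_t would then follow from an FFT of each row.

```python
import numpy as np

def frame_signal(x, fs=16000, frame_ms=20, shift_ms=10):
    """Split x into Hamming-windowed frames (assumes len(x) >= one frame)."""
    frame_len = fs * frame_ms // 1000     # 320 samples at 16 kHz
    shift = fs * shift_ms // 1000         # 160 samples at 16 kHz
    n_frames = 1 + (len(x) - frame_len) // shift
    # Row t holds samples t*shift .. t*shift + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```

For a 1 s signal (16000 samples) this gives 1 + (16000 - 320) // 160 = 99 frames.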

The silence GMM 130 and the clean speech GMM 131 were GMMs with K = 256 mixture components that use the L = 24-dimensional log mel spectrum as the acoustic feature; they were trained on silence signals and clean speech signals, respectively.

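The order of the autoregressive coefficients in the residual component initial value estimating means 120 was set to 1. The number of frames used for initial value estimation was A = 10. The convergence parameter of the convergence determination processing means 114 and 127 was η = 0.0001. In the residual component prediction processing step S121, each diagonal element of Σ_U was set to 0.001.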
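The prediction step S121 multiplies the previous residual estimate by the autoregressive matrix. Assuming the standard Kalman-filter form, with Σ_U as the covariance of the driving noise of the autoregressive model, the predicted covariance also grows by Σ_U. The sketch below uses the first-order model and the 0.001 diagonal from these experiments; the covariance update and the name `predict_residual` are assumptions.

```python
import numpy as np

def predict_residual(n_prev, cov_prev, f, sigma_u_diag=1e-3):
    """Kalman prediction of the residual component for the current frame.

    n_prev:       shape (L,)   residual estimate ~N_{t-1}
    cov_prev:     shape (L, L) its covariance ~Sigma_{N,t-1}
    f:            shape (L, L) autoregressive matrix F (first-order model)
    sigma_u_diag: diagonal value of the driving-noise covariance Sigma_U
    """
    n_pred = f @ n_prev                                               # AR(1) mean prediction
    cov_pred = f @ cov_prev @ f.T + sigma_u_diag * np.eye(len(n_prev))  # add Sigma_U
    return n_pred, cov_pred
```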
Performance was evaluated by speech recognition, using the word error rate (WER) given by the following equation as the evaluation measure.

Here, N is the total number of words, D the number of deletion errors, S the number of substitution errors, and I the number of insertion errors. A smaller WER indicates better speech recognition performance.
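With these definitions the standard word error rate is WER = (D + S + I) / N, usually reported in percent. D + S + I is the minimum word-level edit distance between the reference and the recognized word string, so a small dynamic-programming sketch (a hypothetical helper, not the scoring tool used in the experiment) is:

```python
def word_error_rate(ref, hyp):
    """WER = (D + S + I) / N via a Levenshtein alignment over words.

    ref, hyp: lists of words; N = len(ref). Returns a fraction (x100 for %).
    """
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # i deletions
    for j in range(m + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])   # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[n][m] / n
```

For example, the reference "a b c d" against the hypothesis "a x c d e" has one substitution and one insertion, giving 2 / 4 = 0.5 (50%).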

Speech recognition was performed with a finite-state-transducer-based recognizer (T. Hori, et al., "Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition," IEEE Trans. on ASLP, vol. 15, no. 4, pp. 1352-1365, May 2007). The acoustic model is a speaker-independent triphone HMM; each HMM is a three-state left-to-right HMM, and each state has 16 Gaussian distributions. The total number of HMM states is 3,000.

The acoustic features for speech recognition were analyzed with a frame length of 20 ms and a frame shift of 10 ms. They form a 39-dimensional vector consisting of 12 MFCCs (Mel-frequency cepstral coefficients) and the log power, together with their first- and second-order regression coefficients. The language model is a trigram model with a 20,000-word vocabulary.
Table 1 shows the evaluation results.

These results confirm that the noise suppression device of the present invention achieves better noise suppression performance than the prior art. FIG. 13 shows time-domain speech waveforms. FIG. 13(a) shows the acoustic signal o_τ in which airport lobby noise is superimposed on the target speech signal. FIG. 13(b) shows the noise-suppressed speech ^s_τ obtained by inputting o_τ into the noise suppression device of the present invention. The noise is clearly suppressed effectively.

As described above, the noise suppression device of the present invention decomposes the noise superimposed on the acoustic signal into a time-invariant bias component and a time-varying residual component and estimates each component accurately, which improves noise suppression performance.

In the embodiment described above, a Hamming window was used as the window function w_n, but other window functions such as a rectangular, Hanning, or Blackman window may be used. Instead of the silence GMM 130 and the clean speech GMM 131, another probability model such as an HMM (Hidden Markov Model) may be used as the probability model of the speech signal. More GMMs than just the two (the silence GMM 130 and the clean speech GMM 131) may also be used. The order of the autoregressive coefficients may be set to 2 or more, which is expected to improve residual component estimation in accordance with that order. The weighted average processing means 124 may also use the estimate with the largest weight as it is, instead of the weighted average.

When the processing means of the above device are implemented by a computer, the processing content of the functions that each device should have is described by a program. By executing this program on the computer, the processing means of each device are realized on the computer.

The processes described for the above method and device need not be executed in time series in the order described; they may also be executed in parallel or individually, according to the processing capability of the device executing them or as required.

The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, flexible disk, or magnetic tape may be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. The program may also be distributed by storing it in a recording device of a server computer and transferring it from the server computer to other computers via a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing may be implemented in hardware.

Claims

An acoustic feature extraction unit that receives as input an acoustic signal in which a noise signal is superimposed on a speech signal that is the target signal, and extracts a complex spectrum and a log mel spectrum as acoustic features for each frame, a frame being a fixed time length of the acoustic signal;
A noise bias component estimator that optimally estimates a bias component, which is the center of gravity of the acoustic feature space of the noise signal, using the log mel spectrum and the parameters of a silence GMM and a clean speech GMM as inputs;
A noise residual component estimator that optimally estimates a residual component, which is the difference between the noise signal and the bias component, using the log mel spectrum, the bias component, and the parameters of the silence GMM and the clean speech GMM as inputs;
A noise suppression unit that outputs an acoustic signal in which the noise signal is suppressed, using the log mel spectrum, the complex spectrum, the bias component, the residual component, and the parameters of the silence GMM and the clean speech GMM as inputs;
A noise suppression device comprising:

The noise suppression device according to claim 1,
A noise suppression apparatus, characterized in that the noise signal is represented by a sum of the bias component and the residual component expressed by an autoregressive model, and a time series of the noise signal is estimated by a biased autoregressive model.

In the noise suppression device according to claim 1 or 2,
The noise bias component estimator is
Bias component initial value estimating means that receives the log mel spectrum as input and estimates a bias component initial value, obtained by averaging the log mel spectrum over a predetermined number of frames, and a diagonal covariance matrix of the bias component initial value;
Probability model generation means for constructing a GMM-based probability model of the log mel spectrum using the bias component initial value and the parameters of the silence GMM and the clean speech GMM;
Expected value calculation processing means for calculating the expected value of the cost function of the probability model of the complex spectrum in the iterative estimation over each predetermined number of frames;
Parameter update processing means for optimizing and updating, by Newton's method, the bias component that maximizes the expected value of the cost function;
A convergence determination processing unit that repeats the operations of the probability model generation unit, the expected value calculation processing unit, and the parameter update processing unit until the bias component converges;
A noise suppression device comprising:

The noise suppression device according to any one of claims 1 to 3,
The noise residual component estimator is
Residual component initial value estimating means for estimating an initial value of the residual component by averaging the residual component, which is the difference between the log mel spectrum and the bias component, over a predetermined number of frames;
Residual component prediction processing means for predicting the residual component of the current frame with the autoregressive model by multiplying the residual component estimate of the previous frame by the autoregressive matrix;
Residual component estimation processing means that receives the log mel spectrum, the bias component, the residual component prediction value, and the parameters of the silence GMM and the clean speech GMM, and calculates as many residual component estimate candidates as the total number of normal distributions included in those GMMs;
Probability model generation processing means for generating the GMM parameters of the log mel spectrum in the current frame, using the residual component estimate candidates and the parameters of the silence GMM and the clean speech GMM as inputs;
Weighted average processing means that receives the log mel spectrum and the GMM parameters of the log mel spectrum in the current frame, calculates the speech absence/presence probability and the posterior probability, and calculates the residual component estimate as a weighted average of the residual component estimate candidates;
Expected value calculation processing means for calculating an expected value of the cost function of the logarithmic mel spectrum probability model in the iterative estimation for each predetermined number of frames using a parallel nonlinear Kalman filter probability model;
Parameter update processing means for updating the autoregressive matrix so as to maximize the expected value of the cost function;
Convergence determination processing means for repeating the operations of the residual component prediction processing means, the residual component estimation processing means, the probability model generation processing means, the weighted average processing means, the expected value calculation processing means, and the parameter update processing means until the autoregressive matrix converges;
A noise suppression device comprising:

The noise suppression device according to claim 4,
The parameter update processing means includes
A noise suppression apparatus, wherein the autoregressive matrix is optimized using a time series of the residual components and an EM algorithm.

An acoustic feature extraction process of receiving as input an acoustic signal in which a noise signal is superimposed on a speech signal that is the target signal, and extracting a complex spectrum and a log mel spectrum as acoustic features for each frame, a frame being a fixed time length of the acoustic signal;
A noise bias component estimation process of optimally estimating a bias component, which is the center of gravity of the acoustic feature space of the noise signal, using the log mel spectrum and the parameters of a silence GMM and a clean speech GMM as inputs;
A noise residual component estimation process of optimally estimating a residual component, which is the difference between the noise signal and the bias component, using the log mel spectrum, the bias component, and the parameters of the silence GMM and the clean speech GMM as inputs;
A noise suppression process of outputting an acoustic signal in which the noise signal is suppressed, using the log mel spectrum, the complex spectrum, the bias component, the residual component, and the parameters of the silence GMM and the clean speech GMM as inputs;
A noise suppression method comprising:

The noise suppression method according to claim 6,
A noise suppression method, wherein the noise signal is represented by a sum of the bias component and the residual component expressed by an autoregressive model, and a time series of the noise signal is estimated by a biased autoregressive model.

The noise suppression method according to claim 6 or 7,
The noise bias component estimation process is as follows:
A bias component initial value estimating step of receiving the log mel spectrum as input and estimating a bias component initial value, obtained by averaging the log mel spectrum over a predetermined number of frames, and a diagonal covariance matrix of the bias component initial value;
A probability model generation step of constructing a GMM-based probability model of the log mel spectrum using the bias component initial value and the parameters of the silence GMM and the clean speech GMM;
An expected value calculation processing step of calculating the expected value of the cost function of the probability model of the complex spectrum in the iterative estimation over each predetermined number of frames;
A parameter update processing step of optimizing and updating, by Newton's method, the bias component that maximizes the expected value of the cost function;
A convergence determination processing step of repeating the probability model generation step, the expected value calculation processing step, and the parameter update processing step until the bias component converges;
A noise suppression method comprising:

The noise suppression method according to any one of claims 6 to 8,
The noise residual component estimation process is as follows:
A residual component initial value estimating step of estimating an initial value of the residual component by averaging the residual component, which is the difference between the log mel spectrum and the bias component, over a predetermined number of frames;
A residual component prediction processing step of predicting the residual component of the current frame with the autoregressive model by multiplying the residual component estimate of the previous frame by the autoregressive matrix;
A residual component estimation processing step of receiving the log mel spectrum, the bias component, the residual component prediction value, and the parameters of the silence GMM and the clean speech GMM, and calculating as many residual component estimate candidates as the total number of normal distributions included in those GMMs;
A probability model generation processing step of generating the GMM parameters of the log mel spectrum in the current frame, using the residual component estimate candidates and the parameters of the silence GMM and the clean speech GMM as inputs;
A weighted average processing step of receiving the log mel spectrum and the GMM parameters of the log mel spectrum in the current frame, calculating the speech absence/presence probability and the posterior probability, and calculating the residual component estimate as a weighted average of the residual component estimate candidates;
An expected value calculation processing step for calculating an expected value of the cost function of the probability model of the log mel spectrum in the iterative estimation for each predetermined number of frames with a probability model of a parallel nonlinear Kalman filter;
A parameter update processing step for updating the autoregressive matrix so as to maximize the expected value of the cost function;
A convergence determination processing step of repeating the residual component prediction processing step, the residual component estimation processing step, the probability model generation processing step, the weighted average processing step, the expected value calculation processing step, and the parameter update processing step until the autoregressive matrix converges;
A noise suppression method comprising:

The noise suppression method according to claim 9, wherein
the parameter update processing step
optimizes the autoregressive matrix using the time series of the residual components and an EM algorithm.

A program for causing a computer to execute the noise suppression method according to any one of claims 6 to 10.