JP4977062B2 - Reverberation apparatus and method, program and recording medium

Info

Publication number: JP4977062B2
Application number: JP2008051099A
Authority: JP
Inventors: 智広中谷; 慶介木下
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-02-29
Filing date: 2008-02-29
Publication date: 2012-07-18
Anticipated expiration: 2028-02-29
Also published as: JP2009212599A

Description

The present invention relates to a dereverberation apparatus, a method therefor, a program therefor, and a recording medium, which extract an acoustic signal from which reverberation has been removed from a signal (hereinafter, "observed signal") obtained by picking up, with a microphone in a reverberant room, an acoustic signal generated by a sound source (hereinafter, "source signal").

When a source signal is picked up in a reverberant environment, it is observed as a signal in which reverberation is superimposed on the original source signal. This makes it difficult to extract the properties of the original source signal and lowers its intelligibility. Dereverberation methods and apparatuses that remove the superimposed reverberation have therefore been used to improve intelligibility.

FIG. 10 shows an example of the functional configuration of a conventional dereverberation apparatus 900 disclosed in Non-Patent Document 1, and its operation is briefly described here. The dereverberation apparatus 900 includes a sound source model 90, a prediction filter estimation unit 92, and a dereverberation unit 94. The sound source model 90 models the speech waveform in a short time segment of a reverberation-free source signal with a Gaussian distribution. The prediction filter estimation unit 92 receives the observed signal and the sound source model 90 and obtains the prediction filter coefficients, used to predict the reverberation signal, that maximize an optimization function expressing the likelihood of the observed signal. The dereverberation unit 94 removes the reverberation signal predicted with the prediction filter coefficients from the observed signal and outputs the resulting acoustic signal.

Non-Patent Document 1: Nakatani, T., Juang, B. H., Hikichi, T., Yoshioka, T., Kinoshita, K., Delcroix, M., and Miyoshi, M., "Study on speech dereverberation with autocorrelation codebook," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-2007), vol. I, pp. 193-196, April 2007.

In the conventional dereverberation method, the prediction filter coefficients for predicting the reverberation signal contained in the observed signal are estimated from the observed signal alone. Because this estimation requires an observed signal of at least a certain length, it is difficult to predict the reverberation signal accurately when the observed signal is short, and accurate dereverberation cannot be achieved.

The present invention has been made in view of this problem, and its object is to provide a dereverberation method and apparatus, a program therefor, and a recording medium that can estimate the reverberation signal contained in the observed signal with relatively high accuracy even when the observed signal is short.

The dereverberation method according to the present invention comprises: a sound source model estimation step in which a sound source model estimation unit receives a time-series observed signal and estimates the model parameters of a reverberation-free sound source model; an observed signal covariance estimation step in which an observed signal covariance estimation unit receives the model parameters and the observed signal and estimates a covariance matrix and a covariance vector of the observed signal; a prediction filter coefficient estimation step in which a prediction filter estimation unit receives the covariance matrix and covariance vector of the observed signal together with a prediction filter model and estimates the prediction filter coefficients, the prediction filter model representing, as a probability density function of the prediction filter coefficients, the prediction filter coefficients that predict the reverberation signal contained in signals observed at the place where the observed signal is picked up; and a dereverberation step in which a dereverberation unit receives the observed signal and the prediction filter coefficients and estimates a reverberation-free speech signal.

In addition to estimating the prediction filter coefficients from the observed signal and the sound source model as in the conventional method, the dereverberation method of the present invention uses a probabilistic model of the prediction filter coefficients that estimate the reverberation signal. Using this probabilistic model makes it possible to estimate prediction filter coefficients that are probabilistically more plausible, so the reverberation signal contained in the observed signal can be estimated with relatively high accuracy even when the observed signal is short.

Embodiments of the present invention are described below with reference to the drawings. Identical components in the drawings are given the same reference numerals, and their description is not repeated.

[Basic idea of the invention]
Before the embodiments are described, the basic idea of the dereverberation method of the present invention is explained. The method replaces the maximum likelihood estimation used in the conventional method with maximum a posteriori (hereinafter, "MAP") estimation, and takes into account the probabilistic model of the prediction filter coefficients that this requires. MAP estimation obtains an estimate by finding the value that maximizes the posterior probability density function of the target random variable (here, the prediction filter coefficients) given the observed signal. The invention can be configured with one or more microphones; to keep the description simple, the case of two microphones is used as an example below.

The signal names are defined as follows.

s_t is the vector corresponding to a short-time frame of length N of the dereverberated target signal. (In the original notation an overbar marks a vector; the notation shown in the equations and figures is authoritative.) x_t^(v) is the vector corresponding to the short-time frame of the v-th microphone signal. x_t is the vector obtained by concatenating the short-time frames of all microphone signals. X_t^(v) is the matrix in which the time series of x_t^(v) is arranged. X_t is the matrix in which the time-series matrices for microphones 1 and 2 are arranged. X_{t2:t1} is the matrix in which the vectors x_t are arranged going back in time over the range from t2 to t1.

The observed signal is modeled as a multichannel autoregressive process, as shown in Equation (1).

Equation (1) means that when the v-th microphone signal x_t^(v) at time t, on the left-hand side, is predicted by multiplying the past signal sequence X_{t-D}, on the right-hand side, by the prediction filter coefficients c, the prediction error is the target signal s_t.

Here, D is the delay applied to the observed signal when predicting the observed signal x_t^(v) at time t. Introducing D > 1 has been reported to improve the robustness of dereverberation against estimation errors in the prediction coefficients (Reference: K. Kinoshita, T. Nakatani, and M. Miyoshi, "Spectral subtraction steered by multi-step forward linear prediction for single channel speech dereverberation," Proc. ICASSP-2006, vol. 1, pp. 817-820, May 2006). In the following description, the v-th microphone signal is treated as the signal to be predicted; the other channels can be predicted in exactly the same way. From Equation (1), the target signal can be written as Equation (2), so c can be regarded as carrying information equivalent to an inverse filter.
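The equation images are not reproduced in this text. One form consistent with the prose description of Equations (1) and (2), offered as a reading under that assumption rather than as the published typesetting, is:

```latex
% Assumed reconstruction from the surrounding prose; the published equation images are authoritative.
\begin{align}
\bar{x}_t^{(v)} &= X_{t-D}\,\bar{c} + \bar{s}_t \tag{1}\\
\bar{s}_t &= \bar{x}_t^{(v)} - X_{t-D}\,\bar{c} \tag{2}
\end{align}
```

Read this way, the product of X_{t-D} and c is the predicted reverberation, and subtracting it from the observed frame leaves the target frame.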

Next, with the prediction filter coefficients c as the parameters to be estimated and θ as the parameter set consisting of the speech model parameters and the prediction coefficients, the optimization function is defined as shown in Equation (3).

Here, p_x(·), p_s(·), and p_c(·) denote the probability density functions of the observed signal x_t^(v), the target signal s_t, and the prediction filter coefficients c, respectively. In expanding the above equations, constants irrelevant to the optimization, such as log p_x(X_{T:1}; θ), are omitted. Equations (4) and (5) are obtained by expanding Equation (3) using general properties of probability density functions. Equations (6) and (7) are obtained by expanding Equation (5) on the basis of Equation (2) and ignoring the terms unrelated to c (the terms related to the prediction filter coefficients used when predicting signals other than that of the v-th microphone).

The optimization function of Equation (7) is completely defined once the probability density function p_s(s_t; θ) of the target signal and the probability density function p_c(c; θ) of the prediction filter coefficients are given. The first term of Equation (7) is equivalent to the optimization function of the conventional dereverberation method. The second term is the probabilistic model of the prediction filter coefficients. By newly taking the second term into account, the present invention can achieve relatively accurate dereverberation even when an observed signal of sufficient length is not available.
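For orientation, a form of the final optimization function that matches the description (a likelihood term summed over frames plus a log prior on the coefficients) can be written as follows. This is an assumed reading of Equation (7) built from the prose, since the published equation is not reproduced here; the symbol L and the summation range are illustrative.

```latex
% Assumed form of the MAP objective; constants independent of c are dropped.
\begin{equation}
L(\theta) \;=\; \sum_{t=1}^{T} \log p_s\!\left(\bar{x}_t^{(v)} - X_{t-D}\,\bar{c}\,;\,\theta\right)
\;+\; \log p_c\!\left(\bar{c}\,;\,\theta\right)
\end{equation}
```

Dropping the second term recovers a maximum likelihood objective, which is how the text characterizes the conventional method.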

FIG. 1 shows, as Embodiment 1, an example of the functional configuration of a dereverberation apparatus 100 that uses the dereverberation method of the present invention, and FIG. 2 shows its operation flow. The dereverberation apparatus 100 includes a prediction filter model recording unit 10, a sound source model estimation unit 11, an observed signal covariance estimation unit 12, a prediction filter estimation unit 13, and a dereverberation unit 44. Compared with the conventional dereverberation apparatus 900, the sound source model 90 is replaced with the sound source model estimation unit 11, the prediction filter model recording unit 10 and the observed signal covariance estimation unit 12 are added, and the processing in the prediction filter estimation unit 13 is changed. The dereverberation apparatus 100 of this example is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program.

The sound source model estimation unit 11 receives the time series of the observed signal and estimates the model parameters of a reverberation-free sound source model (step S11). For example, as in pre-whitening of the observed signal, the autoregressive coefficients of the observed signal are obtained and used as approximations of the autoregressive coefficients of the speech signal. In the following, the sound source model, that is, the model of the target signal, is described as a probability density function p_s, and its model parameter as an autocorrelation matrix r. These are defined as shown in Equations (8) and (9).

Here, a is the N × N upper triangular Toeplitz matrix defined by Equation (10) from the autoregressive coefficients α = [α_1 α_2 … α_p] of the target signal s_t.

Using the autocorrelation matrix r and Equation (8), the summand in the first term of Equation (7), which defines the optimization function, can be expanded as shown in Equation (11).

Here, Equation (12) is taken as a definition, and constant terms irrelevant to the optimization are omitted in the expansion of Equation (11).

The observed signal covariance estimation unit 12 receives the autocorrelation matrix r and the observed signal x_t^(v), and estimates the covariance matrix Φ and the covariance vector φ of the observed signal on the basis of Equations (13) and (14) (step S12).
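The role of Φ and φ can be seen from a short derivation. The following is a sketch under the assumption that p_s is a zero-mean Gaussian whose covariance is the symmetric matrix r; Equations (8) to (14) themselves are not reproduced in this text and may be normalized differently.

```latex
% Assumption: p_s(s_t) = N(s_t; 0, r) with symmetric r, and s_t = x_t^{(v)} - X_{t-D} c.
\begin{align}
\log p_s(\bar{s}_t) &= -\tfrac{1}{2}\left(\bar{x}_t^{(v)} - X_{t-D}\bar{c}\right)^{\!\top} r^{-1}
                       \left(\bar{x}_t^{(v)} - X_{t-D}\bar{c}\right) + \text{const} \\
\sum_t \log p_s(\bar{s}_t) &= -\tfrac{1}{2}\,\bar{c}^{\top}\Phi\,\bar{c} + \bar{c}^{\top}\varphi + \text{const} \\
\Phi &= \sum_t X_{t-D}^{\top}\, r^{-1} X_{t-D}, \qquad
\varphi = \sum_t X_{t-D}^{\top}\, r^{-1}\, \bar{x}_t^{(v)}
\end{align}
```

Under this assumption the likelihood term depends on the data only through Φ and φ, which is why they can be computed once and reused in every iteration of the coefficient update.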

The reason for obtaining the covariance matrix Φ and the covariance vector φ of the observed signal is as follows. Under the above definitions of the probability density functions, the optimization function can be maximized efficiently with the expectation-maximization (hereinafter, "EM") algorithm. With the state i of the prediction coefficients as the hidden variable, the Q function of the EM algorithm is defined by Equation (15).

Here, p_c(·) is a Gaussian mixture distribution that probabilistically models the prediction filter coefficients for predicting the reverberation signal contained in the observed signal, and is defined by Equations (16) to (18).

Here, i is an integer (1 ≤ i ≤ K) denoting the state of the prediction filter coefficients, and g_i denotes the mixture weight. The Gaussian distribution in each state is given by Equation (19).

The Gaussian mixture distribution is trained in advance for a specific room; its model parameters {g_1, μ_1, Σ_1, …} are obtained beforehand and recorded in the prediction filter model recording unit 10. As variations of Equation (19), computational efficiency can be improved by setting μ_i = 0 for all i or by setting the off-diagonal elements of Σ_i to 0. The parameters in those cases can also be determined in advance with a general parameter learning algorithm for Gaussian mixture distributions. Since a general learning method suffices, its description is omitted.
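To illustrate the diagonal-covariance simplification mentioned above, the following minimal sketch evaluates the log of a Gaussian mixture prior over a coefficient vector. The function names and argument layout are hypothetical and not taken from the patent; the sketch only assumes the standard Gaussian mixture density with diagonal covariances.

```python
import math


def log_gaussian_diag(c, mean, var):
    """Log density of a Gaussian with diagonal covariance (variances in `var`)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(c, mean, var)
    )


def log_mixture_prior(c, weights, means, variances):
    """log p_c(c) = log sum_i g_i N(c; mu_i, diag(var_i)), computed with log-sum-exp.

    Assumption: off-diagonal covariance elements are zero, so each state needs
    only a vector of variances instead of a full matrix.
    """
    terms = [
        math.log(g) + log_gaussian_diag(c, mu, var)
        for g, mu, var in zip(weights, means, variances)
    ]
    peak = max(terms)  # subtract the largest term for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

With diagonal covariances the cost per state grows linearly with the filter length, instead of requiring a matrix inverse and determinant per state.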

Here, the right-hand side of Equation (15) is the conditional expectation function given the prediction filter coefficients c^[n]. Keeping only the terms related to the prediction filter coefficients, the Q function can be written as Equation (20).

where Equations (21) and (22) hold.

Here, the first term of Equation (21) is the observed signal covariance matrix Φ of Equation (13), and the first term of Equation (22) is the observed signal covariance vector φ of Equation (14). Therefore, given the estimates of Φ and φ from Equations (13) and (14), the E step of the EM algorithm only needs to compute the second terms of Equations (21) and (22). These second terms can be obtained from Equations (23) to (25).

In the M step of the EM algorithm, the updated value of the prediction filter coefficients c is set to the value of Equation (22) that maximizes Equation (20) (this corresponds to the expected value of the prediction coefficients). In this way, the prediction filter estimation unit 13 receives the covariance matrix Φ and covariance vector φ of the observed signal and the model parameters {g_1, μ_1, Σ_1, …} of the prediction filter model, and estimates the prediction filter coefficients c.

The prediction filter estimation unit 13 includes an initial value setting unit 131, a posterior probability calculation unit 132, an expected value calculation unit 133, and a conditional expected value function calculation unit 134. The initial value setting unit 131 determines the initial value c^[0] of the prediction filter coefficients by, for example, the multi-step linear prediction described in the reference cited above (step S131). At this point the iteration counter n is set to n = 0.

The posterior probability calculation unit 132 obtains, by Equation (23), the posterior probability of each state i given the prediction filter coefficients c^[n] (step S132). The conditional expected value function calculation unit 133 calculates the conditional expected values by Equations (24) and (25) (step S133). The expected value calculation unit 134 obtains the updated expected value of the prediction filter coefficients by Equations (21) and (22) (step S134). If the updated value has not converged, the counter n is incremented to n = n + 1 (step S136) and the process returns to step S131.
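The iteration described above can be sketched as a standard EM loop for a quadratic likelihood term combined with a Gaussian mixture prior. This is a minimal illustration, not the patent's Equations (20) to (25), which are not reproduced in this text: it assumes the objective -0.5 c'Φc + c'φ + log Σ_i g_i N(c; μ_i, Σ_i), and the function name, arguments, and stopping rule are hypothetical.

```python
import numpy as np


def em_map_filter(Phi, phi, weights, means, covs, c0, n_iter=50, tol=1e-10):
    """MAP estimate of c under an assumed quadratic likelihood and GMM prior."""
    c = np.asarray(c0, dtype=float)
    precisions = [np.linalg.inv(S) for S in covs]
    logdets = [np.linalg.slogdet(S)[1] for S in covs]
    for _ in range(n_iter):
        # E step: posterior probability of each state i given the current c.
        log_post = np.array([
            np.log(g) - 0.5 * (ld + (c - mu) @ P @ (c - mu))
            for g, mu, P, ld in zip(weights, means, precisions, logdets)
        ])
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # M step: solve (Phi + sum_i post_i P_i) c = phi + sum_i post_i P_i mu_i.
        A = Phi + sum(w * P for w, P in zip(post, precisions))
        b = phi + sum(w * P @ mu for w, P, mu in zip(post, precisions, means))
        c_new = np.linalg.solve(A, b)
        if np.linalg.norm(c_new - c) < tol:
            return c_new
        c = c_new
    return c
```

In this sketch the data-dependent parts Φ and φ stay fixed across iterations, and only the prior-dependent parts are recomputed, which mirrors the statement that the E step only needs the second terms.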

The dereverberation unit 44 uses the updated prediction filter coefficients c^[n] to remove the estimated reverberation signal from the observed signal on the basis of Equation (2) (step S44). The operation of the dereverberation unit 44 is the same as in the conventional method.
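The subtraction step can be illustrated with a single-channel, sample-wise sketch of delayed linear prediction. This simplifies the frame-based multichannel formulation in the text; the helper names, the zero padding before the start of the signal, and the tap ordering are assumptions made for illustration.

```python
import numpy as np


def delayed_history(x, delay, order):
    """Row t holds x[t-delay], x[t-delay-1], ..., x[t-delay-order+1] (zeros before the start)."""
    x = np.asarray(x, dtype=float)
    X = np.zeros((len(x), order))
    for k in range(order):
        shift = delay + k
        if shift < len(x):
            X[shift:, k] = x[: len(x) - shift]
    return X


def dereverberate(x, c, delay):
    """Subtract the reverberation predicted from delayed past samples: s = x - X c."""
    X = delayed_history(x, delay, len(c))
    return np.asarray(x, dtype=float) - X @ np.asarray(c, dtype=float)
```

For example, if an impulse is observed together with echoes that repeat every two samples at half amplitude, a one-tap filter of 0.5 with a delay of 2 removes the echoes and leaves the impulse.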

As described above, the dereverberation method of the present invention uses a probabilistic model of the prediction filter coefficients to estimate prediction filter coefficients that are probabilistically more plausible, so the reverberation signal contained in the observed signal can be estimated with relatively high accuracy even when the observed signal is short.

[Modification]
In Embodiment 1, the sound source model was defined under the assumption that the source follows a stationary autoregressive process. Introducing a more accurate sound source model makes more accurate dereverberation possible. One option is to introduce a sound source model represented by a finite state machine. FIG. 3 shows an example of the functional configuration of a dereverberation apparatus 300 based on this approach, and FIG. 4 shows its operation flow.

The dereverberation apparatus 300 selects, for each short-time frame t of the observed signal x_t^(v), the autocorrelation matrix r that best matches the observed signal. For this purpose it additionally includes a sound source model recording unit 30 in which a plurality of autocorrelation matrices are recorded. It differs from the dereverberation apparatus 100 of Embodiment 1 in that the sound source model estimation unit 31 selects one of the autocorrelation matrices r recorded in the sound source model recording unit 30 by referring to the observed signal x_t^(v), and in that the dereverberation unit 32 includes a convergence determination unit 321 that repeats the operation, starting from the selection of the autocorrelation matrix, until the dereverberated target signal s_t converges.

The dereverberation apparatus 300 also uses Equation (7) as the optimization function for dereverberation. In this example, only the definition of the part of the first term of Equation (7) relating to the sound source model is modified; the second term is unchanged.

The parameter of the speech model for the target signal s_t at each time t is an index into an autocorrelation codebook, denoted i_t. Where the index m of the codewords in the autocorrelation codebook takes values 1 ≤ m ≤ M, i_t takes one of those values. The autocorrelation matrix corresponding to each m is written r_m, and r_{i_t} is the autocorrelation matrix corresponding to the codebook index i_t at time t. The model parameters of the entire speech time series are the full time series of codebook indices, I = {i_1, i_2, …, i_T}.

The speech model at time t can be written as Equation (26).

The parameters to be estimated by the dereverberation method are θ = {c, I}. With the above, the optimization function of Equation (7) can be defined on the basis of Equations (18), (19), and (26). In this example, the optimization function is maximized alternately and repeatedly with respect to the prediction filter coefficients c and the full time series I of autocorrelation codebook indices.

The sound source model estimation unit 31 takes the observed signal x_t^(v) itself as the initial estimate s_t^[0] (step S31), and at the same time sets the iteration counter n_1 to n_1 = 0. Then, referring to the observed signal x_t^(v), it selects one autocorrelation matrix r_{i_t} from the plurality of autocorrelation matrices recorded in the sound source model recording unit 30, determining i_t by Equation (27).
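One plausible selection rule, shown here only as a sketch, picks the codeword whose zero-mean Gaussian model gives the current frame the highest log-likelihood. Equation (27) is not reproduced in this text, so this criterion, the function name, and the zero-based index are assumptions for illustration.

```python
import numpy as np


def select_codeword(frame, codebook):
    """Return the index m maximizing the zero-mean Gaussian log-likelihood N(frame; 0, r_m).

    Note: the returned index is zero-based, whereas the text numbers codewords 1..M.
    """
    frame = np.asarray(frame, dtype=float)
    scores = []
    for r in codebook:
        _, logdet = np.linalg.slogdet(r)          # log determinant of r_m
        quad = frame @ np.linalg.solve(r, frame)  # frame' r_m^{-1} frame
        scores.append(-0.5 * (logdet + quad))     # constants shared by all m are dropped
    return int(np.argmax(scores))
```

In the alternating scheme described in the text, a selection of this kind would be rerun on the current estimate of the target signal each time the prediction filter coefficients have been updated.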

The steps from step S12, in which the observed signal covariance estimation unit 12 receives the observed signal x_t^(v) and the autocorrelation matrix r_{i_t} and estimates the covariance matrix Φ and covariance vector φ of the observed signal, through step S44, in which the dereverberation unit 44 removes the reverberation from the observed signal to estimate the target signal s_t, are the same as in Embodiment 1. In this example, the convergence determination unit 321 in the dereverberation unit 32 increments the iteration counter n_1 (step S322) until the target signal s_t converges (step S321), while the autocorrelation matrix r_{i_t} of the sound source model estimation unit 31 is changed and the prediction filter coefficients c are estimated.

As described above, using a sound source model represented by, for example, a finite state machine yields a more accurate sound source model and, as a result, more accurate dereverberation. The dereverberation methods described in Embodiment 1 and the modification assume that all signals have been acquired in advance and can be processed in batch. Next, a dereverberation method of the present invention that successively estimates the latest prediction filter coefficients for an observed signal obtained sequentially is described as Embodiment 2.

FIG. 5 shows an example of the functional configuration of a dereverberation apparatus 500 that successively estimates the latest prediction filter coefficients, and FIG. 6 shows its operation flow. The dereverberation apparatus 500 estimates and updates the prediction filter coefficients c at predetermined time intervals. At each update, the prediction filter coefficients c are estimated by applying the maximization algorithm described above to all or part of the observed signal obtained before that time, and the latest prediction filter coefficients c obtained so far are applied to the observed signal arriving sequentially at each time.

The dereverberation apparatus 500 differs from the dereverberation apparatus 100 in that it also includes an update unit 50 that causes the observed signal covariance estimation unit to operate repeatedly at predetermined time intervals so that the prediction filter coefficients c are updated, and in that its observed signal covariance estimation unit is an observed signal covariance estimation unit 51 including a covariance recording unit 511 that records the latest covariance matrix Φ_{n-1} and covariance vector φ_{n-1}.

The operation of the dereverberation apparatus 500 up to the first estimation of the prediction filter coefficients c is basically the same as that of the dereverberation apparatus 100, but the second and subsequent operations are repeated at predetermined time intervals by the update unit 50 (step S50). In addition, when the observed signal covariance estimation unit 51 estimates the covariance matrix Φ and covariance vector φ of the observed signal, it records the latest Φ and φ in the covariance recording unit 511 as the covariance matrix Φ_{n-1} and covariance vector φ_{n-1}. A further difference is that the process in which the initial value setting unit 131' of the prediction filter estimation unit 13' sets the initial value of the prediction filter coefficients (step S131') is performed only the first time. Although the prediction filter coefficients c are updated at predetermined time intervals, dereverberation is performed continuously with the latest prediction filter coefficients c.

In the dereverberation apparatus 500, the dereverberation process of the dereverberation unit 44 runs in parallel with, and asynchronously from, the prediction filter estimation process of the prediction filter estimation unit 13′. The dereverberation unit 44 can therefore dereverberate the sequentially input observation signal as it arrives, using the latest prediction filter estimate obtained so far by the prediction filter estimation unit 13′, without waiting for the next prediction filter update to finish. Until the prediction filter estimation unit 13′ obtains its first estimate, the prediction filter estimate is set to, for example, 0. Alternatively, a value calculated from an observation signal measured in advance may be used.
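The decoupling described above can be sketched as follows. This is only an illustrative sketch under assumptions that are not taken from the patent text: a single-channel signal, a time-domain prediction of the reverberation from samples delayed by at least `delay`, and the hypothetical names `LatestFilter` and `dereverb_block`. The estimator would call `update` whenever a new estimate is ready, while the dereverberation side reads whatever estimate is current, starting from 0.

```python
import threading
import numpy as np


class LatestFilter:
    """Hypothetical holder for the most recent prediction filter estimate."""

    def __init__(self, order):
        self._lock = threading.Lock()
        # Until the first estimate is available, the filter is all zeros.
        self._c = np.zeros(order)

    def update(self, c_new):
        # Called by the estimation side each time a new estimate is ready.
        with self._lock:
            self._c = np.asarray(c_new, dtype=float).copy()

    def get(self):
        # Called by the dereverberation side; never waits for an update.
        with self._lock:
            return self._c.copy()


def dereverb_block(x, start, stop, filt, delay):
    """Subtract the predicted reverberation from x[start:stop].

    Assumption: the reverberation at time t is predicted from
    x[t - delay], x[t - delay - 1], ... (single channel).
    """
    c = filt.get()
    out = np.empty(stop - start)
    for i, t in enumerate(range(start, stop)):
        past = np.zeros(len(c))
        for k in range(len(c)):
            idx = t - delay - k
            if idx >= 0:
                past[k] = x[idx]
        out[i] = x[t] - c @ past
    return out
```

With the all-zero initial filter the output equals the input, which matches the text's "for example, 0" choice for the period before the first estimate.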

The observation signal covariance estimation unit 51 estimates the covariance matrix Φ and the covariance vector φ using equations (28) and (29).

Here, T_i denotes the set of all time indices of the observation signal that fall within the predetermined time interval preceding each update. α and β are forgetting factors, constants satisfying 0 < α, β < 1.
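Equations (28) and (29) are not reproduced in this text, so the following is only a generic sketch of an exponentially weighted update with forgetting factors, not the patent's exact formulas. The function name `update_covariances`, the per-sample `weights` (for example, inverse variances from the sound source model), and the absence of a (1 − α) scaling on the new term are all assumptions.

```python
import numpy as np


def update_covariances(Phi_prev, phi_prev, X_past, x_now, weights, alpha, beta):
    """One recursive update over the time indices in T_i (assumed form).

    Phi_prev : (K, K) recorded covariance matrix from the previous update
    phi_prev : (K,)   recorded covariance vector from the previous update
    X_past   : (T, K) delayed observation vectors, one row per index in T_i
    x_now    : (T,)   current observation samples for the same indices
    weights  : (T,)   per-sample weights (assumption: source-model based)
    alpha, beta : forgetting factors, 0 < alpha, beta < 1
    """
    Xw = X_past * weights[:, None]
    # Old statistics are discounted, new statistics are accumulated.
    Phi = alpha * Phi_prev + Xw.T @ X_past
    phi = beta * phi_prev + Xw.T @ x_now
    return Phi, phi
```

Smaller forgetting factors make the statistics track changes faster at the cost of noisier estimates; the returned pair would then be recorded for use as Φ_{n-1} and φ_{n-1} in the next update.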

In this way, dereverberation can be performed with the latest prediction filter coefficient obtained at each time. Next, Embodiment 3 of the present invention, in which dereverberation is performed in the frequency domain, will be described.

FIG. 7 shows an example of the functional configuration of a dereverberation apparatus 700 that performs dereverberation in the frequency domain. The dereverberation apparatus 700 differs from the dereverberation apparatuses 100 and 500, which perform dereverberation in the time domain, in that it includes a frequency domain dereverberation unit 70 that removes reverberation in the frequency domain.

It has been reported, for example in the reference cited above, that dereverberation by spectral subtraction, which subtracts the energy of the reverberation signal from the energy of the observation signal, is more robust to estimation errors in the prediction filter coefficients, such as those caused by differences in the sound source position.

The dereverberation apparatus of the present invention can likewise perform dereverberation using a power subtraction technique: the predicted value ē_t^(v) of the reverberation signal is obtained from the observation signal and the prediction filter coefficients by equation (30), and is subtracted, in the power domain, from the short-time power spectrum of the observation signal.

The frequency domain dereverberation unit 70 converts the observation signal x̄_t^(v) and the predicted value ē_t^(v) of the reverberation signal each into a power spectrum using a general frequency transform technique such as the short-time Fourier transform, and then performs dereverberation between the power spectra. It can be built from conventional techniques. For example, writing the short-time Fourier transforms of the observation signal x̄_t^(v), the predicted value ē_t^(v) of the reverberation signal, and the target signal s̄_t as X̃_{n,m}^(V), Ẽ_{n,m}^(V), and S̃_{n,m}, respectively (n and m are the time frame and frequency bin indices), the spectral subtraction can be expressed by equations (31) to (33).

Here, ε is a positive constant sufficiently smaller than 1. The time-domain target signal is then obtained by applying the inverse short-time Fourier transform to the resulting short-time Fourier transform S̃_{n,m} of the target signal and performing overlap-add or a similar operation. As described above, Embodiment 3 enables more accurate dereverberation.
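Since equations (31) to (33) are not reproduced here, the following is a minimal sketch of one common form of power spectral subtraction with a floor, not necessarily the exact form in the patent. The function name `power_spectral_subtraction`, the floor `eps * |X|^2`, and the reuse of the observation phase are assumptions made for illustration.

```python
import numpy as np


def power_spectral_subtraction(X, E, eps=1e-3):
    """Subtract predicted reverberation power from observation power.

    X, E : complex STFT arrays (frames x bins) of the observation signal
           and of the predicted reverberation signal.
    eps  : small positive constant (much smaller than 1) used as a floor.
    """
    power_x = np.abs(X) ** 2
    power_e = np.abs(E) ** 2
    # Floor the result so the estimated target power never goes negative
    # (assumption: floor proportional to the observation power).
    power_s = np.maximum(power_x - power_e, eps * power_x)
    # Apply a real gain and keep the observation phase (assumption).
    gain = np.sqrt(power_s / np.maximum(power_x, 1e-30))
    return gain * X
```

The returned array plays the role of S̃_{n,m}. An inverse short-time Fourier transform with overlap-add (for example `scipy.signal.istft`) would then give the time-domain signal.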

[Simulation results]
The performance of the dereverberation method of the present invention was evaluated. First, the conditions used to estimate the model parameters of the prior probability distribution of the prediction filter for the room to be dereverberated are described. FIG. 8 schematically shows the room that was analyzed: a reverberant room measuring 3.5 m × 4.5 m × 2.5 m with a reverberation time of 0.5 seconds. A 1 m² area near the center of the room, at a height of 1.5 m, was divided evenly into 30 × 30 points, and each point was used as a sound source position. Two microphones were placed at the receiving point, 0.2 m apart. The impulse responses from each sound source position to the two microphones Mic. 1 and Mic. 2 were obtained by the image method, the prediction filter coefficients of each transfer path were estimated by multi-channel multi-step linear prediction, and the coefficients for these 900 points were used to estimate the model parameters {g₁, μ₁, Σ₁, …} of a Gaussian mixture distribution with the EM algorithm. The number of mixture components was 4.

Speech dereverberation was performed using the resulting prior probability distribution of the prediction filter coefficients. The simulation conditions were as follows. Sound sources were placed at the two positions indicated by speakers sp1 and sp2 in FIG. 8. Speaker sp1 is located at the center of the learning range, and speaker sp2 is located outside the learning range. The order of the multi-channel multi-step linear prediction was 2800, and the step size was D = 400. The ATR speech database was used as the sound source. The sampling frequency was 8 kHz and the quantization was 16 bits. Impulse responses from the sound source positions of speakers sp1 and sp2 to the receiving point, generated by the image method for a reverberation time of 0.5 seconds, were convolved with speech signals containing no reverberation to create stereo observation signals of four different lengths: 0.5, 1.0, 2.0, and 4.0 seconds. Using these observation signals of different lengths, the dereverberation performance for the observation signal at microphone Mic. 1 was compared between the dereverberation method of the present invention and the conventional method.
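The construction of the test signals can be sketched as below. This is an illustration only: the impulse responses are taken as given arrays (the image-method simulation is not shown), the function name `make_observations` is hypothetical, and obtaining the different lengths by truncating the convolved signal is an assumption, since the text does not say how the lengths were produced.

```python
import numpy as np

FS = 8000  # sampling frequency stated in the text (8 kHz)


def make_observations(speech, rir_mic1, rir_mic2,
                      lengths_sec=(0.5, 1.0, 2.0, 4.0)):
    """Convolve dry speech with two impulse responses and cut to length.

    speech   : 1-D array of reverberation-free speech samples
               (assumed long enough to cover the longest length)
    rir_mic1 : impulse response from the source position to Mic. 1
    rir_mic2 : impulse response from the source position to Mic. 2
    Returns a dict mapping length in seconds to a (2, n) stereo array.
    """
    y1 = np.convolve(speech, rir_mic1)
    y2 = np.convolve(speech, rir_mic2)
    observations = {}
    for sec in lengths_sec:
        n = int(round(sec * FS))
        observations[sec] = np.stack([y1[:n], y2[:n]])
    return observations
```

At 8 kHz the four lengths correspond to 4000, 8000, 16000, and 32000 samples per channel, so the shortest observation is only slightly longer than the prediction order of 2800 plus the step size of 400.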

The comparison was made in terms of the cepstrum distortion between the original speech signal without reverberation and the dereverberated speech. The cepstrum distortion (CD) is defined by equation (34).

Here, ĉ_k and c_k are the cepstrum coefficients of the dereverberated speech and of the original speech signal, respectively, and D₀ and D₁ are the dimensions of the cepstrum coefficients. In this simulation, the cepstrum distortion was defined using the cepstrum coefficients of orders 0 through 12. The time-series mean was subtracted from the cepstrum coefficient of each dimension before use.
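Equation (34) is not reproduced in this text, so the sketch below uses a widely used definition of cepstral distortion in dB rather than the patent's exact formula. In particular, how D₀ and D₁ enter equation (34) is not recoverable here. The function name `mean_cepstral_distortion` and the weighting (order 0 counted once, higher orders counted twice) are assumptions.

```python
import numpy as np


def mean_cepstral_distortion(cep_est, cep_ref):
    """Time-averaged cepstral distortion in dB (assumed common definition).

    cep_est : (frames, 13) cepstrum coefficients of the dereverberated speech
    cep_ref : (frames, 13) cepstrum coefficients of the original speech
    Columns hold orders 0 through 12, as in the simulation described above.
    """
    # Subtract the time-series mean of each dimension, as stated in the text.
    est = cep_est - cep_est.mean(axis=0)
    ref = cep_ref - cep_ref.mean(axis=0)
    diff2 = (est - ref) ** 2
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(
        diff2[:, 0] + 2.0 * diff2[:, 1:].sum(axis=1)
    )
    return per_frame.mean()
```

Because of the mean subtraction, a constant offset between the two coefficient sequences does not contribute to the distortion.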

FIG. 9 shows the time-averaged cepstrum distortion for each observation signal length obtained with the conventional method and with the method of the present invention, together with the time-averaged cepstrum distortion of the observation signal. FIG. 9(a) shows the case where the sound source is at speaker sp1, and FIG. 9(b) shows the case where it is at speaker sp2. In each plot, the horizontal axis is the observation time length [s] and the vertical axis is the time-averaged cepstrum distortion [dB].

Compared with the time-averaged cepstrum distortion of the conventional method (plotted with ●), the method of the present invention (plotted with □) yields lower distortion for observation time lengths of 2 seconds or less. In particular, at an observation time length of 0.5 seconds, the conventional method produces distortion worse than that of the observation signal itself, whereas the method of the present invention shows a substantially improved (2.3 to 3 dB) time-averaged cepstrum distortion regardless of the sound source position.

As described above, it was confirmed that, when the observation time available for estimating the prediction filter coefficients is short, the dereverberation method of the present invention is more robust to differences in sound source position than the conventional method and achieves good dereverberation.

The method and apparatus of the present invention are not limited to the embodiments described above and may be modified as appropriate without departing from the spirit of the invention. The processes described for the above method and apparatus need not be executed in time series in the order described. They may also be executed in parallel or individually, depending on the processing capability of the apparatus executing them or as required.

When the processing means of the above apparatus are implemented by a computer, the processing content of the functions that each apparatus should have is described by a program. The processing means of each apparatus are then realized on the computer by executing this program on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk drive, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be implemented in hardware.

FIG. 1 is a diagram showing an example of the functional configuration of the dereverberation apparatus 100 using the dereverberation method of the present invention.
FIG. 2 is a diagram showing the operation flow of the dereverberation apparatus 100.
FIG. 3 is a diagram showing an example of the functional configuration of the dereverberation apparatus 300 of the present invention.
FIG. 4 is a diagram showing the operation flow of the dereverberation apparatus 300.
FIG. 5 is a diagram showing an example of the functional configuration of the dereverberation apparatus 500 of the present invention.
FIG. 6 is a diagram showing the operation flow of the dereverberation apparatus 500.
FIG. 7 is a diagram showing an example of the functional configuration of the dereverberation apparatus 700 of the present invention.
FIG. 8 is a diagram schematically showing the room analyzed in the simulation.
FIG. 9 is a diagram showing the simulation results, where (a) shows the case where the sound source is at speaker sp1 and (b) shows the case where it is at speaker sp2.
FIG. 10 is a diagram showing an example of the functional configuration of the conventional dereverberation apparatus 900 disclosed in Non-Patent Document 1.

Claims

A dereverberation apparatus comprising:
a sound source model estimation unit that receives a time-series observation signal as input and estimates model parameters of a sound source model that does not include reverberation;
an observation signal covariance estimation unit that receives the model parameters and the observation signal as inputs and estimates a covariance matrix and a covariance vector of the observation signal;
a prediction filter model recording unit in which a prediction filter model is recorded in advance, the prediction filter model modeling, by a probability density function of the prediction filter coefficients, the prediction filter coefficients for predicting a reverberation signal included in a signal observed at the place where the observation signal is collected;
a prediction filter estimation unit that receives the covariance matrix and the covariance vector of the observation signal and the prediction filter model as inputs and estimates the prediction filter coefficients; and
a dereverberation unit that receives the observation signal and the prediction filter coefficients as inputs and estimates a speech signal that does not include reverberation.

The dereverberation apparatus according to claim 1, wherein
the prediction filter estimation unit receives the covariance matrix and the covariance vector of the observation signal and the prediction filter model as inputs, and estimates the prediction filter coefficients that maximize an optimization function defined as the sum of a function value of the prediction filter coefficients based on the prediction filter model and a function value based on the sound source model of the speech signal that does not include reverberation and that is determined depending on the prediction filter coefficients.

The dereverberation apparatus according to claim 1 or 2, further comprising
an update unit that operates the observation signal covariance estimation unit at predetermined time intervals, wherein
the observation signal covariance estimation unit updates the covariance matrix and the covariance vector at the predetermined time intervals from the model parameters and the sequentially input time series of the observation signal, and
the prediction filter estimation unit and the dereverberation unit operate in accordance with the updated covariance matrix and covariance vector.

The dereverberation apparatus according to any one of claims 1 to 3, wherein
the dereverberation unit operates in the frequency domain.

A dereverberation method comprising:
a sound source model estimation process in which a sound source model estimation unit receives a time-series observation signal as input and estimates model parameters of a sound source model that does not include reverberation;
an observation signal covariance estimation process in which an observation signal covariance estimation unit receives the model parameters and the observation signal as inputs and estimates a covariance matrix and a covariance vector of the observation signal;
a prediction filter estimation process in which a prediction filter estimation unit receives, as inputs, the covariance matrix and the covariance vector of the observation signal and a prediction filter model in which the prediction filter coefficients for predicting a reverberation signal included in a signal observed at the place where the observation signal is collected are modeled by a probability density function, and estimates the prediction filter coefficients; and
a dereverberation process in which a dereverberation unit receives the observation signal and the prediction filter coefficients as inputs and estimates a speech signal that does not include reverberation.

The dereverberation method according to claim 5, wherein
the prediction filter estimation process is a process that receives the covariance matrix and the covariance vector of the observation signal and the prediction filter model as inputs, and estimates the prediction filter coefficients that maximize an optimization function defined as the sum of a function value of the prediction filter coefficients based on the prediction filter model and a function value based on the sound source model of the speech signal that does not include reverberation and that is determined depending on the prediction filter coefficients.

The dereverberation method according to claim 5 or 6, further comprising
an update process in which the observation signal covariance estimation process, the prediction filter estimation process, and the dereverberation process are repeated at predetermined time intervals.

The dereverberation method according to any one of claims 5 to 7, wherein
the dereverberation process operates in the frequency domain.

An apparatus program for causing a computer to function as the dereverberation apparatus according to any one of claims 1 to 4.

A computer-readable recording medium on which the apparatus program according to claim 9 is recorded.