JP5172536B2

JP5172536B2 - Reverberation removal apparatus, dereverberation method, computer program, and recording medium

Info

Publication number: JP5172536B2
Application number: JP2008214462A
Authority: JP
Inventors: 弘和亀岡; 智広中谷; 拓也吉岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-08-22
Filing date: 2008-08-22
Publication date: 2013-03-27
Anticipated expiration: 2028-08-22
Also published as: JP2010049102A

Abstract

PROBLEM TO BE SOLVED: To remove reverberation flexibly even when the reverberation environment changes with sound source movement or room temperature variation.

SOLUTION: An observation power time series generation unit 1 generates an observation power time series. An initial setting unit 2 sets a room impulse response estimate and an original sound power estimate time series. A reverberant sound power estimate time series calculation unit 3 calculates a reverberant sound power estimate time series. A room impulse response update unit 4 updates the room impulse response estimate. An original sound power estimate time series update unit 5 updates the original sound power estimate time series. A parameter normalization unit 6 normalizes the room impulse response estimate and the original sound power estimate time series. A convergence determination unit 7 determines whether the room impulse response estimate and the original sound power estimate time series satisfy a predetermined criterion. A parameter output unit 8 outputs the room impulse response estimate and the original sound power estimate time series.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a dereverberation apparatus, a dereverberation method, a computer program, and a recording medium that remove reverberation components from an acoustic signal.

Various approaches have been proposed for removing reverberation components from an acoustic signal. For example, among approaches that operate on an acoustic signal input to a single microphone, methods are known that estimate an inverse filter of the room impulse response so that the restored speech resembles clean speech as closely as possible, based on assumptions and models of clean speech such as harmonicity, sparsity, autoregressive models, and autocorrelation function codebooks (see, for example, Non-Patent Document 1). In general, the room impulse response can change markedly from moment to moment depending on the sound source position. In the technique described in Non-Patent Document 1, therefore, how robustly the inverse filter can be estimated from a short observed signal is a key issue.

As another method of removing reverberation components from an acoustic signal, techniques that apply inverse filtering to the power time envelope of each sub-band are also known (see, for example, Non-Patent Documents 2 and 3). The techniques described in Non-Patent Documents 2 and 3 rest on the hypothesis that, within the room impulse response, it is mainly the phase spectrum that changes markedly with the sound source position, whereas the amplitude spectrum and the power spectrum are comparatively unaffected. Because the convolution model of the power envelope holds only approximately, some limit on dereverberation accuracy is to be expected. Nevertheless, compared with the technique of Non-Patent Document 1, which estimates the inverse filter of the room impulse response using clean-speech likeness as the criterion, these techniques may operate with some robustness against changes such as those in the sound source position.

Non-Patent Document 1: T. Nakatani, B. Juang, T. Hikichi, T. Yoshioka, K. Kinoshita, M. Delcroix, M. Miyoshi, "Study on Speech Dereverberation with Autocorrelation Codebook," in Proc. ICASSP 2007, pp. 193-197, 2007.
Non-Patent Document 2: Shigeki Hirobayashi, Hiroaki Nomura, Tsunehiko Koike, Mikio Tohyama, "Recovery of reverberant speech by inverse filtering of the power envelope transfer function" (in Japanese), IEICE Transactions, Vol. J81-A, No. 10, pp. 1323-1330, 1998.
Non-Patent Document 3: M. Unoki, M. Furukawa, K. Sakata, M. Akagi, "A Method Based on the MTF Concept for Dereverberating the Power Envelope from the Reverberant Signal," in Proc. ICASSP 2003, Vol. 1, pp. 888-891, 2003.

In the techniques of Non-Patent Documents 2 and 3, the power envelope of the reverberation component is modeled by a parametric function, and in the technique of Non-Patent Document 3 the parameters of that function are estimated on the basis of a measure called the degree of modulation. In a real environment, however, the reverberation environment changes as the sound source moves or the room temperature changes, so the actual reverberation component very rarely follows these function classes exactly. These techniques therefore cannot always be guaranteed to remove reverberation well.

The present invention has been made in view of these circumstances, and an object thereof is to provide a dereverberation apparatus, a dereverberation method, a computer program, and a recording medium capable of removing reverberation while adapting flexibly to changes in the reverberation environment caused by sound source movement, room temperature changes, and the like.

The present invention is a dereverberation apparatus comprising:
an observation power time series generation unit that receives an acoustic signal as input and, by short-time frequency analysis, generates an observation power time series, which is a time series of the amplitude or power of the sub-band signal in each frequency channel;
an initial setting unit that sets, for each frequency channel, a room impulse response estimate subject to a non-negativity constraint, and an original sound power estimate time series, which is a time series of power estimates of the original sound for each frequency channel;
a reverberant sound power estimate time series calculation unit that convolves the room impulse response estimate with the original sound power estimate time series to calculate a reverberant sound power estimate time series, which is the power time series of a reverberant sound model for each frequency channel;
a room impulse response update unit that updates the room impulse response estimate, while satisfying the non-negativity constraint, on the basis of the observation power time series, the reverberant sound power estimate time series, the room impulse response estimate, and the original sound power estimate time series;
an original sound power estimate time series update unit that updates the original sound power estimate time series, while satisfying the non-negativity constraint, on the basis of the observation power time series, the reverberant sound power estimate time series, the room impulse response estimate, and the original sound power estimate time series;
a parameter normalization unit that normalizes the room impulse response estimate updated by the room impulse response update unit so that the sum of its element values becomes a constant value, and normalizes the original sound power estimate time series updated by the original sound power estimate time series update unit so that the sum of its element values becomes a constant value;
a convergence determination unit that determines whether the room impulse response estimate and the original sound power estimate time series normalized by the parameter normalization unit satisfy a predetermined criterion; and
a parameter output unit that outputs the room impulse response estimate and the original sound power estimate time series when the convergence determination unit determines that they satisfy the predetermined criterion,
wherein, when the convergence determination unit determines that the normalized room impulse response estimate and original sound power estimate time series do not satisfy the predetermined criterion, then, on the basis of the normalized room impulse response estimate and original sound power estimate time series, the reverberant sound power estimate time series calculation unit calculates the reverberant sound power estimate time series, the room impulse response update unit updates the room impulse response estimate, and the original sound power estimate time series update unit updates the original sound power estimate time series.
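
The loop formed by these units can be sketched for a single frequency channel as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the Hann-window short-time analysis, the initial values, the least-squares form of the two multiplicative updates, the normalization targets (unit sum for the response, the observed total power for the original sound), and the relative-error stopping rule are all chosen here for the example and are not taken from the text above.

```python
import numpy as np

def observation_power(signal, frame_len=256, hop=128):
    """Power of each sub-band over time, shape (channels, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def lagged_products(a, b, n_lags):
    """c[lag] = sum over t of a[t + lag] * b[t], where both samples exist."""
    return np.array([np.dot(a[lag:lag + len(b)], b[:len(a) - lag])
                     for lag in range(n_lags)])

def estimate_channel(y, n_taps=8, max_iter=300, tol=1e-8, eps=1e-12):
    """Estimate non-negative h (response) and s (original power) with y ~ h * s."""
    T = len(y)
    total = y.sum()                       # assumed to be positive
    h = np.full(n_taps, 1.0 / n_taps)     # assumed initial response (flat)
    s = np.maximum(y, eps)                # assumed initial original power
    prev_err = np.inf
    for _ in range(max_iter):
        x = np.convolve(h, s)[:T]         # reverberant power estimate
        # multiplicative updates keep h and s non-negative
        h = h * lagged_products(y, s, n_taps) / (lagged_products(x, s, n_taps) + eps)
        x = np.convolve(h, s)[:T]
        s = s * lagged_products(y, h, T) / (lagged_products(x, h, T) + eps)
        h = h / h.sum()                   # normalize: sum(h) = 1
        s = s * (total / s.sum())         # normalize: sum(s) = sum(y)
        err = np.sum((y - np.convolve(h, s)[:T]) ** 2)
        if abs(prev_err - err) <= tol * max(err, eps):   # assumed criterion
            break
        prev_err = err
    return h, s
```

In use, `observation_power` would be applied once to the recorded signal and `estimate_channel` to each row of the result. Because every update multiplies the current estimate by a non-negative ratio, the non-negativity constraint is maintained without any explicit projection step.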

In the dereverberation apparatus of the present invention, the room impulse response update unit comprises:
an observed sound / estimated original sound correlation function calculation unit that calculates, for each frequency channel, an observed sound / estimated original sound correlation function, which is the correlation function between the observation power time series and the original sound power estimate time series;
an estimated reverberant sound / estimated original sound correlation function calculation unit that calculates, for each frequency channel, an estimated reverberant sound / estimated original sound correlation function, which is the correlation function between the reverberant sound power estimate time series and the original sound power estimate time series;
a room impulse response estimate update coefficient calculation unit that calculates, for each frequency channel, a room impulse response estimate update coefficient, which is the value obtained by dividing each element of the observed sound / estimated original sound correlation function by the corresponding element of the estimated reverberant sound / estimated original sound correlation function; and
a room impulse response estimate update value output unit that, for each frequency channel, multiplies the room impulse response estimate by the room impulse response estimate update coefficient element by element to calculate a room impulse response estimate update value.
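
As a hedged sketch of this correlation-ratio update for one frequency channel: the function name, the lag convention (lag k pairs observation frame t + k with original-sound frame t), and the small `eps` guard against division by zero are assumptions made for illustration.

```python
import numpy as np

def update_rir_by_correlation_ratio(y, x, h, s, eps=1e-12):
    """Return h[k] * corr(y, s)[k] / corr(x, s)[k] for each tap k.

    y: observation power series, x: reverberant power estimate,
    h: current response estimate, s: original power estimate (len(s) == len(y)).
    """
    T = len(y)
    lags = range(len(h))
    # correlation between observed power and estimated original power
    corr_ys = np.array([np.dot(y[k:], s[:T - k]) for k in lags])
    # correlation between modeled reverberant power and estimated original power
    corr_xs = np.array([np.dot(x[k:], s[:T - k]) for k in lags])
    coeff = corr_ys / (corr_xs + eps)     # update coefficient per tap
    return h * coeff                      # element-wise product
```

When the model already reproduces the observation (`x` equal to `y`), every coefficient is 1 and the estimate is left unchanged. Since all factors are non-negative, the update cannot produce a negative tap.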

In the dereverberation apparatus of the present invention, the room impulse response update unit may alternatively comprise:
a modeling error ratio series / estimated original sound correlation function calculation unit that calculates, for each frequency channel, a modeling error ratio series / estimated original sound correlation function, which is the correlation function between the time series obtained by dividing the observation power time series element by element by the reverberant sound power estimate time series, and the original sound power estimate time series;
an estimated original sound partial sum series calculation unit that calculates, for each frequency channel, an estimated original sound partial sum series, which is a series whose elements are partial sums of the element values of the original sound power estimate time series over respective specific ranges;
a room impulse response estimate update coefficient calculation unit that calculates, for each frequency channel, a room impulse response estimate update coefficient, which is the value obtained by dividing the modeling error ratio series / estimated original sound correlation function by the element values of the estimated original sound partial sum series; and
a room impulse response estimate update value output unit that, for each frequency channel, multiplies the room impulse response estimate by the room impulse response estimate update coefficient element by element to calculate a room impulse response estimate update value.
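
A minimal sketch of this ratio-based variant for one frequency channel is given below. The text does not say which "specific ranges" the partial sums cover; the sketch assumes that, for tap k, the range is the original-sound frames 0 to T - 1 - k, that is, the frames that can reach the observed window through that tap. The function name and the `eps` guard are likewise illustrative.

```python
import numpy as np

def update_rir_by_error_ratio(y, x, h, s, eps=1e-12):
    """Return h[k] * corr(y / x, s)[k] / partial_sum(s)[k] for each tap k."""
    T, L = len(y), len(h)
    ratio = y / (x + eps)                          # modeling error ratio series
    # correlation between the ratio series and the estimated original power
    corr_rs = np.array([np.dot(ratio[k:], s[:T - k]) for k in range(L)])
    # partial sums of s over frames 0 .. T-1-k (assumed range)
    cumulative = np.cumsum(s)
    partial = np.array([cumulative[T - 1 - k] for k in range(L)])
    return h * corr_rs / (partial + eps)
```

If the ratio series is identically 1, the numerator equals the partial sum and the response is unchanged, which is the expected fixed-point behavior of a multiplicative update.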

In the dereverberation apparatus of the present invention, the original sound power estimate time series update unit comprises:
an observed sound / estimated impulse response correlation function calculation unit that calculates, for each frequency channel, an observed sound / estimated impulse response correlation function, which is the correlation function between the observation power time series and the room impulse response estimate;
a calculation unit for an estimated reverberant sound / estimated impulse response correlation function with sparse correction term, which calculates, for each frequency channel, the correlation function between the reverberant sound power estimate time series and the room impulse response estimate, and adds to that correlation function the time series obtained by raising the original sound power estimate time series element by element to a constant power and multiplying it by a constant;
an original sound power estimate time series update coefficient calculation unit that calculates, for each frequency channel, an original sound power estimate time series update coefficient, which is the value obtained by dividing each element of the observed sound / estimated impulse response correlation function by the corresponding element of the estimated reverberant sound / estimated impulse response correlation function with sparse correction term; and
an original sound power estimate time series update value output unit that, for each frequency channel, multiplies the original sound power estimate time series by the original sound power estimate time series update coefficient element by element to calculate an original sound power estimate time series update value.
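
The following sketch illustrates this update with the sparse correction term for one frequency channel. The text only says that the original sound power estimate is raised to a constant power and multiplied by a constant; the default values `weight=0.1` and `power=-0.5`, the function name, and the `eps` guard are hypothetical choices for the example.

```python
import numpy as np

def update_source_with_sparse_term(y, x, h, s, weight=0.1, power=-0.5, eps=1e-12):
    """Return s[n] * corr(y, h)[n] / (corr(x, h)[n] + weight * s[n] ** power)."""
    T, L = len(y), len(h)
    # correlation between observed power and the response estimate
    corr_yh = np.array([np.dot(y[n:n + L], h[:T - n]) for n in range(T)])
    # correlation between modeled reverberant power and the response estimate
    corr_xh = np.array([np.dot(x[n:n + L], h[:T - n]) for n in range(T)])
    # sparse correction term: constant multiple of s raised to a constant power
    sparse_term = weight * (s + eps) ** power
    coeff = corr_yh / (corr_xh + sparse_term + eps)
    return s * coeff
```

With `weight=0` the rule reduces to a plain correlation-ratio update. A positive weight enlarges the denominator, so the coefficient shrinks and small values of `s` are pushed further toward zero, which is how the term encourages a sparse original sound power series.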

In the dereverberation apparatus of the present invention, the original sound power estimate time series update unit may alternatively comprise:
a modeling error ratio series / estimated impulse response correlation function calculation unit that calculates, for each frequency channel, a modeling error ratio series / estimated impulse response correlation function, which is the correlation function between the time series obtained by dividing the observation power time series element by element by the reverberant sound power estimate time series, and the room impulse response estimate;
a calculation unit for an estimated impulse response partial sum series with sparse correction term, which calculates, for each frequency channel, a series whose elements are partial sums of the element values of the room impulse response estimate over respective specific ranges, and adds to that series the time series obtained by raising the original sound power estimate time series element by element to a constant power and multiplying it by a constant;
an original sound power estimate time series update coefficient calculation unit that calculates, for each frequency channel, an original sound power estimate time series update coefficient, which is the value obtained by dividing each element of the modeling error ratio series / estimated impulse response correlation function by the corresponding element of the estimated impulse response partial sum series with sparse correction term; and
an original sound power estimate time series update value output unit that, for each frequency channel, multiplies the original sound power estimate time series by the original sound power estimate time series update coefficient element by element to calculate an original sound power estimate time series update value.
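
A sketch of this ratio-based update of the original sound power, again for one frequency channel, is shown below. It assumes that the partial sum for frame n covers the taps whose delayed contribution still falls inside the observed window (taps 0 to min(L, T - n) - 1); the sparse-term defaults, the function name, and the `eps` guard are hypothetical.

```python
import numpy as np

def update_source_by_error_ratio(y, x, h, s, weight=0.1, power=-0.5, eps=1e-12):
    """Return s[n] * corr(y / x, h)[n] / (partial_sum(h)[n] + weight * s[n] ** power)."""
    T, L = len(y), len(h)
    ratio = y / (x + eps)                          # modeling error ratio series
    # correlation between the ratio series and the response estimate
    corr_rh = np.array([np.dot(ratio[n:n + L], h[:T - n]) for n in range(T)])
    # partial sums of h over the taps that stay inside the window (assumed range)
    cumulative = np.cumsum(h)
    partial = np.array([cumulative[min(L, T - n) - 1] for n in range(T)])
    # add the sparse correction term to the partial sum series
    denominator = partial + weight * (s + eps) ** power
    return s * corr_rh / (denominator + eps)
```

As with the correlation-ratio form, setting `weight=0` leaves a perfectly modeled series unchanged, while a positive weight lowers every coefficient below 1.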

The present invention is also a dereverberation method comprising:
an observation power time series generation step in which an observation power time series generation unit receives an acoustic signal as input and, by short-time frequency analysis, generates an observation power time series, which is a time series of the amplitude or power of the sub-band signal in each frequency channel;
an initial setting step in which an initial setting unit sets, for each frequency channel, a room impulse response estimate subject to a non-negativity constraint, and an original sound power estimate time series, which is a time series of power estimates of the original sound for each frequency channel;
a reverberant sound power estimate time series calculation step in which a reverberant sound power estimate time series calculation unit convolves the room impulse response estimate with the original sound power estimate time series to calculate a reverberant sound power estimate time series, which is the power time series of a reverberant sound model for each frequency channel;
a room impulse response update step in which a room impulse response update unit updates the room impulse response estimate, while satisfying the non-negativity constraint, on the basis of the observation power time series, the reverberant sound power estimate time series, the room impulse response estimate, and the original sound power estimate time series;
an original sound power estimate time series update step in which an original sound power estimate time series update unit updates the original sound power estimate time series, while satisfying the non-negativity constraint, on the basis of the observation power time series, the reverberant sound power estimate time series, the room impulse response estimate, and the original sound power estimate time series;
a parameter normalization step in which a parameter normalization unit normalizes the room impulse response estimate updated in the room impulse response update step so that the sum of its element values becomes a constant value, and normalizes the original sound power estimate time series updated in the original sound power estimate time series update step so that the sum of its element values becomes a constant value;
a convergence determination step in which a convergence determination unit determines whether the room impulse response estimate and the original sound power estimate time series normalized in the parameter normalization step satisfy a predetermined criterion; and
a parameter output step in which a parameter output unit outputs the room impulse response estimate and the original sound power estimate time series when it is determined in the convergence determination step that they satisfy the predetermined criterion,
wherein, when it is determined in the convergence determination step that the normalized room impulse response estimate and original sound power estimate time series do not satisfy the predetermined criterion, then, on the basis of the room impulse response estimate and the original sound power estimate time series normalized in the parameter normalization step, the reverberant sound power estimate time series is calculated in the reverberant sound power estimate time series calculation step, the room impulse response estimate is updated in the room impulse response update step, and the original sound power estimate time series is updated in the original sound power estimate time series update step.

In the dereverberation method of the present invention, the room impulse response update step includes:
an observed sound / estimated original sound correlation function calculation step in which an observed sound / estimated original sound correlation function calculation unit calculates, for each frequency channel, an observed sound / estimated original sound correlation function, which is the correlation function between the observation power time series and the original sound power estimate time series;
an estimated reverberant sound / estimated original sound correlation function calculation step in which an estimated reverberant sound / estimated original sound correlation function calculation unit calculates, for each frequency channel, an estimated reverberant sound / estimated original sound correlation function, which is the correlation function between the reverberant sound power estimate time series and the original sound power estimate time series;
a room impulse response estimate update coefficient calculation step in which a room impulse response estimate update coefficient calculation unit calculates, for each frequency channel, a room impulse response estimate update coefficient, which is the value obtained by dividing each element of the observed sound / estimated original sound correlation function by the corresponding element of the estimated reverberant sound / estimated original sound correlation function; and
a room impulse response estimate update value output step in which a room impulse response estimate update value output unit, for each frequency channel, multiplies the room impulse response estimate by the room impulse response estimate update coefficient element by element to calculate a room impulse response estimate update value.

In the dereverberation method of the present invention, the room impulse response update step may alternatively include:
a modeling error ratio series / estimated original sound correlation function calculation step in which a modeling error ratio series / estimated original sound correlation function calculation unit calculates, for each frequency channel, a modeling error ratio series / estimated original sound correlation function, which is the correlation function between the time series obtained by dividing the observation power time series element by element by the reverberant sound power estimate time series, and the original sound power estimate time series;
an estimated original sound partial sum series calculation step in which an estimated original sound partial sum series calculation unit calculates, for each frequency channel, an estimated original sound partial sum series, which is a series whose elements are partial sums of the element values of the original sound power estimate time series over respective specific ranges;
a room impulse response estimate update coefficient calculation step in which a room impulse response estimate update coefficient calculation unit calculates, for each frequency channel, a room impulse response estimate update coefficient, which is the value obtained by dividing the modeling error ratio series / estimated original sound correlation function by the element values of the estimated original sound partial sum series; and
a room impulse response estimate update value output step in which a room impulse response estimate update value output unit, for each frequency channel, multiplies the room impulse response estimate by the room impulse response estimate update coefficient element by element to calculate a room impulse response estimate update value.

In the dereverberation method of the present invention, the original sound power estimate time series update step includes:
an observed sound / estimated impulse response correlation function calculation step in which an observed sound / estimated impulse response correlation function calculation unit calculates, for each frequency channel, an observed sound / estimated impulse response correlation function, which is the correlation function between the observation power time series and the room impulse response estimate;
a calculation step for an estimated reverberant sound / estimated impulse response correlation function with sparse correction term, in which the corresponding calculation unit calculates, for each frequency channel, the correlation function between the reverberant sound power estimate time series and the room impulse response estimate, and adds to that correlation function the time series obtained by raising the original sound power estimate time series element by element to a constant power and multiplying it by a constant;
an original sound power estimate time series update coefficient calculation step in which an original sound power estimate time series update coefficient calculation unit calculates, for each frequency channel, an original sound power estimate time series update coefficient, which is the value obtained by dividing each element of the observed sound / estimated impulse response correlation function by the corresponding element of the estimated reverberant sound / estimated impulse response correlation function with sparse correction term; and
an original sound power estimate time series update value output step in which an original sound power estimate time series update value output unit, for each frequency channel, multiplies the original sound power estimate time series by the original sound power estimate time series update coefficient element by element to calculate an original sound power estimate time series update value.

Further, in the dereverberation method of the present invention, the original sound power estimated value time series update step includes: a modeling error ratio sequence / estimated impulse response correlation function calculation step in which a modeling error ratio sequence / estimated impulse response correlation function calculation unit calculates, for each frequency channel, a modeling error ratio sequence / estimated impulse response correlation function, which is the correlation function between the indoor impulse response estimated value and the time series obtained by dividing the observation power time series element by element by the reverberant sound power estimated value time series; an estimated impulse response partial sum sequence (with sparse correction term) calculation step in which the corresponding calculation unit calculates, for each frequency channel, a sequence whose element values are the partial sums of the element values of the indoor impulse response estimated value over each specific range, and then calculates an estimated impulse response partial sum sequence with a sparse correction term, which is the time series obtained by adding to that sequence the original sound power estimated value time series raised element by element to a constant power and multiplied by a constant; an original sound power estimated value time series update coefficient calculation step in which an original sound power estimated value time series update coefficient calculation unit calculates, for each frequency channel, an original sound power estimated value time series update coefficient, which is the value obtained by dividing each element of the modeling error ratio sequence / estimated impulse response correlation function by the corresponding element value of the estimated impulse response partial sum sequence with a sparse correction term; and an original sound power estimated value time series update value output step in which an original sound power estimated value time series update value output unit multiplies, for each frequency channel, the original sound power estimated value time series and the original sound power estimated value time series update coefficient element by element to calculate an original sound power estimated value time series update value.

The present invention is also a computer program for causing a computer to operate as the dereverberation apparatus.

The present invention is also a computer-readable recording medium on which is recorded a computer program for causing a computer to operate as the dereverberation apparatus.

According to the present invention, reverberation can be removed while flexibly adapting to changes in the reverberation environment caused by movement of the sound source, changes in room temperature, and the like. Even when the reverberation environment does not change, reverberation can be removed as effectively as with conventional methods. In addition, the dereverberation computation can be performed at high speed.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings. This embodiment assumes that the constants λ, p, C_G, and C_S are determined in advance. The time index is denoted by (Equation 1) and the frequency index by (Equation 2). The power spectrum time series of the original signal (hereinafter, the original sound power estimated value time series) is denoted S(ω, t). The spectrogram of the impulse response of the room transfer system (hereinafter, the indoor impulse response estimated value) is denoted G(ω, t). Note that (Equation 3) denotes the set of all integers.

FIG. 1 is a functional block diagram of the dereverberation apparatus according to the present embodiment. The dereverberation apparatus shown in the figure includes an observation power time series generation unit 1, an initial setting unit 2, a reverberant sound power estimated value time series calculation unit 3, an indoor impulse response update unit 4, an original sound power estimated value time series update unit 5, a parameter normalization unit 6, a convergence determination unit 7, and a parameter output unit 8. The dereverberation apparatus receives an acoustic signal from a microphone or the like (not shown), calculates the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) from this acoustic signal, and outputs them.

First, the observation power time series generation unit 1 receives an acoustic signal as input and outputs the time-frequency components of the power spectrum of that signal. These time-frequency components are denoted Y(ω, t), and Y(ω, t) is called the observation power time series. Here ω = 1, ..., Ω is the index corresponding to frequency, and t = 1, ..., T is the index corresponding to time. The observation power time series generation unit 1 computes the time-frequency components Y(ω, t) by a time-frequency decomposition based on multi-channel filter bank outputs, such as the short-time Fourier transform or a wavelet transform.
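As an illustration of the kind of time-frequency decomposition described above, the following Python sketch computes a power spectrogram with a short-time Fourier transform. The function name `power_spectrogram`, the Hann window, and the `frame_len` and `hop` parameters are assumptions made for this example and are not specified in the source; indices are 0-based here, whereas the description uses ω = 1, ..., Ω and t = 1, ..., T.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=8, hop=4):
    """Return Y[w][t], the power of frequency bin w in frame t.

    Hypothetical helper: a plain Hann-windowed DFT per frame, written for
    clarity rather than speed.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    starts = range(0, len(signal) - frame_len + 1, hop)
    Y = [[0.0] * len(starts) for _ in range(n_bins)]
    for t, start in enumerate(starts):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for w in range(n_bins):
            coeff = sum(x * cmath.exp(-2j * math.pi * w * n / frame_len)
                        for n, x in enumerate(frame))
            Y[w][t] = abs(coeff) ** 2  # power = squared magnitude
    return Y
```

Each row `Y[w]` is then processed independently, which matches the "for each frequency channel" wording used throughout the description.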

Next, the initial setting unit 2 sets and outputs initial values for the original sound power estimated value time series S(ω, t) and the indoor impulse response estimated value G(ω, t). These values may be set with random numbers, but the initial value of S(ω, t) is preferably set equal to the Y(ω, t) output by the observation power time series generation unit 1. The initial value of G(ω, t) is preferably set so that, like an exponential function, it takes its maximum at t = 1 and decreases as t increases.
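The preferred initialization can be sketched as follows. The function name `initialize`, the impulse response length `ir_len`, and the decay rate `decay` are hypothetical choices for this example; the source only states that S should start equal to Y and that G should be largest at the first time index and decrease afterwards.

```python
import math


def initialize(Y, ir_len=4, decay=0.5):
    """Return (S, G) initial values for every frequency channel.

    S starts as a copy of the observation Y; G starts as a decaying
    exponential whose maximum is at the first time index.
    """
    S = [row[:] for row in Y]  # copy so later updates do not alter Y
    G = [[math.exp(-decay * t) for t in range(ir_len)] for _ in Y]
    return S, G
```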

Next, the reverberant sound power estimated value time series calculation unit 3 receives the original sound power estimated value time series S(ω, t) and the indoor impulse response estimated value G(ω, t) output by the initial setting unit 2 or by the parameter normalization unit 6 described later. It then calculates the reverberant sound power estimated value time series X(ω, t) by convolution using (Equation 4), and outputs it.
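(Equation 4) is not reproduced in this text. Assuming it is the usual causal convolution along the time axis within one frequency channel, X(t) = Σ_τ G(τ) S(t − τ), the calculation can be sketched as below. The name `reverb_power` and the truncation of the result to the length of S are assumptions of this example.

```python
def reverb_power(S_row, G_row):
    """Convolve one channel of S with one channel of G along time.

    Returns X with the same length as S_row (0-based time indices).
    """
    T = len(S_row)
    X = [0.0] * T
    for t in range(T):
        # Only lags tau with t - tau >= 0 contribute (causal response).
        for tau in range(min(len(G_row), t + 1)):
            X[t] += G_row[tau] * S_row[t - tau]
    return X
```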

Here, the discrete Fourier transform F′(ω, k) of the sequence F(ω, 1), F(ω, 2), ... is written as (Equation 5), and the inverse discrete Fourier transform F(ω, t) of the sequence F′(ω, 1), F′(ω, 2), ... is written as (Equation 6). With this notation, the reverberant sound power estimated value time series X(ω, t) can be expressed as (Equation 7). X(ω, t) can therefore be computed quickly using the FFT (Fast Fourier Transform).

(Equation 8) and (Equation 9) denote S(ω, t) and G(ω, t), respectively, suitably zero-padded to reduce the effect of circular convolution; specifically, they are given by (Equation 10) and (Equation 11). Here M_ω is the number of time indices in (Equation 8) and (Equation 9).
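The role of zero-padding can be illustrated with a small FFT-based convolution. This is a sketch under assumptions: the padded length is taken to be the next power of two at or above len(s) + len(g) − 1, which is one common choice and is not necessarily the M_ω of (Equation 10) and (Equation 11); the radix-2 `fft` helper and the name `fft_convolve` are written for this example only.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    The inverse transform is returned unnormalized.
    """
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_convolve(s, g):
    """Linear convolution of s and g via zero-padded FFTs, cut to len(s)."""
    full_len = len(s) + len(g) - 1
    m = 1
    while m < full_len:  # pad enough that circular wrap-around cannot occur
        m *= 2
    s_pad = list(s) + [0.0] * (m - len(s))
    g_pad = list(g) + [0.0] * (m - len(g))
    prod = [a * b for a, b in zip(fft(s_pad), fft(g_pad))]
    # Divide by m because the inverse FFT above is unnormalized.
    return [v.real / m for v in fft(prod, inverse=True)][:len(s)]
```

Without the padding, the tail of the convolution would wrap around and be added to the first samples, which is the circular-convolution effect the zero-padding is meant to reduce.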

Next, the indoor impulse response update unit 4 receives the observation power time series Y(ω, t) output by the observation power time series generation unit 1, the original sound power estimated value time series S(ω, t) and the indoor impulse response estimated value G(ω, t) output by the initial setting unit 2 or by the parameter normalization unit 6 described later, and the reverberant sound power estimated value time series X(ω, t) output by the reverberant sound power estimated value time series calculation unit 3. It then updates the indoor impulse response estimated value G(ω, t) and outputs it. The specific method for updating G(ω, t) is described with reference to FIGS. 2 and 3.

FIGS. 2 and 3 show configuration examples of the functional blocks included in the indoor impulse response update unit 4. The configuration example shown in FIG. 2 is described first. In the example shown in FIG. 2, the indoor impulse response update unit 4 includes an observed sound / estimated original sound correlation function calculation unit 41, an estimated reverberant sound / estimated original sound correlation function calculation unit 42, an indoor impulse response estimated value update coefficient calculation unit 43, and an indoor impulse response estimated value update value output unit 44.

The observed sound / estimated original sound correlation function calculation unit 41 receives the observation power time series Y(ω, t) and the original sound power estimated value time series S(ω, t). It then performs the calculation of (Equation 12) and outputs the observed sound / estimated original sound correlation function R_SY(ω, τ).

Next, the estimated reverberant sound / estimated original sound correlation function calculation unit 42 receives the original sound power estimated value time series S(ω, t) and the reverberant sound power estimated value time series X(ω, t). It then performs the calculation of (Equation 13) and outputs the estimated reverberant sound / estimated original sound correlation function R_SX(ω, τ).

Next, the indoor impulse response estimated value update coefficient calculation unit 43 receives the observed sound / estimated original sound correlation function R_SY(ω, τ) output by the calculation unit 41 and the estimated reverberant sound / estimated original sound correlation function R_SX(ω, τ) output by the calculation unit 42. It then performs the calculation of (Equation 14) and outputs the indoor impulse response estimated value update coefficient α_G(ω, t).

Next, the indoor impulse response estimated value update value output unit 44 receives the indoor impulse response estimated value G(ω, t) and the indoor impulse response estimated value update coefficient α_G(ω, t) output by the update coefficient calculation unit 43. It then performs the calculation of (Equation 15) and outputs the updated indoor impulse response estimated value G(ω, t). Here "←" denotes assignment.
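The FIG. 2 procedure (two correlation functions, their element-wise ratio, then an element-wise product with the current estimate) can be sketched for a single frequency channel as follows. (Equation 12) through (Equation 15) are not reproduced in this text, so the correlation definitions used here, R_SY(τ) = Σ_t S(t − τ) Y(t) and R_SX(τ) = Σ_t S(t − τ) X(t), are assumptions, as are the function name and the small `eps` guard against division by zero.

```python
def update_ir_ratio_of_correlations(Y, S, G, eps=1e-12):
    """One multiplicative update of G for one channel (FIG. 2 style).

    Y, S: lists of length T; G: list of length L. Returns the new G.
    """
    T, L = len(S), len(G)
    # Reverberant power estimate X = G * S (causal convolution, length T).
    X = [sum(G[k] * S[t - k] for k in range(min(L, t + 1))) for t in range(T)]
    G_new = []
    for tau in range(L):
        r_sy = sum(S[t - tau] * Y[t] for t in range(tau, T))  # observed vs. S
        r_sx = sum(S[t - tau] * X[t] for t in range(tau, T))  # estimated vs. S
        alpha = r_sy / (r_sx + eps)                           # update coefficient
        G_new.append(G[tau] * alpha)                          # G <- G * alpha
    return G_new
```

Because the update only multiplies by a ratio of non-negative sums, a non-negative G stays non-negative, and G is left unchanged when X already equals Y.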

Next, the configuration example shown in FIG. 3 is described. In the example shown in FIG. 3, the indoor impulse response update unit 4 includes a modeling error ratio sequence / estimated original sound correlation function calculation unit 45, an estimated original sound partial sum sequence calculation unit 46, an indoor impulse response estimated value update coefficient calculation unit 47, and an indoor impulse response estimated value update value output unit 48.

The modeling error ratio sequence / estimated original sound correlation function calculation unit 45 receives the observation power time series Y(ω, t), the original sound power estimated value time series S(ω, t), and the reverberant sound power estimated value time series X(ω, t). It then performs the calculation of (Equation 16) and outputs the modeling error ratio sequence / estimated original sound correlation function L_{SY/X}(ω, τ).

Next, the estimated original sound partial sum sequence calculation unit 46 receives the original sound power estimated value time series S(ω, t). It then performs the calculation of (Equation 17) and outputs the estimated original sound partial sum sequence L_S(ω, τ).

Next, the indoor impulse response estimated value update coefficient calculation unit 47 receives the modeling error ratio sequence / estimated original sound correlation function L_{SY/X}(ω, τ) output by the calculation unit 45 and the estimated original sound partial sum sequence L_S(ω, τ) output by the calculation unit 46. It then performs the calculation of (Equation 18) and outputs the indoor impulse response estimated value update coefficient β_G(ω, t).

Next, the indoor impulse response estimated value update value output unit 48 receives the indoor impulse response estimated value G(ω, t) and the indoor impulse response estimated value update coefficient β_G(ω, t) output by the update coefficient calculation unit 47. It then performs the calculation of (Equation 19) and outputs the updated indoor impulse response estimated value G(ω, t). Here "←" denotes assignment.
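The FIG. 3 variant replaces the two correlation functions with a correlation against the element-wise ratio Y/X and a partial sum of S. A single-channel sketch follows. Since (Equation 16) through (Equation 19) are not reproduced here, the forms L_{SY/X}(τ) = Σ_t S(t − τ) Y(t)/X(t) and L_S(τ) = Σ_t S(t − τ), with t restricted to the range where t − τ is a valid index, are assumptions of this example, as are the function name and `eps`.

```python
def update_ir_error_ratio(Y, S, G, eps=1e-12):
    """One multiplicative update of G for one channel (FIG. 3 style)."""
    T, L = len(S), len(G)
    X = [sum(G[k] * S[t - k] for k in range(min(L, t + 1))) for t in range(T)]
    # Modeling error ratio sequence: observation divided by model, per element.
    ratio = [Y[t] / (X[t] + eps) for t in range(T)]
    G_new = []
    for tau in range(L):
        l_syx = sum(S[t - tau] * ratio[t] for t in range(tau, T))
        l_s = sum(S[t - tau] for t in range(tau, T))  # partial sum of S
        beta = l_syx / (l_s + eps)                    # update coefficient
        G_new.append(G[tau] * beta)                   # G <- G * beta
    return G_new
```

When the model already explains the observation, the ratio sequence is all ones, the numerator equals the partial sum, and G is left unchanged.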

As described with reference to FIGS. 2 and 3, the indoor impulse response update unit 4 updates the indoor impulse response estimated value G(ω, t) and outputs it.

The description now returns to FIG. 1. Next, the original sound power estimated value time series update unit 5 receives the observation power time series Y(ω, t) output by the observation power time series generation unit 1, the original sound power estimated value time series S(ω, t) and the indoor impulse response estimated value G(ω, t) output by the initial setting unit 2 or by the parameter normalization unit 6 described later, and the reverberant sound power estimated value time series X(ω, t) output by the reverberant sound power estimated value time series calculation unit 3. It then updates the original sound power estimated value time series S(ω, t) and outputs it. The specific method for updating S(ω, t) is described with reference to FIGS. 4 and 5.

FIGS. 4 and 5 show configuration examples of the functional blocks included in the original sound power estimated value time series update unit 5. The configuration example shown in FIG. 4 is described first. In the example shown in FIG. 4, the original sound power estimated value time series update unit 5 includes an observed sound / estimated impulse response correlation function calculation unit 51, an estimated reverberant sound / estimated impulse response correlation function (with sparse correction term) calculation unit 52, an original sound power estimated value time series update coefficient calculation unit 53, and an original sound power estimated value time series update value output unit 54.

The observed sound / estimated impulse response correlation function calculation unit 51 receives the observation power time series Y(ω, t) and the indoor impulse response estimated value G(ω, t). It then performs the calculation of (Equation 20) and outputs the observed sound / estimated impulse response correlation function R_GY(ω, τ).

Next, the estimated reverberant sound / estimated impulse response correlation function (with sparse correction term) calculation unit 52 receives the original sound power estimated value time series S(ω, t), the indoor impulse response estimated value G(ω, t), and the reverberant sound power estimated value time series X(ω, t). It then performs the calculation of (Equation 21) and outputs the estimated reverberant sound / estimated impulse response correlation function with a sparse correction term, R_GX(ω, τ).

Next, the original sound power estimated value time series update coefficient calculation unit 53 receives the observed sound / estimated impulse response correlation function R_GY(ω, τ) output by the calculation unit 51 and the estimated reverberant sound / estimated impulse response correlation function with a sparse correction term, R_GX(ω, τ), output by the calculation unit 52. It then performs the calculation of (Equation 22) and outputs the original sound power estimated value time series update coefficient α_S(ω, t).

Next, the original sound power estimated value time series update value output unit 54 receives the original sound power estimated value time series S(ω, t) and the original sound power estimated value time series update coefficient α_S(ω, t) output by the update coefficient calculation unit 53. It then performs the calculation of (Equation 23) and outputs the updated original sound power estimated value time series S(ω, t). Here "←" denotes assignment.
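A single-channel sketch of the FIG. 4 procedure is given below. (Equation 20) through (Equation 23) are not reproduced in this text, so the forms used here are assumptions: R_GY(t) = Σ_τ G(τ) Y(t + τ), and R_GX(t) = Σ_τ G(τ) X(t + τ) plus a sparse correction term written as `sparse_scale * S(t) ** sparse_power`. The summary of the invention describes that term only as S raised element by element to a constant power and multiplied by a constant; how those two constants relate to λ and p is not stated here, so they are left as explicit hypothetical parameters.

```python
def update_source_ratio_of_correlations(Y, S, G, sparse_scale, sparse_power,
                                        eps=1e-12):
    """One multiplicative update of S for one channel (FIG. 4 style)."""
    T, L = len(S), len(G)
    X = [sum(G[k] * S[t - k] for k in range(min(L, t + 1))) for t in range(T)]
    S_new = []
    for t in range(T):
        taus = range(min(L, T - t))  # lags for which t + tau is a valid index
        r_gy = sum(G[tau] * Y[t + tau] for tau in taus)
        r_gx = sum(G[tau] * X[t + tau] for tau in taus)
        # Sparse correction term (assumed form); eps avoids 0 ** negative.
        r_gx += sparse_scale * (S[t] + eps) ** sparse_power
        alpha = r_gy / (r_gx + eps)  # update coefficient
        S_new.append(S[t] * alpha)   # S <- S * alpha
    return S_new
```

With `sparse_scale` set to zero the update reduces to a plain ratio of correlations; a positive value enlarges the denominator and therefore shrinks S, which is the intended sparsifying effect.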

Next, the configuration example shown in FIG. 5 is described. In the example shown in FIG. 5, the original sound power estimated value time series update unit 5 includes a modeling error ratio sequence / estimated impulse response correlation function calculation unit 55, an estimated impulse response partial sum sequence (with sparse correction term) calculation unit 56, an original sound power estimated value time series update coefficient calculation unit 57, and an original sound power estimated value time series update value output unit 58.

The modeling error ratio sequence / estimated impulse response correlation function calculation unit 55 receives the observation power time series Y(ω, t), the indoor impulse response estimated value G(ω, t), and the reverberant sound power estimated value time series X(ω, t). It then performs the calculation of (Equation 24) and outputs the modeling error ratio sequence / estimated impulse response correlation function L_{GY/X}(ω, τ).

Next, the estimated impulse response partial sum sequence (with sparse correction term) calculation unit 56 receives the original sound power estimated value time series S(ω, t) and the indoor impulse response estimated value G(ω, t). It then performs the calculation of (Equation 25) and outputs the estimated impulse response partial sum sequence with a sparse correction term, L_G(ω, τ).

Next, the original sound power estimated value time series update coefficient calculation unit 57 receives the modeling error ratio sequence / estimated impulse response correlation function L_{GY/X}(ω, τ) output by the calculation unit 55 and the estimated impulse response partial sum sequence with a sparse correction term, L_G(ω, τ), output by the calculation unit 56. It then performs the calculation of (Equation 26) and outputs the original sound power estimated value time series update coefficient β_S(ω, t).

Next, the original sound power estimated value time series update value output unit 58 receives the original sound power estimated value time series S(ω, t) and the original sound power estimated value time series update coefficient β_S(ω, t) output by the update coefficient calculation unit 57. It then performs the calculation of (Equation 27) and outputs the updated original sound power estimated value time series S(ω, t). Here "←" denotes assignment.
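The FIG. 5 procedure can be sketched in the same way for one frequency channel. (Equation 24) through (Equation 27) are not reproduced in this text; the forms below are assumptions: L_{GY/X}(t) = Σ_τ G(τ) Y(t + τ)/X(t + τ), and L_G(t) equal to the partial sum of G over the lags for which t + τ is a valid index, plus a sparse correction term `sparse_scale * S(t) ** sparse_power`. The two sparse parameters and the function name are hypothetical.

```python
def update_source_error_ratio(Y, S, G, sparse_scale, sparse_power, eps=1e-12):
    """One multiplicative update of S for one channel (FIG. 5 style)."""
    T, L = len(S), len(G)
    X = [sum(G[k] * S[t - k] for k in range(min(L, t + 1))) for t in range(T)]
    ratio = [Y[t] / (X[t] + eps) for t in range(T)]  # modeling error ratio
    S_new = []
    for t in range(T):
        taus = range(min(L, T - t))
        l_gyx = sum(G[tau] * ratio[t + tau] for tau in taus)
        l_g = sum(G[tau] for tau in taus)  # partial sum of G
        # Sparse correction term (assumed form); eps avoids 0 ** negative.
        l_g += sparse_scale * (S[t] + eps) ** sparse_power
        beta = l_gyx / (l_g + eps)  # update coefficient
        S_new.append(S[t] * beta)   # S <- S * beta
    return S_new
```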

As described with reference to FIGS. 4 and 5, the original sound power estimated value time series update unit 5 updates the original sound power estimated value time series S(ω, t) and outputs it.

The description now returns to FIG. 1. Next, the parameter normalization unit 6 receives the indoor impulse response estimated value G(ω, t) output by the indoor impulse response update unit 4 and the original sound power estimated value time series S(ω, t) output by the original sound power estimated value time series update unit 5. It then performs the calculation of (Equation 28) to correct the received indoor impulse response estimated value G(ω, t), and outputs the corrected G(ω, t). It also performs the calculation of (Equation 29) to correct the received original sound power estimated value time series S(ω, t), and outputs the corrected S(ω, t).

When the reverberant sound power estimated value time series calculation unit 3, the indoor impulse response update unit 4, the original sound power estimated value time series update unit 5, and the parameter normalization unit 6 described above repeat their respective processing in this order, the accuracy of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) output by the parameter normalization unit 6 improves.

Next, the convergence determination unit 7 determines whether the accuracy of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) output by the parameter normalization unit 6 is sufficient. If the convergence determination unit 7 determines that the accuracy is sufficient, the parameter output unit 8 described later executes its processing. If it determines that the accuracy is not sufficient, the reverberant sound power estimated value time series calculation unit 3, the indoor impulse response update unit 4, the original sound power estimated value time series update unit 5, and the parameter normalization unit 6 execute their respective processing again, in this order.
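The control flow among units 3 through 7 can be summarized in a short driver loop. This is a sketch only: the function name `run_iterations`, the callback signatures, and the `max_iters` safety cap are assumptions of this example. Following the description, both update steps receive the S and G that entered the iteration (from the initial setting unit or the normalization unit), not each other's freshly updated output.

```python
def run_iterations(Y, S, G, compute_X, update_G, update_S, normalize,
                   is_converged, max_iters=100):
    """Repeat units 3, 4, 5, 6 in order until unit 7 reports convergence.

    Returns (G, S, number_of_iterations_executed).
    """
    iteration = 0
    for iteration in range(1, max_iters + 1):
        X = compute_X(S, G)               # unit 3: reverberant power estimate
        G_new = update_G(Y, S, G, X)      # unit 4: uses the incoming S and G
        S_new = update_S(Y, S, G, X)      # unit 5: also uses the incoming S and G
        G, S = normalize(G_new, S_new)    # unit 6: parameter normalization
        if is_converged(iteration, G, S): # unit 7: convergence determination
            break
    return G, S, iteration
```

Passing the individual steps in as callables keeps the loop independent of which configuration (FIG. 2 or FIG. 3, FIG. 4 or FIG. 5) is chosen for each update.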

The following methods can be used to determine whether the accuracy of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) output by the parameter normalization unit 6 is sufficient.

For example, the convergence determination unit 7 determines whether the reverberant sound power estimated value time series calculation unit 3, the indoor impulse response updating unit 4, the original sound power estimated value time series updating unit 5, and the parameter normalization unit 6 have performed their processes a predetermined number of times or more. If they have, the convergence determination unit 7 determines that the accuracy of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) is sufficient; otherwise, it determines that the accuracy is not sufficient.

As another example, the convergence determination unit 7 determines whether the rate of change of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) updated by the parameter normalization unit 6 is equal to or less than a predetermined value. If this rate of change is equal to or less than the predetermined value, the convergence determination unit 7 determines that the accuracy of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) is sufficient; otherwise, it determines that the accuracy is not sufficient.

As a further example, the convergence determination unit 7 calculates an objective function and determines whether the rate of change of the calculated objective function is equal to or less than a predetermined value. If the rate of change of the objective function is equal to or less than the predetermined value, the convergence determination unit 7 determines that the accuracy of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) is sufficient; otherwise, it determines that the accuracy is not sufficient. The convergence determination unit 7 evaluates (Expression 30) or (Expression 31) to obtain the objective function J(S, G) or K(S, G), where Λ = {1, ..., Ω} and Γ = {1, ..., T}. The multiplicative update algorithm that minimizes the objective functions J(S, G) and K(S, G) is described later.
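The three determination methods above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the way the "rate of change" is measured (a relative L1 change), and the choice to accept as soon as any enabled criterion is met are all hypothetical and are not taken from the specification, which presents the methods as alternatives.

```python
import numpy as np

def relative_change(new, old, eps=1e-12):
    """Relative L1 change between two arrays or scalars (assumed definition)."""
    new = np.asarray(new, dtype=float)
    old = np.asarray(old, dtype=float)
    return float(np.sum(np.abs(new - old)) / (np.sum(np.abs(old)) + eps))

def has_converged(iteration, max_iterations=None,
                  params=None, prev_params=None, param_tol=None,
                  objective=None, prev_objective=None, objective_tol=None):
    """Return True if any enabled criterion judges the accuracy sufficient."""
    # Method 1: the update units have run a predetermined number of times.
    if max_iterations is not None and iteration >= max_iterations:
        return True
    # Method 2: the rate of change of G and S is at or below a threshold.
    if param_tol is not None and params is not None and prev_params is not None:
        if all(relative_change(n, o) <= param_tol
               for n, o in zip(params, prev_params)):
            return True
    # Method 3: the rate of change of the objective (J or K) is at or below a threshold.
    if (objective_tol is not None and objective is not None
            and prev_objective is not None):
        if relative_change(objective, prev_objective) <= objective_tol:
            return True
    return False
```

In a loop, `params` would be the pair (G, S) after the current pass of the parameter normalization unit 6 and `prev_params` the pair from the preceding pass.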

When the convergence determination unit 7 determines that the accuracy of the indoor impulse response estimated value G(ω, t) and the original sound power estimated value time series S(ω, t) output by the parameter normalization unit 6 is sufficient, the parameter output unit 8 outputs this indoor impulse response estimated value G(ω, t) and this original sound power estimated value time series S(ω, t).

Next, the multiplicative update algorithm that minimizes the objective functions J(S, G) and K(S, G) is described. As stated above, in this embodiment the time index is given by (Expression 32) and the frequency index by (Expression 33). The power spectrum time series of the original signal is denoted S(ω, t); hereinafter, a power spectrum time series is referred to as a spectrogram. (Expression 34) denotes the set of all integers. The spectrogram of the reverberant speech is denoted X(ω, t), and the spectrogram of the impulse response of the room transfer system (hereinafter referred to as the indoor impulse response) is denoted G(ω, t). The spectrogram X(ω, t) of the reverberant speech is assumed to be approximately expressed by (Expression 35) using the indoor impulse response G(ω, t).

Let Y(ω, t) be the observed speech spectrogram. The aim of this embodiment is to find a non-negative impulse response G(ω, t) such that X(ω, t) ≈ Y(ω, t) and the spectrogram S(ω, t) of the original signal is as sparse as possible. To this end, consider the problem of approximating the observed speech spectrogram Y(ω, t) over the range given by (Expression 36) and (Expression 37), with the spectrogram S(ω, t) of the original signal given by (Expression 38) and the indoor impulse response G(ω, t) given by (Expression 39). Here, Γ is the set of time indices and Λ is the set of frequency indices. T is an element of the set of time indices, and Ω is an element of the set of frequency indices.

In addition, Υ_ω = {1, ..., T_ω}. T_ω is the time length (number of time indices) of the indoor impulse response, which may differ for each ω, and is hereinafter referred to as the filter length. If the closeness between the observed speech spectrogram Y(ω, t) and the spectrogram X(ω, t) of the reverberant speech is measured by the squared error, the constrained optimization problem with respect to the spectrogram S(ω, t) of the original signal and the indoor impulse response G(ω, t) is given by (Expression 40) to (Expression 43).

The second term of (Expression 40) is the sparsity cost. The smaller this sparsity cost, the sparser the spectrogram S(ω, t) of the original signal. λ is a constant that adjusts the balance between the cost of the modeling error and the sparsity cost of the speech spectrogram. p is a real constant that may be chosen arbitrarily within the range 0 < p ≤ 2.
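Because the expressions themselves are not reproduced in this text, the following is only a sketch of what a squared-error objective with a sparsity cost could look like. It assumes that X(ω, t) is obtained by convolving G(ω, ·) with S(ω, ·) along time in each frequency channel (as the claims describe), that the sparsity cost has the form λ Σ S(ω, t)^p, that lags start at zero, and that all channels share one filter length. The names `reverberant_power` and `objective_J` and the default values of `lam` and `p` are hypothetical.

```python
import numpy as np

def reverberant_power(G, S):
    """X[w, t] = sum_tau G[w, tau] * S[w, t - tau], truncated to the length of S.

    G has shape (n_freq, n_taps) and S has shape (n_freq, n_time).
    """
    n_freq, n_time = S.shape
    X = np.zeros((n_freq, n_time))
    for tau in range(min(G.shape[1], n_time)):
        X[:, tau:] += G[:, tau:tau + 1] * S[:, :n_time - tau]
    return X

def objective_J(Y, G, S, lam=0.1, p=1.0):
    """Squared modeling error plus an assumed sparsity cost lam * sum(S ** p)."""
    X = reverberant_power(G, S)
    modeling_error = np.sum((Y - X) ** 2)
    sparsity_cost = lam * np.sum(S ** p)
    return float(modeling_error + sparsity_cost)
```

A larger `lam` puts more weight on sparsity of S relative to the fit between Y and X, which is the balancing role the text assigns to λ.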

In the above, the squared error was used as the measure of closeness between the observed speech spectrogram Y(ω, t) and the spectrogram X(ω, t) of the reverberant speech. If the I-divergence is used instead as the measure of closeness between Y(ω, t) and X(ω, t), the constrained optimization problem with respect to the spectrogram S(ω, t) of the original signal and the indoor impulse response G(ω, t) is given by (Expression 44) to (Expression 43). The I-divergence is described, for example, in I. Csiszar, "I-Divergence Geometry of Probability Distributions and Minimization Problems," The Annals of Probability, Vol. 3, No. 1, pp. 146-158, 1975.

λ is a constant that adjusts the balance between the cost of the modeling error and the sparsity cost of the speech spectrogram. p is a real constant that may be chosen arbitrarily within the range 0 < p ≤ 2.
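For comparison with the squared-error case, the I-divergence between two non-negative arrays can be computed as below. The divergence itself follows the standard definition Σ (Y log(Y/X) − Y + X); the added sparsity term λ Σ S^p is an assumed form, since the expressions are not reproduced in this text, and the names `i_divergence` and `objective_K` are hypothetical.

```python
import numpy as np

def i_divergence(Y, X, eps=1e-12):
    """I-divergence sum(Y * log(Y / X) - Y + X), with 0 * log(0) taken as 0."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    # eps keeps the logarithm finite where Y or X is exactly zero.
    log_term = np.where(Y > 0, Y * np.log((Y + eps) / (X + eps)), 0.0)
    return float(np.sum(log_term - Y + X))

def objective_K(Y, X, S, lam=0.1, p=1.0):
    """I-divergence modeling error plus an assumed sparsity cost lam * sum(S ** p)."""
    return i_divergence(Y, X) + lam * float(np.sum(np.asarray(S, dtype=float) ** p))
```

Here `X` is the modeled reverberant spectrogram, passed in directly so that the sketch does not depend on how the convolution is implemented.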

Next, the multiplicative update algorithm that minimizes the objective function J(S, G) is described. The objective function J(S, G) can be reduced efficiently by an iterative algorithm similar to the multiplicative update algorithm. The multiplicative update algorithm is described, for example, in D. D. Lee and H. S. Seung, "Learning the Parts of Objects by Non-negative Matrix Factorization," Nature, Vol. 401, pp. 788-791, 1999.

Here, the multiplicative update rule for the spectrogram S(ω, t) of the original signal is derived. Let S(ω, t) be the spectrogram of the original signal output by the parameter normalization unit 6, and let S′(ω, t) be the spectrogram of the original signal that the parameter normalization unit 6 output in its preceding pass, before outputting S(ω, t). That is, S′(ω, t) is the value of S(ω, t) one update step earlier. Likewise, let G(ω, t) be the indoor impulse response output by the parameter normalization unit 6, and let G′(ω, t) be the indoor impulse response that it output in its preceding pass, before outputting G(ω, t). That is, G′(ω, t) is the value of G(ω, t) one update step earlier. If m_τ(ω, t) is defined by (Expression 48), then (Expression 49) holds.

Let the right-hand side of (Expression 49) be (Expression 50). Updating the spectrogram S(ω, t) of the original signal so as to minimize (Expression 50) guarantees that J(S, G) does not increase. Solving (Expression 51) therefore yields the update rule (Expression 52), which guarantees that J(S, G) does not increase. Here, X′(ω, t) is given by (Expression 53).

As (Expression 52) shows, the updated value of the spectrogram S(ω, t) of the original signal is the product of the previous value S′(ω, t) and an update coefficient. An update rule of this form is called a multiplicative update rule. (Expression 52) also shows that if all elements of S′(ω, t) and G′(ω, t) are non-negative, all elements of S(ω, t) are updated to non-negative values, so the condition of (Expression 41) is satisfied. Furthermore, if (Expression 54) holds, then (Expression 55) always holds. Therefore, if (Expression 55) is set at initialization, this update never violates the condition of (Expression 43).

Next, the multiplicative update rule for the indoor impulse response G(ω, t) is derived. As in the derivation of the multiplicative update rule for the spectrogram S(ω, t) of the original signal, let S′(ω, t) be the value of S(ω, t) one update step earlier and let G′(ω, t) be the value of G(ω, t) one update step earlier. Using (Expression 48), (Expression 56) holds.

Let the right-hand side of (Expression 56) be (Expression 57). Solving (Expression 58) in the same way as in the derivation of the multiplicative update rule for the spectrogram S(ω, t) of the original signal yields the multiplicative update rule (Expression 59). However, the derivation of the multiplicative update rule (Expression 59) does not take the constraint of (Expression 60) into account, so normalization is required after the update by (Expression 59).
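Since (Expression 52) and (Expression 59) are not reproduced in this text, the sketch below does not claim to be those rules. It shows textbook Lee and Seung style multiplicative updates for a squared-error objective with an L1 sparsity cost (the case p = 1), applied to a per-channel convolution model. The truncated convolution, the common filter length, the λ/2 term in the denominator, and the rescaling of S during normalization are all assumptions, and every function name is hypothetical.

```python
import numpy as np

EPS = 1e-12  # floor that keeps denominators strictly positive

def convolve(G, S):
    """X[w, t] = sum_tau G[w, tau] * S[w, t - tau], truncated to the length of S."""
    n_time = S.shape[1]
    X = np.zeros_like(S)
    for tau in range(G.shape[1]):
        X[:, tau:] += G[:, tau:tau + 1] * S[:, :n_time - tau]
    return X

def correlate_with_G(A, G):
    """out[w, t] = sum_tau G[w, tau] * A[w, t + tau]."""
    n_time = A.shape[1]
    out = np.zeros_like(A)
    for tau in range(G.shape[1]):
        out[:, :n_time - tau] += G[:, tau:tau + 1] * A[:, tau:]
    return out

def correlate_with_S(A, S, n_taps):
    """out[w, tau] = sum_t A[w, t] * S[w, t - tau]."""
    n_time = A.shape[1]
    return np.stack([np.sum(A[:, tau:] * S[:, :n_time - tau], axis=1)
                     for tau in range(n_taps)], axis=1)

def objective_J(Y, G, S, lam):
    """Squared error plus L1 sparsity cost (p = 1 assumed)."""
    return float(np.sum((Y - convolve(G, S)) ** 2) + lam * np.sum(S))

def update_S(Y, G, S, lam):
    """Previous S times an update coefficient; non-negative inputs stay non-negative."""
    X = convolve(G, S)
    denom = correlate_with_G(X, G) + lam / 2.0
    return S * correlate_with_G(Y, G) / np.maximum(denom, EPS)

def update_G(Y, G, S):
    """Previous G times an update coefficient (no sparsity term on G)."""
    X = convolve(G, S)
    n_taps = G.shape[1]
    denom = correlate_with_S(X, S, n_taps)
    return G * correlate_with_S(Y, S, n_taps) / np.maximum(denom, EPS)

def normalize(G, S):
    """Scale G to unit sum per channel and rescale S so that G * S is unchanged."""
    scale = G.sum(axis=1, keepdims=True)
    return G / scale, S * scale
```

Each update has the form "previous value times update coefficient", so non-negative starting values stay non-negative, which mirrors the property the text attributes to the multiplicative update rule.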

Next, the multiplicative update algorithm that minimizes the objective function K(S, G) is described. Like the objective function J(S, G), the objective function K(S, G) can be reduced efficiently by an iterative algorithm similar to the multiplicative update algorithm.

Here, the multiplicative update rule for the spectrogram S(ω, t) of the original signal is derived. Let S(ω, t) be the spectrogram of the original signal output by the parameter normalization unit 6, and let S′(ω, t) be the spectrogram of the original signal that the parameter normalization unit 6 output in its preceding pass, before outputting S(ω, t). That is, S′(ω, t) is the value of S(ω, t) one update step earlier. Likewise, let G(ω, t) be the indoor impulse response output by the parameter normalization unit 6, and let G′(ω, t) be the indoor impulse response that it output in its preceding pass, before outputting G(ω, t). That is, G′(ω, t) is the value of G(ω, t) one update step earlier. If m_τ(ω, t) is defined by (Expression 61), then (Expression 62) holds.

Let the right-hand side of (Expression 62) be (Expression 63). Updating the spectrogram S(ω, t) of the original signal so as to minimize (Expression 63) guarantees that K(S, G) does not increase. Solving (Expression 64) therefore yields the multiplicative update rule (Expression 65). Here, X′(ω, t) is given by (Expression 66).

(Expression 65) shows that if all elements of S′(ω, t) and G′(ω, t) are non-negative, all elements of S(ω, t) are updated to non-negative values.

Next, the multiplicative update rule for the indoor impulse response G(ω, t) is derived. As in the derivation of the multiplicative update rule for the spectrogram S(ω, t) of the original signal, let S′(ω, t) be the value of S(ω, t) one update step earlier and let G′(ω, t) be the value of G(ω, t) one update step earlier. Using (Expression 61), (Expression 67) holds.

Let the right-hand side of (Expression 67) be (Expression 68). Solving (Expression 69) in the same way as in the derivation of the multiplicative update rule for the spectrogram S(ω, t) of the original signal yields the multiplicative update rule (Expression 70). However, the derivation of the multiplicative update rule (Expression 70) does not take the constraint of (Expression 71) into account, so normalization is required after the update by (Expression 70).
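The I-divergence case can be sketched in the same spirit. Again, this is not (Expression 65) or (Expression 70), which are not reproduced in this text; it is the textbook Lee and Seung style update for the I-divergence with an assumed L1 sparsity cost (p = 1) on S. The numerator correlates the ratio Y/X with the current estimate, and the denominator is a partial sum of the current estimate (plus λ for S). All names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # floor that avoids division by zero

def convolve(G, S):
    """X[w, t] = sum_tau G[w, tau] * S[w, t - tau], truncated to the length of S."""
    n_time = S.shape[1]
    X = np.zeros_like(S)
    for tau in range(G.shape[1]):
        X[:, tau:] += G[:, tau:tau + 1] * S[:, :n_time - tau]
    return X

def correlate_with_G(A, G):
    """out[w, t] = sum_tau G[w, tau] * A[w, t + tau]."""
    n_time = A.shape[1]
    out = np.zeros_like(A)
    for tau in range(G.shape[1]):
        out[:, :n_time - tau] += G[:, tau:tau + 1] * A[:, tau:]
    return out

def correlate_with_S(A, S, n_taps):
    """out[w, tau] = sum_t A[w, t] * S[w, t - tau]."""
    n_time = A.shape[1]
    return np.stack([np.sum(A[:, tau:] * S[:, :n_time - tau], axis=1)
                     for tau in range(n_taps)], axis=1)

def objective_K(Y, G, S, lam):
    """I-divergence between Y and X plus L1 sparsity cost (Y assumed positive)."""
    X = np.maximum(convolve(G, S), EPS)
    return float(np.sum(Y * np.log(Y / X) - Y + X) + lam * np.sum(S))

def update_S_idiv(Y, G, S, lam):
    """S times (ratio correlated with G) over (partial sums of G plus lam)."""
    ratio = Y / np.maximum(convolve(G, S), EPS)
    partial_sums = correlate_with_G(np.ones_like(Y), G)
    return S * correlate_with_G(ratio, G) / np.maximum(partial_sums + lam, EPS)

def update_G_idiv(Y, G, S):
    """G times (ratio correlated with S) over (partial sums of S)."""
    ratio = Y / np.maximum(convolve(G, S), EPS)
    n_taps = G.shape[1]
    partial_sums = correlate_with_S(np.ones_like(Y), S, n_taps)
    return G * correlate_with_S(ratio, S, n_taps) / np.maximum(partial_sums, EPS)
```

As in the squared-error case, G would still need to be normalized after each update, because the update itself does not enforce a constraint on the sum of its elements.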

Next, the convolution calculation unit 9 is described. It is an example of a configuration for calculating the convolution of two sequences in the reverberant sound power estimated value time series calculation unit 3. The convolution calculation unit 9 can calculate convolutions at high speed. It accepts from the reverberant sound power estimated value time series calculation unit 3 the values needed for the convolution, calculates the convolution based on these values, and returns the result to the reverberant sound power estimated value time series calculation unit 3.

FIG. 6 shows a configuration example of the functional blocks of the convolution calculation unit 9. The convolution calculation unit 9 includes a zero padding unit 91, a fast Fourier transform unit 92, a Fourier transform product calculation unit 93, and a fast inverse Fourier transform unit 94.

In the following, each part of the convolution calculation unit 9 is described using an example in which, for the sequences W(1), W(2), ..., W(T_W) and Z(1), Z(2), ..., Z(T_Z), the result V(t) of the convolution given in the form of (Expression 72) is output. Here, T_W and T_Z are the numbers of elements of the respective sequences.

The zero padding unit 91 outputs W′(t) according to (Expression 73) and Z′(t) according to (Expression 74). Here, U is the number of elements of W′(t) and of Z′(t).

Next, the fast Fourier transform unit 92 receives W′(t) and Z′(t) output by the zero padding unit 91. It then performs the calculations of (Expression 75) and (Expression 76) by FFT (Fast Fourier Transform) and outputs w(k) and z(k).

Next, the Fourier transform product calculation unit 93 receives w(k) and z(k) output by the fast Fourier transform unit 92. It then performs the calculation of (Expression 77) and outputs v(k).

Next, the fast inverse Fourier transform unit 94 receives v(k) output by the Fourier transform product calculation unit 93. It then performs the calculation of (Expression 78) by IFFT (Inverse Fast Fourier Transform) and outputs V(t).

Through the operation of the zero padding unit 91, the fast Fourier transform unit 92, the Fourier transform product calculation unit 93, and the fast inverse Fourier transform unit 94, the convolution calculation unit 9 can calculate convolutions at high speed.
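The four stages of the convolution calculation unit 9 correspond to the usual FFT-based linear convolution, sketched below with NumPy. The padded length U is chosen here as the next power of two at or above T_W + T_Z − 1; that choice, and the function name `fft_convolve`, are assumptions, since the text only states that U is the number of elements after zero padding.

```python
import numpy as np

def fft_convolve(W, Z):
    """Linear convolution of two real sequences via zero padding, FFT, product, IFFT."""
    W = np.asarray(W, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n_out = len(W) + len(Z) - 1
    U = 1 << (n_out - 1).bit_length()  # assumed: next power of two >= n_out

    # Zero padding (unit 91): extend both sequences to length U.
    W_pad = np.zeros(U)
    W_pad[:len(W)] = W
    Z_pad = np.zeros(U)
    Z_pad[:len(Z)] = Z

    # Fast Fourier transform (unit 92).
    w = np.fft.fft(W_pad)
    z = np.fft.fft(Z_pad)

    # Element-wise product of the two spectra (unit 93).
    v = w * z

    # Inverse FFT (unit 94); the imaginary part is rounding noise for real input.
    V = np.fft.ifft(v).real
    return V[:n_out]
```

Padding to at least T_W + T_Z − 1 elements is what prevents the circular convolution computed by the FFT from wrapping around, so the result matches `np.convolve(W, Z)` up to rounding.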

Next, the correlation function calculation unit 10 is described. It is an example of a configuration for calculating the correlation function between two sequences in the observed sound/estimated original sound correlation function calculation unit 41, the estimated reverberant sound/estimated original sound correlation function calculation unit 42, the modeling error ratio sequence/estimated original sound correlation function calculation unit 45, the observed sound/estimated impulse response correlation function calculation unit 51, the estimated reverberant sound with sparse correction term/estimated impulse response correlation function calculation unit 52, and the modeling error ratio sequence/estimated impulse response correlation function calculation unit 55. The correlation function calculation unit 10 can calculate correlation functions at high speed. It accepts, from each unit that requires a correlation function result, the values needed for the calculation, calculates the correlation function based on these values, and returns the result to that unit.

FIG. 7 shows a configuration example of the functional blocks of the correlation function calculation unit 10. The correlation function calculation unit 10 includes a zero padding unit 101, a fast Fourier transform unit 102, a complex conjugate unit 103, a Fourier transform product calculation unit 104, and a fast inverse Fourier transform unit 105.

In the following, each part of the correlation function calculation unit 10 is described using an example in which, for the sequences W(1), W(2), ..., W(T_W) and Z(1), Z(2), ..., Z(T_Z), the result V(t) of the correlation function calculation given in the form of (Expression 79) is output. Here, T_W and T_Z are the numbers of elements of the respective sequences.

The zero padding unit 101 outputs W′(t) according to (Expression 80) and Z′(t) according to (Expression 81). Here, U is the number of elements of W′(t) and of Z′(t).

Next, the fast Fourier transform unit 102 receives W′(t) and Z′(t) output by the zero padding unit 101. It then performs the calculations of (Expression 82) and (Expression 83) by FFT and outputs w(k) and z(k).

Next, the complex conjugate unit 103 receives w(k) and z(k) output by the fast Fourier transform unit 102. It then performs the operations of (Expression 84) and (Expression 85) and outputs w(k) and z(k). Here, (·)^* denotes the complex conjugate and "←" denotes assignment.

Next, the Fourier transform product calculation unit 104 receives w(k) and z(k) output by the complex conjugate unit 103. It then performs the calculation of (Expression 86) and outputs v(k).

Next, the fast inverse Fourier transform unit 105 receives v(k) output by the Fourier transform product calculation unit 104. It then performs the calculation of (Expression 87) by IFFT and outputs V(t).

Through the operation of the zero padding unit 101, the fast Fourier transform unit 102, the complex conjugate unit 103, the Fourier transform product calculation unit 104, and the fast inverse Fourier transform unit 105, the correlation function calculation unit 10 can calculate correlation functions at high speed.
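The correlation pipeline differs from the convolution pipeline only by the conjugation step, as the following NumPy sketch illustrates. It assumes the correlation is defined as V(τ) = Σ_t W(t + τ) Z(t) for non-negative lags τ and conjugates only the spectrum of Z, which is the textbook form; how (Expression 84) and (Expression 85) divide this work between w(k) and z(k) cannot be read from the text. The function name `fft_correlate` and the power-of-two padded length are likewise assumptions.

```python
import numpy as np

def fft_correlate(W, Z):
    """Cross-correlation V[tau] = sum_t W[t + tau] * Z[t] for tau = 0 .. len(W) - 1."""
    W = np.asarray(W, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n_full = len(W) + len(Z) - 1
    U = 1 << (n_full - 1).bit_length()  # assumed: next power of two >= n_full

    # Zero padding (unit 101).
    W_pad = np.zeros(U)
    W_pad[:len(W)] = W
    Z_pad = np.zeros(U)
    Z_pad[:len(Z)] = Z

    # Fast Fourier transform (unit 102).
    w = np.fft.fft(W_pad)
    z = np.fft.fft(Z_pad)

    # Complex conjugation (unit 103) followed by the spectral product (unit 104).
    v = w * np.conj(z)

    # Inverse FFT (unit 105); non-negative lags occupy the first len(W) samples.
    V = np.fft.ifft(v).real
    return V[:len(W)]
```

With this lag convention the result equals the non-negative-lag part of `np.correlate(W, Z, mode="full")`, that is, the slice starting at index `len(Z) - 1`.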

The dereverberation apparatus of this embodiment estimates the power envelope of the reverberation component with the algorithm described above, using the sparsity of the speech signal as the criterion. It therefore operates robustly against changes in reverberation caused by movement of the sound source, changes in room temperature, and the like. Even when the reverberation environment does not change, it can perform dereverberation with performance comparable to that of conventionally known dereverberation apparatuses. It can also carry out the dereverberation computation at high speed.

An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration, program, and system are not limited to this embodiment, and designs and the like that do not depart from the gist of the present invention are also included.

A program for realizing the functions of the dereverberation apparatus may also be recorded on a computer-readable recording medium, and dereverberation of a speech signal may be performed by having a computer system read and execute the program recorded on this recording medium. The "computer system" referred to here may include an OS and hardware such as peripheral devices.

If a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment). The "computer-readable recording medium" refers to a storage device such as a flexible disk, a magneto-optical disk, a ROM, a writable non-volatile memory such as a flash memory, a portable medium such as a CD-ROM, or a hard disk built into a computer system.

Furthermore, the "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, a DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may realize only some of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

FIG. 1 is a functional block diagram of the dereverberation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the functional blocks of the indoor impulse response updating unit in this embodiment.
FIG. 3 is a diagram showing the configuration of the functional blocks of the indoor impulse response updating unit in this embodiment.
FIG. 4 is a diagram showing the configuration of the functional blocks of the original sound power estimated value time series updating unit in this embodiment.
FIG. 5 is a diagram showing the configuration of the functional blocks of the original sound power estimated value time series updating unit in this embodiment.
FIG. 6 is a diagram showing the configuration of the functional blocks of the convolution calculation unit in this embodiment.
FIG. 7 is a diagram showing the configuration of the functional blocks of the correlation function calculation unit in this embodiment.

Explanation of symbols

1: observation power time series generation unit
2: initial setting unit
3: reverberant sound power estimated value time series calculation unit
4: indoor impulse response updating unit
5: original sound power estimated value time series updating unit
6: parameter normalization unit
7: convergence determination unit
8: parameter output unit
9: convolution calculation unit
10: correlation function calculation unit
41: observed sound/estimated original sound correlation function calculation unit
42: estimated reverberant sound/estimated original sound correlation function calculation unit
43, 47: indoor impulse response estimated value update coefficient calculation unit
44, 48: indoor impulse response estimated value update value output unit
45: modeling error ratio sequence/estimated original sound correlation function calculation unit
46: estimated original sound partial sum sequence calculation unit
51: observed sound/estimated impulse response correlation function calculation unit
52: estimated reverberant sound with sparse correction term/estimated impulse response correlation function calculation unit
53, 57: original sound power estimated value time series update coefficient calculation unit
54, 58: original sound power estimated value time series update value output unit
55: modeling error ratio sequence/estimated impulse response correlation function calculation unit
56: estimated impulse response partial sum sequence with sparse correction term calculation unit
91, 101: zero padding unit
92, 102: fast Fourier transform unit
93, 104: Fourier transform product calculation unit
94, 105: fast inverse Fourier transform unit
103: complex conjugate unit

Claims

A dereverberation apparatus comprising:
an observation power time series generation unit that receives an acoustic signal as input and generates, by short-time frequency analysis, an observation power time series, which is a time series of the amplitude or power of the subband signal of each frequency channel;
an initial setting unit that sets, for each frequency channel, a room impulse response estimated value subject to a non-negative constraint and an original sound power estimated value time series, which is a time series of power estimated values of the original sound for each frequency channel;
a reverberant sound power estimated value time series calculation unit that convolves the room impulse response estimated value with the original sound power estimated value time series to calculate a reverberant sound power estimated value time series, which is the power time series of a reverberant sound model for each frequency channel;
a room impulse response update unit that updates the room impulse response estimated value, while satisfying the non-negative constraint, based on the observation power time series, the reverberant sound power estimated value time series, the room impulse response estimated value, and the original sound power estimated value time series;
an original sound power estimated value time series update unit that updates the original sound power estimated value time series, while satisfying the non-negative constraint, based on the observation power time series, the reverberant sound power estimated value time series, the room impulse response estimated value, and the original sound power estimated value time series;
a parameter normalization unit that normalizes the room impulse response estimated value updated by the room impulse response update unit so that the sum of its element values becomes a constant value, and normalizes the original sound power estimated value time series updated by the original sound power estimated value time series update unit so that the sum of its element values becomes a constant value;
a convergence determination unit that determines whether the room impulse response estimated value and the original sound power estimated value time series normalized by the parameter normalization unit satisfy a predetermined criterion; and
a parameter output unit that outputs the room impulse response estimated value and the original sound power estimated value time series when the convergence determination unit determines that the room impulse response estimated value and the original sound power estimated value time series normalized by the parameter normalization unit satisfy the predetermined criterion,
wherein, when the convergence determination unit determines that the room impulse response estimated value and the original sound power estimated value time series normalized by the parameter normalization unit do not satisfy the predetermined criterion, the reverberant sound power estimated value time series calculation unit calculates the reverberant sound power estimated value time series based on the room impulse response estimated value and the original sound power estimated value time series normalized by the parameter normalization unit, the room impulse response update unit updates the room impulse response estimated value, and the original sound power estimated value time series update unit updates the original sound power estimated value time series.
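The iteration recited in this claim can be illustrated for a single frequency channel with the following minimal sketch. It is not the patented implementation. The function names, the flat initial response, the choice of 1 and the total observed power as the normalization constants, the parameter-change convergence test, the small constant `EPS`, and the recomputation of the model between the two updates are all assumptions made for illustration. The two multiplicative updates follow the correlation-ratio form of the dependent claims, here without any sparse correction term.

```python
EPS = 1e-12  # assumed guard against division by zero


def convolve(h, s, length):
    """x[t] = sum_k h[k] * s[t - k] for t = 0 .. length-1 (causal, truncated)."""
    return [sum(h[k] * s[t - k] for k in range(min(len(h), t + 1)))
            for t in range(length)]


def correlate(a, b, lags):
    """c[lag] = sum_t a[t + lag] * b[t] for lag = 0 .. lags-1."""
    return [sum(a[t + lag] * b[t] for t in range(min(len(b), len(a) - lag)))
            for lag in range(lags)]


def normalize(v, target):
    """Rescale v so that its elements sum to `target`."""
    total = sum(v) + EPS
    return [e * target / total for e in v]


def estimate_channel(y, K, max_iter=500, tol=1e-9):
    """Alternate non-negative updates of h (length K) and s for one channel."""
    T = len(y)
    # Initial setting: flat non-negative response, observed power as source.
    h = [1.0 / K] * K
    s = normalize([e + EPS for e in y], sum(y))
    for _ in range(max_iter):
        # Reverberant sound power model: convolution of h and s.
        x = convolve(h, s, T)
        # Room impulse response update (ratio of correlation functions).
        num, den = correlate(y, s, K), correlate(x, s, K)
        h_new = [h[k] * num[k] / (den[k] + EPS) for k in range(K)]
        # Original sound power update, using the model rebuilt with h_new.
        x = convolve(h_new, s, T)
        num, den = correlate(y, h_new, T), correlate(x, h_new, T)
        s_new = [s[t] * num[t] / (den[t] + EPS) for t in range(T)]
        # Parameter normalization: both sums are pinned to constants.
        h_new = normalize(h_new, 1.0)
        s_new = normalize(s_new, sum(y))
        # Convergence determination (assumed criterion: small parameter change).
        change = max(abs(a - b) for a, b in zip(h + s, h_new + s_new))
        h, s = h_new, s_new
        if change < tol:
            break
    return h, s
```

Because every update multiplies a non-negative value by a ratio of non-negative sums, the non-negative constraint is preserved without any explicit projection.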

The dereverberation apparatus according to claim 1, wherein the room impulse response update unit comprises:
an observed sound / estimated original sound correlation function calculation unit that calculates, for each frequency channel, an observed sound / estimated original sound correlation function, which is the correlation function between the observation power time series and the original sound power estimated value time series;
an estimated reverberant sound / estimated original sound correlation function calculation unit that calculates, for each frequency channel, an estimated reverberant sound / estimated original sound correlation function, which is the correlation function between the reverberant sound power estimated value time series and the original sound power estimated value time series;
a room impulse response estimated value update coefficient calculation unit that calculates, for each frequency channel, a room impulse response estimated value update coefficient, which is the value obtained by dividing each time series element value of the observed sound / estimated original sound correlation function by the corresponding time series element value of the estimated reverberant sound / estimated original sound correlation function; and
a room impulse response estimated value update value output unit that calculates, for each frequency channel, a room impulse response estimated value update value by multiplying the room impulse response estimated value and the room impulse response estimated value update coefficient element by element.
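As an illustration of the update recited in this claim, the following sketch computes the two correlation functions, their element-wise ratio (the update coefficient), and the element-wise product with the current estimate. The function name, the lag convention (lag k pairs y[t + k] with s[t]), and the small constant `eps` are assumptions of this sketch and are not taken from the specification.

```python
def update_room_response(y, x, s, h, eps=1e-12):
    """One multiplicative update of the room impulse response estimate h.

    y: observation power time series
    x: reverberant sound power estimate (model), same length as y
    s: original sound power estimate, same length as y
    h: current non-negative room impulse response estimate
    """
    T = len(y)
    updated = []
    for k in range(len(h)):
        # Observed sound / estimated original sound correlation at lag k.
        num = sum(y[t + k] * s[t] for t in range(T - k))
        # Estimated reverberant sound / estimated original sound correlation.
        den = sum(x[t + k] * s[t] for t in range(T - k))
        # Update coefficient = num / den; update value = h[k] * coefficient.
        updated.append(h[k] * num / (den + eps))
    return updated
```

When the model already matches the observation (x equal to y), the coefficient is 1 at every lag and the estimate is left unchanged, which is the expected behavior of a fixed point.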

The dereverberation apparatus according to claim 1, wherein the room impulse response update unit comprises:
a modeling error ratio sequence / estimated original sound correlation function calculation unit that calculates, for each frequency channel, a modeling error ratio sequence / estimated original sound correlation function, which is the correlation function between the time series obtained by dividing the observation power time series by the reverberant sound power estimated value time series element by element and the original sound power estimated value time series;
an estimated original sound partial sum sequence calculation unit that calculates, for each frequency channel, an estimated original sound partial sum sequence, which is a sequence whose element values are the partial sums of the element values of the original sound power estimated value time series over respective specific ranges;
a room impulse response estimated value update coefficient calculation unit that calculates, for each frequency channel, a room impulse response estimated value update coefficient, which is the value obtained by dividing each element value of the modeling error ratio sequence / estimated original sound correlation function by the corresponding element value of the estimated original sound partial sum sequence; and
a room impulse response estimated value update value output unit that calculates, for each frequency channel, a room impulse response estimated value update value by multiplying the room impulse response estimated value and the room impulse response estimated value update coefficient element by element.
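The ratio-based variant recited in this claim can be sketched in the same way. The claim does not state which "specific range" each partial sum covers; this sketch assumes it is the set of source samples that overlap the observation window at the given lag. The function name, the lag convention, and `eps` are likewise illustrative assumptions.

```python
def update_room_response_ratio(y, x, s, h, eps=1e-12):
    """One ratio-based multiplicative update of the room impulse response h.

    y: observation power time series
    x: reverberant sound power estimate (model), same length as y
    s: original sound power estimate, same length as y
    h: current non-negative room impulse response estimate
    """
    T = len(y)
    # Modeling error ratio sequence: observation divided by model, per element.
    ratio = [y[t] / (x[t] + eps) for t in range(T)]
    updated = []
    for k in range(len(h)):
        # Modeling error ratio sequence / estimated original sound correlation.
        num = sum(ratio[t + k] * s[t] for t in range(T - k))
        # Partial sum of s over the samples that overlap at lag k (assumed range).
        den = sum(s[t] for t in range(T - k))
        updated.append(h[k] * num / (den + eps))
    return updated
```

If the ratio sequence is 1 everywhere, the numerator equals the partial sum and the estimate is unchanged.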

The dereverberation apparatus according to any one of claims 1 to 3, wherein the original sound power estimated value time series update unit comprises:
an observed sound / estimated impulse response correlation function calculation unit that calculates, for each frequency channel, an observed sound / estimated impulse response correlation function, which is the correlation function between the observation power time series and the room impulse response estimated value;
an estimated reverberant sound / estimated impulse response correlation function calculation unit with a sparse correction term that calculates, for each frequency channel, the correlation function between the reverberant sound power estimated value time series and the room impulse response estimated value, and calculates an estimated reverberant sound / estimated impulse response correlation function with a sparse correction term, which is the time series obtained by adding to that correlation function a time series obtained by raising the original sound power estimated value time series to a constant power element by element and then multiplying it by a constant;
an original sound power estimated value time series update coefficient calculation unit that calculates, for each frequency channel, an original sound power estimated value time series update coefficient, which is the value obtained by dividing each time series element value of the observed sound / estimated impulse response correlation function by the corresponding time series element value of the estimated reverberant sound / estimated impulse response correlation function with the sparse correction term; and
an original sound power estimated value time series update value output unit that calculates, for each frequency channel, an original sound power estimated value time series update value by multiplying the original sound power estimated value time series and the original sound power estimated value time series update coefficient element by element.
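A minimal sketch of the update recited in this claim follows. The claim speaks only of "a constant power" and "a constant"; the parameter names `power` and `weight`, their default values, the lag convention, and the small constant `eps` are hypothetical and chosen for illustration.

```python
def update_source_power(y, x, h, s, weight=0.0, power=0.0, eps=1e-12):
    """One multiplicative update of the original sound power estimate s.

    y: observation power time series
    x: reverberant sound power estimate (model), same length as y
    h: non-negative room impulse response estimate
    s: current non-negative original sound power estimate
    weight, power: assumed constants of the sparse correction term
    """
    T = len(y)
    updated = []
    for t in range(T):
        span = range(min(len(h), T - t))
        # Observed sound / estimated impulse response correlation at time t.
        num = sum(y[t + k] * h[k] for k in span)
        # Estimated reverberant sound / estimated impulse response correlation.
        den = sum(x[t + k] * h[k] for k in span)
        # Sparse correction term: s raised to a constant power, times a constant.
        den += weight * (s[t] + eps) ** power
        updated.append(s[t] * num / (den + eps))
    return updated
```

With a positive `weight`, the correction term enlarges the denominator, so every element is pushed toward zero more strongly than in the uncorrected update. This is consistent with the term's stated role of promoting sparsity in the original sound power.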

The dereverberation apparatus according to any one of claims 1 to 3, wherein the original sound power estimated value time series update unit comprises:
a modeling error ratio sequence / estimated impulse response correlation function calculation unit that calculates, for each frequency channel, a modeling error ratio sequence / estimated impulse response correlation function, which is the correlation function between the time series obtained by dividing the observation power time series by the reverberant sound power estimated value time series element by element and the room impulse response estimated value;
an estimated impulse response partial sum sequence calculation unit with a sparse correction term that calculates, for each frequency channel, a sequence whose element values are the partial sums of the element values of the room impulse response estimated value over respective specific ranges, and calculates an estimated impulse response partial sum sequence with a sparse correction term, which is the time series obtained by adding to that sequence a time series obtained by raising the original sound power estimated value time series to a constant power element by element and then multiplying it by a constant;
an original sound power estimated value time series update coefficient calculation unit that calculates, for each frequency channel, an original sound power estimated value time series update coefficient, which is the value obtained by dividing each time series element value of the modeling error ratio sequence / estimated impulse response correlation function by the corresponding element value of the estimated impulse response partial sum sequence with the sparse correction term; and
an original sound power estimated value time series update value output unit that calculates, for each frequency channel, an original sound power estimated value time series update value by multiplying the original sound power estimated value time series and the original sound power estimated value time series update coefficient element by element.
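The ratio-based source update recited in this claim can be sketched as follows. As before, the range of each partial sum is assumed to be the impulse response taps that stay inside the observation window, and the names `weight`, `power`, and `eps` are hypothetical.

```python
def update_source_power_ratio(y, x, h, s, weight=0.0, power=0.0, eps=1e-12):
    """One ratio-based multiplicative update of the original sound power s.

    y: observation power time series
    x: reverberant sound power estimate (model), same length as y
    h: non-negative room impulse response estimate
    s: current non-negative original sound power estimate
    weight, power: assumed constants of the sparse correction term
    """
    T = len(y)
    # Modeling error ratio sequence: observation divided by model, per element.
    ratio = [y[t] / (x[t] + eps) for t in range(T)]
    updated = []
    for t in range(T):
        span = range(min(len(h), T - t))
        # Modeling error ratio sequence / estimated impulse response correlation.
        num = sum(ratio[t + k] * h[k] for k in span)
        # Partial sum of h over the taps inside the window (assumed range),
        # plus the sparse correction term.
        den = sum(h[k] for k in span) + weight * (s[t] + eps) ** power
        updated.append(s[t] * num / (den + eps))
    return updated
```

With `weight` set to zero and a model that matches the observation exactly, the ratio is 1, the numerator equals the partial sum, and the estimate does not change.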

A dereverberation method comprising:
an observation power time series generation step in which an observation power time series generation unit receives an acoustic signal as input and generates, by short-time frequency analysis, an observation power time series, which is a time series of the amplitude or power of the subband signal of each frequency channel;
an initial setting step in which an initial setting unit sets, for each frequency channel, a room impulse response estimated value subject to a non-negative constraint and an original sound power estimated value time series, which is a time series of power estimated values of the original sound for each frequency channel;
a reverberant sound power estimated value time series calculation step in which a reverberant sound power estimated value time series calculation unit convolves the room impulse response estimated value with the original sound power estimated value time series to calculate a reverberant sound power estimated value time series, which is the power time series of a reverberant sound model for each frequency channel;
a room impulse response update step in which a room impulse response update unit updates the room impulse response estimated value, while satisfying the non-negative constraint, based on the observation power time series, the reverberant sound power estimated value time series, the room impulse response estimated value, and the original sound power estimated value time series;
an original sound power estimated value time series update step in which an original sound power estimated value time series update unit updates the original sound power estimated value time series, while satisfying the non-negative constraint, based on the observation power time series, the reverberant sound power estimated value time series, the room impulse response estimated value, and the original sound power estimated value time series;
a parameter normalization step in which a parameter normalization unit normalizes the room impulse response estimated value updated in the room impulse response update step so that the sum of its element values becomes a constant value, and normalizes the original sound power estimated value time series updated in the original sound power estimated value time series update step so that the sum of its element values becomes a constant value;
a convergence determination step of determining whether the room impulse response estimated value and the original sound power estimated value time series normalized in the parameter normalization step satisfy a predetermined criterion; and
a parameter output step in which a parameter output unit outputs the room impulse response estimated value and the original sound power estimated value time series when it is determined in the convergence determination step that the room impulse response estimated value and the original sound power estimated value time series normalized in the parameter normalization step satisfy the predetermined criterion,
wherein, when it is determined in the convergence determination step that the room impulse response estimated value and the original sound power estimated value time series normalized in the parameter normalization step do not satisfy the predetermined criterion, the reverberant sound power estimated value time series is calculated in the reverberant sound power estimated value time series calculation step based on the room impulse response estimated value and the original sound power estimated value time series normalized in the parameter normalization step, the room impulse response estimated value is updated in the room impulse response update step, and the original sound power estimated value time series is updated in the original sound power estimated value time series update step.

The dereverberation method according to claim 6, wherein the room impulse response update step includes:
an observed sound / estimated original sound correlation function calculation step in which an observed sound / estimated original sound correlation function calculation unit calculates, for each frequency channel, an observed sound / estimated original sound correlation function, which is the correlation function between the observation power time series and the original sound power estimated value time series;
an estimated reverberant sound / estimated original sound correlation function calculation step in which an estimated reverberant sound / estimated original sound correlation function calculation unit calculates, for each frequency channel, an estimated reverberant sound / estimated original sound correlation function, which is the correlation function between the reverberant sound power estimated value time series and the original sound power estimated value time series;
a room impulse response estimated value update coefficient calculation step in which a room impulse response estimated value update coefficient calculation unit calculates, for each frequency channel, a room impulse response estimated value update coefficient, which is the value obtained by dividing each time series element value of the observed sound / estimated original sound correlation function by the corresponding time series element value of the estimated reverberant sound / estimated original sound correlation function; and
a room impulse response estimated value update value output step in which a room impulse response estimated value update value output unit calculates, for each frequency channel, a room impulse response estimated value update value by multiplying the room impulse response estimated value and the room impulse response estimated value update coefficient element by element.

The dereverberation method according to claim 6, wherein the room impulse response update step includes:
a modeling error ratio sequence / estimated original sound correlation function calculation step in which a modeling error ratio sequence / estimated original sound correlation function calculation unit calculates, for each frequency channel, a modeling error ratio sequence / estimated original sound correlation function, which is the correlation function between the time series obtained by dividing the observation power time series by the reverberant sound power estimated value time series element by element and the original sound power estimated value time series;
an estimated original sound partial sum sequence calculation step in which an estimated original sound partial sum sequence calculation unit calculates, for each frequency channel, an estimated original sound partial sum sequence, which is a sequence whose element values are the partial sums of the element values of the original sound power estimated value time series over respective specific ranges;
a room impulse response estimated value update coefficient calculation step in which a room impulse response estimated value update coefficient calculation unit calculates, for each frequency channel, a room impulse response estimated value update coefficient, which is the value obtained by dividing each element value of the modeling error ratio sequence / estimated original sound correlation function by the corresponding element value of the estimated original sound partial sum sequence; and
a room impulse response estimated value update value output step in which a room impulse response estimated value update value output unit calculates, for each frequency channel, a room impulse response estimated value update value by multiplying the room impulse response estimated value and the room impulse response estimated value update coefficient element by element.

The dereverberation method according to any one of claims 6 to 8, wherein the original sound power estimated value time series update step includes:
an observed sound / estimated impulse response correlation function calculation step in which an observed sound / estimated impulse response correlation function calculation unit calculates, for each frequency channel, an observed sound / estimated impulse response correlation function, which is the correlation function between the observation power time series and the room impulse response estimated value;
an estimated reverberant sound / estimated impulse response correlation function calculation step with a sparse correction term in which an estimated reverberant sound / estimated impulse response correlation function calculation unit with a sparse correction term calculates, for each frequency channel, the correlation function between the reverberant sound power estimated value time series and the room impulse response estimated value, and calculates an estimated reverberant sound / estimated impulse response correlation function with a sparse correction term, which is the time series obtained by adding to that correlation function a time series obtained by raising the original sound power estimated value time series to a constant power element by element and then multiplying it by a constant;
an original sound power estimated value time series update coefficient calculation step in which an original sound power estimated value time series update coefficient calculation unit calculates, for each frequency channel, an original sound power estimated value time series update coefficient, which is the value obtained by dividing each time series element value of the observed sound / estimated impulse response correlation function by the corresponding time series element value of the estimated reverberant sound / estimated impulse response correlation function with the sparse correction term; and
an original sound power estimated value time series update value output step in which an original sound power estimated value time series update value output unit calculates, for each frequency channel, an original sound power estimated value time series update value by multiplying the original sound power estimated value time series and the original sound power estimated value time series update coefficient element by element.

The dereverberation method according to any one of claims 6 to 8, wherein the original sound power estimated value time series update step includes:
a modeling error ratio sequence / estimated impulse response correlation function calculation step in which a modeling error ratio sequence / estimated impulse response correlation function calculation unit calculates, for each frequency channel, a modeling error ratio sequence / estimated impulse response correlation function, which is the correlation function between the time series obtained by dividing the observation power time series by the reverberant sound power estimated value time series element by element and the room impulse response estimated value;
an estimated impulse response partial sum sequence calculation step with a sparse correction term in which an estimated impulse response partial sum sequence calculation unit with a sparse correction term calculates, for each frequency channel, a sequence whose element values are the partial sums of the element values of the room impulse response estimated value over respective specific ranges, and calculates an estimated impulse response partial sum sequence with a sparse correction term, which is the time series obtained by adding to that sequence a time series obtained by raising the original sound power estimated value time series to a constant power element by element and then multiplying it by a constant;
an original sound power estimated value time series update coefficient calculation step in which an original sound power estimated value time series update coefficient calculation unit calculates, for each frequency channel, an original sound power estimated value time series update coefficient, which is the value obtained by dividing each time series element value of the modeling error ratio sequence / estimated impulse response correlation function by the corresponding element value of the estimated impulse response partial sum sequence with the sparse correction term; and
an original sound power estimated value time series update value output step in which an original sound power estimated value time series update value output unit calculates, for each frequency channel, an original sound power estimated value time series update value by multiplying the original sound power estimated value time series and the original sound power estimated value time series update coefficient element by element.

A computer program for causing a computer to operate as the dereverberation apparatus according to any one of claims 1 to 5.

A computer-readable recording medium on which is recorded a computer program for causing a computer to operate as the dereverberation apparatus according to any one of claims 1 to 5.