JP2007235875A

JP2007235875A - Transmission path estimating method, echo canceling method, sound source separating method, apparatus therefor, program, and recording medium

Info

Publication number: JP2007235875A
Application number: JP2006058189A
Authority: JP
Inventors: Akira Emura; 暁江村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-03-03
Filing date: 2006-03-03
Publication date: 2007-09-13
Anticipated expiration: 2026-03-03
Also published as: JP4422692B2

Abstract

PROBLEM TO BE SOLVED: To estimate a transmission path or an input signal without using prior information about the transmission path.
SOLUTION: A current signal space matrix and a past/future signal space matrix are obtained from a multi-channel signal; a projection residual signal space matrix is obtained by projecting the current signal space matrix onto an orthogonal basis of the past/future signal space matrix; and the optimal response length is estimated directly from the projection residual signal space matrix. The projection residual signal space is then corrected on the basis of the estimated optimal response length to obtain a corrected projection residual signal space, from which the impulse response of the transmission path is estimated.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a transmission path estimation method, apparatus, program, and recording medium for estimating a transmission path or an input signal from the multi-channel output signals of a single-input multiple-output or multiple-input multiple-output linear transmission path without using prior information about the transmission path, and to a dereverberation method, a sound source separation method, and the corresponding apparatuses, programs, and recording media that use it.

The problem of estimating the transmission path or the source signal from an observed signal that has passed through a linear transmission path, without using any knowledge of the transmission path or the source signal, is called the blind estimation problem. It is being studied in the fields of speech and acoustic signal processing and wireless communication.
In speech and acoustic signal processing, speech recorded in a room is mixed with reverberation caused by reflections from walls and other surfaces, which lowers the clarity of the speech and makes it harder to recognize. Solving the blind estimation problem restores the clarity of the speech and makes it easier to recognize. In wireless communication, reflections from buildings and other structures are also received, producing ghosts that make transmission errors more likely. Solving the blind estimation problem reduces transmission errors and improves the transmission rate.

These reverberation and ghost phenomena can be modeled as shown in FIG. 10, with the source signal denoted s(k) and the P-channel observed signals denoted y₁(k) to y_P(k). Here h₁(k), …, h_P(k) are the impulse responses of the transmission paths, and N is the length of the impulse responses.
In the blind estimation problem of wireless communication, the frequency characteristics of the observed signal derive almost entirely from the transmission path. In the blind estimation problem of the speech and acoustic field, by contrast, the frequency characteristics of the observed signal derive from both the source signal and the transmission path, which makes estimation more difficult.
For this blind estimation problem, the Least Square Smoothing method (hereinafter the LSS method) and its extension, the J-LSS method, have been proposed as methods for estimating a transmission path from a colored source signal (Non-Patent Document 1).
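The model equation itself is not reproduced in this text. As a reading aid, the standard convolutive model that matches the definitions above (source s(k), observed signals y_i(k), impulse responses h_i(k) of length N) can be written as follows. This form and its index ranges are an assumption and are not confirmed by the source.

```latex
y_i(k) = \sum_{n=0}^{N-1} h_i(n)\, s(k-n), \qquad i = 1, \dots, P
```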

The processing of the LSS method is described with reference to FIGS. 9 to 12. The LSS method assumes that the actual impulse response length N is known, and sets it as the assumed length j of the transmission path impulse response in the current signal space matrix generation means 111, the past signal space matrix generation means 112, and the future signal space matrix generation means 113 shown in FIG. 9. The multi-channel signal stored in the multi-channel signal storage means 101 of FIG. 9 is explained with reference to FIG. 11. The multi-channel signal storage means 101 stores a multi-channel signal of a predetermined duration at a given time. The space spanned by vectors of signals located near the center of the stored multi-channel signal along the time axis is called the current signal space; the space spanned by signal vectors older than the current signal is called the past signal space; and the space spanned by signal vectors newer than the current signal space is called the future signal space.

Step SP1 (see FIG. 12): The past signal space, the future signal space, and the current signal space are obtained from the P-channel observed signals stored in the multi-channel signal storage means 101.
The following description is based on the column vector consisting of the values y₁(k) to y_P(k) of the P-channel picked-up signals at time k.
The past signal space matrix generation means 112 generates the past signal space matrix Z_P (the "past data matrix" of Non-Patent Document 1) as the multi-channel past signal space from the multi-channel signal stored in the multi-channel signal storage means 101. The future signal space matrix generation means 113 likewise generates the future signal space matrix Z_F (the "future data matrix" of Non-Patent Document 1) as the multi-channel future signal space.
The recommended value of w is j, and T is set so that T > 2w. The matrices Z_P and Z_F each have (P × w) rows and (T + 1) columns.

The current signal space matrix generation means 111 generates the current signal space matrix Y (the "current data matrix" of Non-Patent Document 1), with (P × w) rows and (T + 1) columns, as the multi-channel current signal space.

Step SP2: The Z matrix generation means 115 generates the past/future signal space matrix Z (the "future-past data matrix" of Non-Patent Document 1) from the multi-channel past signal space matrix Z_P and the multi-channel future signal space matrix Z_F.
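Steps SP1 and SP2 can be sketched as follows. The defining equations of Z_P, Z_F, Y, and Z are not reproduced in this text, so the layout here is an assumption: each matrix stacks delayed copies of the P-channel signal (newest block on top), Z places Z_P above Z_F, and the time origin `t` is chosen so that the past window fits. The helper names `block_rows` and `build_lss_matrices` are hypothetical.

```python
import numpy as np

def block_rows(y, newest, n_blocks, T):
    """Stack n_blocks delayed copies of the P-channel signal y (shape P x K).

    Block m holds y(newest - m + c) for columns c = 0..T, so the result has
    P * n_blocks rows and T + 1 columns.
    """
    blocks = [y[:, newest - m: newest - m + T + 1] for m in range(n_blocks)]
    return np.vstack(blocks)

def build_lss_matrices(y, j, w, T):
    """Return (Y, Z_P, Z_F, Z) for assumed response length j and window w.

    Assumed layout (not taken from the source equations):
      past    : y(t-1) ... y(t-w)
      current : y(t+j-1) ... y(t)
      future  : y(t+j+w-1) ... y(t+j)
    """
    K = y.shape[1]
    if 2 * w + j + T > K:
        raise ValueError("signal too short for the requested j, w and T")
    t = w  # earliest current time index for which the past window fits
    Z_P = block_rows(y, t - 1, w, T)
    Y = block_rows(y, t + j - 1, j, T)
    Z_F = block_rows(y, t + j + w - 1, w, T)
    Z = np.vstack([Z_P, Z_F])  # assumed stacking order
    return Y, Z_P, Z_F, Z
```

With w = j, as recommended in the text, Y, Z_P, and Z_F all have P × w rows and T + 1 columns.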

Step SP3: The multi-channel current signal space is projected onto the multi-channel past signal space and the multi-channel future signal space to obtain the projection residual signal space. To this end, the past/future signal space orthogonal basis calculation means 121 obtains an orthogonal basis of the matrix Z corresponding to the past/future signal space. This orthogonal basis V is obtained from the singular value decomposition of Z:
Z = UΣV^T

The projection residual signal matrix extraction means 131 obtains the projection of the current signal space onto the past/future signal space as YVV^T. The projection residual signal space is then obtained as the projection residual signal space matrix E (the "projection error matrix" of Non-Patent Document 1):
E = Y − YVV^T
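A minimal NumPy sketch of step SP3, under the assumption that V contains only the right singular vectors of Z whose singular values are non-negligible. The function name `projection_residual` and the tolerance `rtol` are illustrative and do not come from the source.

```python
import numpy as np

def projection_residual(Y, Z, rtol=1e-10):
    """Return E = Y - Y V V^T, where the columns of V span the row space of Z."""
    _, s, Vh = np.linalg.svd(Z, full_matrices=False)
    # Keep only directions with non-negligible singular values. With the full
    # square V, the product V V^T would be the identity and E would vanish.
    rank = int(np.sum(s > rtol * s[0])) if s.size and s[0] > 0 else 0
    V = Vh[:rank].T                      # shape (T + 1, rank)
    return Y - (Y @ V) @ V.T
```

Every row of the returned E is orthogonal to every row of Z, which is the property the later steps rely on.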

Step SP4: The transmission path estimation means 161 performs a singular value decomposition of the projection residual signal space matrix E and extracts the left singular vector corresponding to the largest singular value. With ĥ₁(0) denoting the estimate of h₁(0), and likewise for the other coefficients of the impulse responses of the P-channel sound pickup paths, each component of this left singular vector corresponds to an estimate of a coefficient of the P-channel pickup path impulse responses.

Therefore, by taking every P-th component of the left singular vector and arranging the results as vectors, P vectors corresponding to the estimates of the P-channel pickup path impulse responses are obtained. From these P vectors the transmission path estimation means 161 estimates the transmission paths. Using this estimate, the inverse filter calculation means 201 calculates the inverse filter coefficients and sets the filter characteristics of the inverse filter means 202. This completes the processing of the LSS method.
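Step SP4 and the de-interleaving described above might look like the sketch below. The exact ordering of the coefficients inside the singular vector is given by equations that are not reproduced here, so the interleaved layout (all P channels for one lag, then the next lag) is an assumption, as is the name `channels_from_residual`. The estimate is determined only up to a common scale factor and sign.

```python
import numpy as np

def channels_from_residual(E, P):
    """Split the dominant left singular vector of E into P per-channel vectors."""
    U, _, _ = np.linalg.svd(E, full_matrices=False)
    u = U[:, 0]                 # left singular vector of the largest singular value
    if u.size % P != 0:
        raise ValueError("row count of E must be a multiple of P")
    # Row i of the result collects components i, i + P, i + 2P, ... of u.
    return u.reshape(-1, P).T
```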

The LSS method assumes that the actual impulse response length N is known, but in practice it is unknown, so processing to determine the response length is required. The method that includes this processing is the J-LSS method described in Non-Patent Document 2.
The J-LSS method is described with reference to FIGS. 13 and 14. The characteristic features of the J-LSS configuration shown in FIG. 13 are that a response length estimation means 141 is provided between the projection residual signal matrix extraction means 131 and the transmission path estimation means 161', and that the procedure of the transmission path estimation means 161 is accordingly changed slightly to become 161'. As shown in FIG. 14, steps SP1 to SP3 are common to the LSS method and the J-LSS method, so the J-LSS processing is described below from step SP6 onward with reference to FIG. 14.

Step SP6: The response length estimation means 141 performs a singular value decomposition of the projection residual signal space matrix E:
E = U_E Σ_E V_E^T
A response length is then assumed. Let this assumed length be k (1 ≤ k ≤ j). As an example, k may initially be set to j.

Step SP7: The left singular vectors corresponding to the P(j + 1) − j + k − 1 smallest singular values of the matrix E are taken from the matrix U_E. A matrix D having these left singular vectors as its rows is formed and divided into j equal parts,
D = |D₁ … D_j|
from which the block Hankel matrix Γ_k(D) is generated.

Step SP8: To check the validity of the assumed length k, it is examined whether the linear equation for the block Hankel matrix
Γ_k(D) f = 0
has a solution other than the zero vector. If it does, the assumed length k is judged to match the impulse response length N, and processing proceeds to step SP10. If it does not, processing proceeds to step SP9.

Step SP9: The assumed length k is changed and processing returns to step SP7. For example, if k was initially set to j, it can be updated as k ← k − 1.
Step SP10: The transmission path estimation means 161' obtains the impulse response as a nonzero solution f of
Γ_k(D) f = 0
The correspondence between f and the impulse response is the same as that between the singular vector and the impulse response in step SP4.
L. Tong and Q. Zhao, "Joint Order Detection and Blind Channel Estimation by Least Squares Smoothing," IEEE Transactions on Signal Processing, Vol. 47, No. 9, 1999.
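The iterative search of steps SP7 to SP9 can be sketched as a loop around a null-space test. The construction of the block Hankel matrix Γ_k(D) is not reproduced in this text, so it is passed in as a caller-supplied function `build_gamma(k)`, a hypothetical placeholder. The SVD-based test with tolerance `rtol` is one common way to decide whether a nonzero solution exists and is an assumption here, not the procedure of the cited paper.

```python
import numpy as np

def nonzero_null_vector(G, rtol=1e-10):
    """Return a unit vector f with G f ~ 0, or None if only f = 0 solves it."""
    _, s, Vh = np.linalg.svd(G, full_matrices=True)
    rank = int(np.sum(s > rtol * s[0])) if s.size and s[0] > 0 else 0
    if rank < G.shape[1]:
        return Vh[-1]           # last right singular vector lies in the null space
    return None

def search_response_length(build_gamma, j):
    """Try k = j, j - 1, ..., 1 (steps SP7 to SP9) until a null vector exists.

    build_gamma(k) is a hypothetical callable returning Gamma_k(D).
    """
    for k in range(j, 0, -1):
        f = nonzero_null_vector(build_gamma(k))
        if f is not None:
            return k, f         # step SP10: f carries the impulse response
    return None, None
```

Each pass of the loop rebuilds a matrix and decomposes it again, so the cost grows with the gap between the initial assumed length and the true one.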

In the J-LSS method (the prior art) described above, the optimal impulse response length is found by repeatedly verifying the validity of the assumed length k while changing the assumption little by little, so estimating the transmission path or the input signal requires an enormous amount of computation.
An object of the present invention is to estimate the transmission path or the input signal with a small amount of computation, without the repeated verification in which the assumed impulse response length is changed each time.

In this invention, the optimal response length is estimated directly from the projection residual signal space obtained first. The basic operation is then to correct the projection residual signal space on the basis of this optimal response length to obtain a corrected projection residual signal space, and to estimate the impulse response of the transmission path from the corrected projection residual signal space.
The specific procedures of the transmission path estimation method according to the invention, and of the dereverberation method and sound source separation method that operate using it, are described below.
The transmission path estimation method according to the invention estimates transmission paths from observed signals obtained from a source signal via a plurality of linear transmission paths, and is characterized by including: a multi-channel signal storage process that stores the multi-channel observed signals; a current signal space matrix generation process that obtains a current signal space matrix from the stored multi-channel observed signals; a past/future signal space matrix generation process that generates a past/future signal space matrix from the stored multi-channel observed signals; a past/future signal space orthogonal basis calculation process that obtains an orthogonal basis of the past/future signal space matrix; a projection residual signal matrix extraction process that obtains, from the current signal space matrix and the orthogonal basis of the past/future signal space, the projection residual signal space matrix resulting from projecting the current signal space matrix onto the past/future signal space; a projection residual signal matrix correction process that obtains a redundant projection residual signal space matrix E_b from the rank r of the projection residual signal matrix E and takes the component of E orthogonal to E_b as the corrected projection residual signal space matrix; and a transmission path estimation process that performs a singular value decomposition of the corrected projection residual signal matrix and extracts the left singular vector corresponding to the largest singular value to obtain the impulse responses of the multi-channel transmission paths.

The dereverberation method according to the invention is characterized by including, in addition to the processes of the transmission path estimation method above, an inverse filter calculation process that obtains an inverse filter from the estimated transmission path impulse responses, and an inverse filtering process that applies the inverse filter to the multi-channel observed signals and outputs the source signal estimate.

Furthermore, the sound source separation method according to the invention is characterized by including, in addition to the processes of the transmission path estimation method above, a dereverberation filter calculation process that obtains a dereverberation filter from the estimated transmission path impulse responses, a dereverberation filtering process that applies that dereverberation filter to the multi-channel observed signals, and a sound source separation process that performs sound source separation on the signals output by the dereverberation process.

The present invention estimates the optimal response length directly from the projection residual signal space obtained by projecting the current signal space matrix onto the past/future signal space matrix. It then corrects the projection residual signal space on the basis of this optimal response length to obtain a corrected projection residual signal space, and estimates the impulse response of the transmission path from the corrected projection residual signal space. Because the iterative processing of the conventional J-LSS method is no longer needed, the impulse response of the transmission path can be estimated with less computation than in the prior art. The invention is therefore particularly effective when the impulse response length to be estimated may be large.

The transmission path estimation method, dereverberation method, sound source separation method, and corresponding apparatuses according to the present invention can all be realized in hardware. The simplest realization, and the best mode, is to install the transmission path estimation program, dereverberation program, or sound source separation program according to the invention on a computer and have the computer function as the respective apparatus.

To make a computer function as a transmission path estimation apparatus, the transmission path estimation program is installed on the computer. The program constructs within the computer a multi-channel signal storage means, a current signal space matrix generation means, a past signal space matrix generation means, a past/future signal space orthogonal basis calculation means, a projection residual signal matrix extraction means, a projection residual signal matrix correction means, and a transmission path estimation means, so that the computer functions as the transmission path estimation apparatus.

To make a computer function as a dereverberation apparatus, the dereverberation program is installed on the computer. The program constructs within the computer the configuration of the transmission path estimation apparatus above with an inverse filter calculation means and an inverse filter means added, so that the computer functions as the dereverberation apparatus.
To make a computer function as a sound source separation apparatus, the sound source separation program is installed on the computer. The program constructs within the computer the configuration of the transmission path estimation apparatus above with a dereverberation filter calculation means, a dereverberation filtering means, and a sound source separation means added, so that the computer functions as the sound source separation apparatus.

A first embodiment of the present invention is described with reference to FIGS. 1 to 3. Blocks that perform the same processing as in the prior art shown in FIG. 9 are given the same reference numerals.
First, the assumed transmission path impulse response length j is set in advance, to a value expected to be at least the impulse response length N, in the current signal space matrix generation means 111, the past signal space matrix generation means 112, and the future signal space matrix generation means 113 of FIG. 1. Embodiment 1 shown in FIG. 1 provides the past signal space matrix generation means 112 and the future signal space matrix generation means 113 and generates the Z matrix from the past and future signal space matrices they produce. Because the Z matrix can also be generated without going through these two means, the past signal space matrix generation means 112, the future signal space matrix generation means 113, and the Z matrix generation means 115 are referred to here collectively as the Z matrix generation unit 102.

Step 1: The current signal space matrix generation means 111 obtains the current signal space matrix Y from the multi-channel signal stored in the multi-channel signal storage means 101. In the same way, the past signal space matrix generation means 112 obtains the past signal space matrix Z_P, and the future signal space matrix generation means 113 obtains the future signal space matrix Z_F. The detailed processing is the same as step SP1 of the Background Art and is not repeated here.

Step 2: The Z matrix generation unit 102 generates the matrix Z from the past signal space matrix Z_P and the future signal space matrix Z_F. The detailed processing is the same as step SP2 of the Background Art and is not repeated here.

Step 3: The past/future signal space orthogonal basis calculation means 121 obtains the orthogonal basis V of the matrix Z, and the projection residual signal space matrix E is obtained as
E = Y − YVV^T
The detailed processing is the same as step SP3 of the Background Art and is not repeated here.

Step 4: The projection residual signal matrix correction means 151 obtains the rank r of the projection residual signal space matrix E and takes j − r + 1 as the optimal response length. The bottom P × (r − 1) rows of E are extracted as the redundant projection residual signal space matrix E_b, which therefore has P × (r − 1) rows and (T + 1) columns.

Here the projection residual signal space matrix E is regarded as being partitioned into a matrix E_a and the redundant projection residual signal space matrix E_b, and only E_b is extracted from E.

Next, an orthogonal basis V_b of the extracted E_b is obtained, for example via singular value decomposition as described later, and the corrected projection residual signal matrix E₂ is obtained as
E₂ = E_a − E_a V_b V_b^T
This step 4 is processing specific to the present application and is found in neither the LSS method nor the J-LSS method of the background art.
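Step 4 can be sketched in NumPy as follows. The rank is estimated with a relative singular-value threshold `rtol`, which is an assumption: the text does not say how the rank is decided, and noisy data would need a more careful criterion. The function names are illustrative.

```python
import numpy as np

def numerical_rank(M, rtol=1e-8):
    """Count singular values above rtol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rtol * s[0])) if s.size and s[0] > 0 else 0

def correct_projection_residual(E, P, j, rtol=1e-8):
    """Return (r, optimal_length, E2) for projection residual E.

    r is the estimated rank of E, optimal_length = j - r + 1, and
    E2 = E_a - E_a V_b V_b^T, with E_b the bottom P * (r - 1) rows of E.
    """
    r = numerical_rank(E, rtol)
    if r == 0:
        raise ValueError("E is numerically zero; no response length can be estimated")
    optimal_length = j - r + 1
    n_redundant = P * (r - 1)
    split = E.shape[0] - n_redundant
    E_a, E_b = E[:split], E[split:]
    if n_redundant == 0:
        return r, optimal_length, E_a      # nothing redundant to remove
    _, s, Vh = np.linalg.svd(E_b, full_matrices=False)
    keep = int(np.sum(s > rtol * s[0]))
    V_b = Vh[:keep].T                      # orthonormal basis of the row space of E_b
    E2 = E_a - (E_a @ V_b) @ V_b.T
    return r, optimal_length, E2
```

The returned E2 is what step 5 decomposes in place of E. One SVD for the rank and one for V_b replace the whole SP7 to SP9 loop of the J-LSS method.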

Step 5: The transmission path estimation means 161 performs a singular value decomposition of the corrected projection residual matrix E₂ and obtains the impulse response estimates of the multi-channel pickup paths from the left singular vector corresponding to the largest singular value. Apart from the input being the corrected projection residual matrix E₂ instead of the projection residual matrix E, the detailed processing is the same as step SP4 of the LSS method in the Background Art and is not repeated here.

A supplementary explanation is given with reference to FIG. 2. Y in FIG. 2A represents the matrices generated by the current signal space matrix generation means 111, the past signal space matrix generation means 112, and the future signal space matrix generation means 113. Suppose this matrix Y contains a redundant part X. As shown in FIG. 2B, the projection residual signal matrix E₂ obtained in step 4 is the projection residual signal matrix that results from determining the amount of the redundant part X, removing it, and correcting the matrix to the proper size. Estimating the transmission path with this corrected matrix E₂ allows the transmission path to be estimated with the proper impulse response length.

According to the present invention, therefore, the corrected projection residual signal matrix E₂ is obtained by executing the past/future signal space orthogonal basis extraction process, the projection residual signal matrix extraction process, and the projection residual signal matrix correction process only once, which reduces the amount of computation.
The prior-art J-LSS method must repeatedly change the assumed response length little by little and verify the validity of each assumption. In the present invention, the iterative processing of the J-LSS method (the loop of steps SP7, SP8, and SP9 shown in FIG. 14) is unnecessary.

FIG. 3 shows the results of a simulation carried out to demonstrate the effectiveness of the method of Embodiment 1. In this simulation, a speech signal sampled at 8 kHz is used as the source signal, and the transmission paths and the inverse filter are estimated from the observed signals obtained via single-input two-output transmission paths. The impulse response length of the single-input two-output transmission paths is set to N = 500, and the initial assumed impulse response length is set to j = 550. The transmission paths are room impulse responses with a reverberation time of 200 ms, measured at a sampling frequency of 8 kHz and truncated to 500 taps.

A1 and A2 in FIG. 3 are the impulse responses (true values) from the signal source to the output of the transmission paths. C1 and C2 in FIG. 3 are the impulse responses estimated by the method of Embodiment 1. Comparing A1 with C1 and A2 with C2 shows that the waveforms are almost identical, so the impulse responses are estimated well by the method of Embodiment 1.
Modification of Embodiment 1: After the optimal response length has been obtained from the projection residual signal matrix E via its rank, the impulse response of the transmission path can also be obtained by recomputing the matrices Y, Z, and E as follows.
(1) The assumed length is reset to j − r + 1 (r + 1 being the redundant portion), and step 1 of Embodiment 1 is applied to obtain the matrix Y.
(2) The assumed length is reset to j − r + 1, and step 2 of Embodiment 1 is applied to obtain the matrix Z.
(3) Step 3 of Embodiment 1 is applied to recompute the projection residual matrix.
(4) Step 5 of Embodiment 1 is applied with the recomputed projection residual matrix as input.
The method of Embodiment 1 only requires finding the orthogonal basis of the matrix E₂ with P × (r − 1) rows and (T + 1) columns.

However, the recalculation-based method must obtain the orthogonal basis of the matrix Z having (2P×w) rows and (T+1) columns, where w = j−r+1. Compared with the method of the first embodiment, the recalculation-based method therefore requires additional computation corresponding to the size difference between the matrix Z and the matrix E_2.
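To give a feel for this size difference, the following sketch plugs in the simulation settings of the first embodiment (P = 2, j = 550). It assumes, purely for illustration, that the rank estimate recovers the true length N = 500, so that j−r+1 = 500 and r = 51; the function name is hypothetical.

```python
def basis_input_rows(P, j, r):
    """Row counts of the matrices whose orthogonal basis has to be computed."""
    w = j - r + 1                      # re-set assumed length of the recalculation variant
    rows_first_embodiment = P * (r - 1)  # P x (r-1) rows, as stated for the first embodiment
    rows_recalculation = 2 * P * w       # (2P x w) rows of Z in the recalculation variant
    return rows_first_embodiment, rows_recalculation

rows_e, rows_z = basis_input_rows(P=2, j=550, r=51)
print(rows_e, rows_z)  # 100 2000
```

Under these assumed values the recalculation variant decomposes a matrix with twenty times as many rows, both matrices having (T+1) columns.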

A second embodiment of the present invention will be described with reference to FIG. 4. In the second embodiment, the source signal with reverberation removed is estimated using the transmission path impulse response estimated by the first embodiment. Blocks that perform the same processing as in the prior art of FIG. 9 are given the same reference numerals.
Step 1: An impulse response estimate of the multi-channel sound pickup path is obtained using the method of the first embodiment.
Step 2: The inverse filter calculation means 201 obtains an inverse filter from the estimated transmission path impulse response using the method described in Japanese Patent Laid-Open No. 62-190935.
Step 3: The inverse filter means 202 applies this inverse filter to the multi-channel observation signal to estimate the original source signal.
To show the effectiveness of the method of the second embodiment, a simulation was performed with the same settings as the simulation of the first embodiment. FIG. 5 shows the impulse response from the signal source to the output of the inverse filter when the inverse filter obtained by the method of the second embodiment is used. This impulse response consists almost entirely of the direct wave, which shows that the reverberation is removed well.
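Steps 2 and 3 can be illustrated with a small sketch. The procedure of the cited publication is not reproduced in this text, so the code below substitutes a generic least-squares multichannel inverse: it looks for filters g_p such that the sum of the convolutions h_p * g_p approximates a unit impulse. The function names, filter lengths, and the random impulse responses are all assumptions made for illustration.

```python
import numpy as np

def convolution_matrix(h, g_len):
    """Matrix C of shape (len(h)+g_len-1, g_len) such that C @ g == np.convolve(h, g)."""
    n = len(h)
    C = np.zeros((n + g_len - 1, g_len))
    for col in range(g_len):
        C[col:col + n, col] = h
    return C

def multichannel_inverse(h_list, g_len, delay=0):
    """Least-squares filters g_p with sum_p h_p * g_p close to a unit impulse at `delay`."""
    H = np.hstack([convolution_matrix(h, g_len) for h in h_list])
    target = np.zeros(H.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.split(g, len(h_list))

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two estimated channel responses (8 taps each).
h1, h2 = rng.standard_normal(8), rng.standard_normal(8)
g1, g2 = multichannel_inverse([h1, h2], g_len=7)

# Response from the source to the output of the inverse filter.
total = np.convolve(h1, g1) + np.convolve(h2, g2)
```

With two channels that share no common zeros, `total` is close to a single impulse, which corresponds to the direct-wave-only response described for FIG. 5.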

Next, a third embodiment of the present invention will be described with reference to FIG. 6.
The purpose of this embodiment is to dereverberate and separate signals observed through a multi-input multi-output transmission path as shown in FIG. 7. Let N be the impulse response length of the multi-input multi-output transmission path, and set its assumed length to a value j that is expected to be N or more. The number of sound sources Q is assumed to be known, and the number of sound pickup channels is P, where Q < P.

Step 1: The current signal space, past signal space, and future signal space are obtained from the multi-channel observation signals stored in the multi-channel signal storage means 101.
The following description is based on the column vector [y_1(k), ..., y_P(k)]^T consisting of the values y_1(k) to y_P(k) of the P-channel sound pickup signals at time k.
From the multi-channel signals stored in the multi-channel signal storage means 101, the past signal space matrix generation means 112 generates the matrix Z_P as the multi-channel past signal space and, similarly, the future signal space matrix generation means 113 generates the matrix Z_F as the multi-channel future signal space.
Here, w is set so that w ≥ Q×j using the number of sound sources Q, and T is set so that T > 2w. The matrices Z_P and Z_F each have (P×w) rows and (T+1) columns.
The current signal space matrix generation means 111 generates the matrix Y as the multi-channel current signal space from the multi-channel observation signals stored in the multi-channel signal storage means 101. The matrix Y has (P×j) rows and (T+1) columns.
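The matrix sizes in step 1 can be checked with a short sketch. The exact index layout of Y, Z_P, and Z_F is given by equations that are not reproduced in this text, so the layout below (each column stacks consecutive P-channel samples, with the past block first, then the current block, then the future block) is an assumption, as are the function name and the random test signal.

```python
import numpy as np

def stacked_matrix(y, start, n_blocks, T):
    """Stack n_blocks consecutive P-channel samples per column, for T+1 columns.

    y has shape (P, num_samples); column t holds y[:, start+t], ..., y[:, start+t+n_blocks-1].
    """
    P = y.shape[0]
    columns = [
        y[:, start + t:start + t + n_blocks].T.reshape(P * n_blocks)
        for t in range(T + 1)
    ]
    return np.stack(columns, axis=1)

P, Q, j = 3, 2, 4          # channels, sources (Q < P), assumed response length
w = Q * j                  # smallest value satisfying w >= Q x j
T = 2 * w + 1              # smallest integer satisfying T > 2w

rng = np.random.default_rng(0)
y = rng.standard_normal((P, 2 * w + j + T))   # just enough samples for all three matrices

Z_P = stacked_matrix(y, 0, w, T)        # past signal space, (P*w) x (T+1)
Y = stacked_matrix(y, w, j, T)          # current signal space, (P*j) x (T+1)
Z_F = stacked_matrix(y, w + j, w, T)    # future signal space, (P*w) x (T+1)
print(Z_P.shape, Y.shape, Z_F.shape)    # (24, 18) (12, 18) (24, 18)
```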

Step 2: The Z matrix generation means 115 generates the matrix Z from the past signal space matrix Z_P and the future signal space matrix Z_F.

Step 3: The past/future signal space orthogonal basis calculation means 121 obtains the orthogonal basis V of the matrix Z, and the projection residual signal space matrix E is obtained by
E = Y − Y V V^T
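Steps 2 and 3 can be sketched numerically as follows. Two points are assumptions rather than statements of the source: Z is formed by stacking Z_P on top of Z_F (inferred from the (2P×w)-row size quoted for Z in the discussion of the first embodiment), and V is taken as an orthonormal basis of the row space of Z obtained by singular value decomposition, which is the reading under which Y V V^T has the same size as Y. The matrices are small synthetic examples.

```python
import numpy as np

def projection_residual(Y, Z, tol=1e-10):
    """Return E = Y - Y V V^T and V, where the columns of V span the row space of Z."""
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))   # numerical rank of Z
    V = Vt[:rank].T                      # (T+1) x rank, orthonormal columns
    return Y - Y @ V @ V.T, V

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 20))         # shared low-dimensional row space (synthetic)
Z_P = rng.standard_normal((4, 3)) @ S
Z_F = rng.standard_normal((4, 3)) @ S
Z = np.vstack([Z_P, Z_F])                # assumed stacking of past and future matrices
Y = rng.standard_normal((5, 20))

E, V = projection_residual(Y, Z)
```

Every row of `E` is orthogonal to every row of `Z`, i.e. `E` keeps only the part of the current signal space that the past and future signal spaces cannot explain.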

Step 4: The projection residual signal matrix correction means 151 obtains the rank r of the projection residual signal matrix E. Using this rank r, the bottom P×(r−Q) rows of the projection residual signal space matrix E are extracted as the redundant projection residual signal space matrix E_b, which has P×(r−Q) rows and (T+1) columns. The matrix E is thus divided into a matrix E_a (the upper rows) and the redundant projection residual signal space matrix E_b (the lower rows):
E = [E_a ; E_b]
The orthogonal basis V_b of the redundant projection residual signal space matrix E_b is obtained via singular value decomposition or the like as described above, and the corrected projection residual signal space matrix is obtained by
E_2 = E_a − E_a V_b V_b^T
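The correction in step 4 can be sketched as below. The rank threshold `tol`, the function name, and the synthetic matrix E are assumptions; in particular, E is built so that its lower rows span one dimension less than the whole matrix, which is only a toy stand-in for the structure that the real projection residual is expected to have.

```python
import numpy as np

def correct_residual(E, P, Q, tol=1e-8):
    """Split E into E_a / E_b using its rank r and return E_2 = E_a - E_a V_b V_b^T."""
    s = np.linalg.svd(E, compute_uv=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank of E
    n_b = P * (r - Q)                        # number of bottom rows forming E_b
    if not 0 <= n_b < E.shape[0]:
        raise ValueError("rank r leaves no valid split of E into E_a and E_b")
    if n_b == 0:                             # no redundant rows: nothing to correct
        return E.copy(), E[:0], r
    E_a, E_b = E[:-n_b], E[-n_b:]
    _, s_b, Vt_b = np.linalg.svd(E_b, full_matrices=False)
    V_b = Vt_b[:int(np.sum(s_b > tol * s_b[0]))].T   # basis of the row space of E_b
    return E_a - E_a @ V_b @ V_b.T, E_b, r

rng = np.random.default_rng(2)
P, Q = 2, 1
S = rng.standard_normal((3, 30))             # three synthetic basis rows
A_a = rng.standard_normal((4, 3))            # upper rows use all three
A_b = rng.standard_normal((4, 3))
A_b[:, 0] = 0.0                              # lower rows do not use the first one
E = np.vstack([A_a, A_b]) @ S

E_2, E_b, r = correct_residual(E, P, Q)
```

In this toy case r = 3, E_b consists of the bottom P×(r−Q) = 4 rows, and the corrected matrix `E_2` has rank Q = 1.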

Step 5: The transmission path estimation means 161 performs singular value decomposition of the corrected projection residual signal matrix E_2 and extracts the left singular vectors from the one corresponding to the largest singular value through the one corresponding to the Q-th largest singular value. The q-th of these singular vectors gives the impulse response estimate of the P-channel sound pickup path for the q-th sound source.
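The extraction in step 5 amounts to taking the first Q left singular vectors. The sketch below uses a synthetic rank-1 matrix in place of E_2 (Q = 1) and a hypothetical function name; how the entries of each singular vector map onto channels and taps depends on equations not reproduced in this text and is not modeled here.

```python
import numpy as np

def leading_left_singular_vectors(E_2, Q):
    """Left singular vectors for the Q largest singular values of E_2."""
    U, s, _ = np.linalg.svd(E_2, full_matrices=False)   # s is sorted in descending order
    return U[:, :Q], s[:Q]

rng = np.random.default_rng(3)
h = rng.standard_normal(6)                     # synthetic stacked response (illustrative)
E_2 = np.outer(h, rng.standard_normal(40))     # rank-1 stand-in for the corrected matrix

U_Q, s_Q = leading_left_singular_vectors(E_2, Q=1)
alignment = abs(U_Q[:, 0] @ h) / np.linalg.norm(h)   # 1.0 means identical direction
```

Singular vectors have unit norm and are defined only up to sign, so an estimate obtained this way matches the underlying response up to a scale factor.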

Step 6: The dereverberation filter calculation means 203 obtains a dereverberation filter from the transmission path impulse response estimated in step 5, using the method described in the above-mentioned Japanese Patent Laid-Open No. 62-190935.

Step 7: The dereverberation filtering means 204 applies this dereverberation filter to the multi-channel observation signal to estimate the multi-channel signal from which the reverberation component has been removed.

Step 8: The sound source separation means 301 applies sound source separation processing to the dereverberated multi-channel signal and extracts the sound source signals.

For the sound source separation processing, a blind separation algorithm based on independent component analysis (ICA) described in Reference 1 can be used. (Reference 1: J.F. Cardoso, "Blind Signal Separation: Statistical Principles," Proceedings of the IEEE, Vol. 86, No. 10, pp. 2009-2025, 1998.)
The transmission path estimation method included in the above dereverberation processing is based on the LSS method (Non-Patent Document 1). However, the LSS method handles only the case where the sound source is a single colored signal and the transmission path is a 1-input multi-output system.
In the LSS method, the assumed impulse response length j is recommended as the value of w when generating the past signal space matrix Z_P and the future signal space matrix Z_F from the multi-channel signal. However, applying the LSS method as is to the observation signals of a multi-input multi-output system does not give the desired result.

Therefore, in the present invention, w is set so that w ≥ Q×j, where Q is the number of sound sources, in order to extend the LSS method to multi-input multi-output systems.
For the problem addressed in the third embodiment, namely separating convolutively mixed signals, a technique called frequency-domain blind source separation (frequency-domain BSS), described in Japanese Patent Laid-Open No. 2003-333682, has often been used.

However, it is known that although frequency-domain BSS can almost completely remove the direct sound component of the sound to be removed, it cannot adequately remove the reverberation (indirect sound) component, which remains as residual noise; the crosstalk component is then around −10 dB, and the separation performance degrades significantly.
With the aim of improving the separation performance, Japanese Patent Laid-Open No. 2003-99093 proposes performing noise suppression processing (crosstalk component suppression processing) after the frequency-domain blind source separation processing as a measure against reverberation. FIG. 8 shows the configuration described in that patent document. In the noise suppression processing of the noise suppression means 302, the reverberation component is modeled as a signal obtained by adjusting the delay and gain of the estimated direct sound component. Based on this model, the crosstalk component contained in the signal after the sound source separation processing of the sound source separation means 301 is estimated and subtracted. However, because the reverberation component is modeled simply as reflected waves of only a few orders, the improvement in separation performance is limited to 3 to 4 dB on average.

In contrast, the method of the third embodiment makes it possible to extract the desired sound without reverberation components while suppressing the crosstalk component to around −30 dB.
The transmission path estimation apparatus described in the first embodiment, the dereverberation apparatus described in the second embodiment, and the sound source separation apparatus described in the third embodiment can each be realized by causing a computer to function under a transmission path estimation program, a dereverberation program, or a sound source separation program, respectively, that operates the computer according to the procedure described in the corresponding embodiment.
Each program is written in a computer-readable programming language and recorded on a computer-readable recording medium such as a magnetic disk, a CD-ROM, or a semiconductor memory. The program is installed on the computer from such a recording medium or through a communication line, and is interpreted and executed by the CPU of the computer.

The transmission path estimation apparatus, dereverberation apparatus, and sound source separation apparatus according to the present invention can each be used in fields such as hands-free voice communication systems.

Brief description of the drawings

FIG. 1 is a block diagram for explaining the first embodiment of the present invention.
FIG. 2 is a diagram for explaining the operation of the first embodiment.
FIG. 3 is a waveform diagram for explaining the effect of the first embodiment.
FIG. 4 is a block diagram for explaining the second embodiment of the present invention.
FIG. 5 is a waveform diagram for explaining the effect of the second embodiment.
FIG. 6 is a block diagram for explaining the third embodiment of the present invention.
FIG. 7 is a layout diagram for explaining the usage situation of the third embodiment.
FIG. 8 is a block diagram for explaining the connection between the sound source separation means and the noise suppression means described in the known literature.
FIG. 9 is a block diagram for explaining the prior art.
FIG. 10 is a layout diagram for explaining the usage situation of the prior art.
FIG. 11 is a diagram for explaining the inside of the multi-channel signal storage means.
FIG. 12 is a flowchart for explaining the background art.
FIG. 13 is a block diagram for explaining another example of the background art.
FIG. 14 is a flowchart for explaining the other background art shown in FIG. 13.

Explanation of symbols

101 Multi-channel signal storage means
102 Z matrix generation unit
111 Current signal space matrix generation means
112 Past signal space matrix generation means
113 Future signal space matrix generation means
115 Z matrix generation means
121 Past/future signal space orthogonal basis calculation means
131 Projection residual signal matrix extraction means
141 Response length estimation means
151 Projection residual signal matrix correction means
161, 161' Transmission path estimation means
201 Inverse filter calculation means
202 Inverse filter means

Claims

A transmission path estimation method for estimating a transmission path from observation signals observed from a source signal through a plurality of linear transmission paths, the method comprising:
a multi-channel signal accumulation process of accumulating multi-channel observation signals;
a current signal space matrix generation process of obtaining a current signal space matrix from the accumulated multi-channel observation signals;
a past/future signal space matrix generation process of generating a past/future signal space matrix from the accumulated multi-channel observation signals;
a past/future signal space orthogonal basis calculation process of obtaining an orthogonal basis of the past/future signal space matrix;
a projection residual signal matrix extraction process of obtaining, from the current signal space matrix and the orthogonal basis of the past/future signal space, a projection residual signal space matrix E that results when the current signal space matrix is projected onto the past/future signal space;
a projection residual signal matrix correction process of obtaining a redundant projection residual signal space matrix Eb from the rank r of the projection residual signal matrix E, and extracting from the projection residual signal space matrix E the component orthogonal to the redundant projection residual signal space matrix Eb as a corrected projection residual signal matrix; and
a transmission path estimation process of performing singular value decomposition of the corrected projection residual signal matrix and taking out the left singular vector corresponding to the maximum singular value to obtain an impulse response of the multi-channel transmission path.

A dereverberation method for estimating a transmission path and a source signal from observation signals observed from the source signal through a plurality of linear transmission paths, the method comprising:
a multi-channel signal accumulation process of accumulating multi-channel observation signals;
a current signal space matrix generation process of obtaining a current signal space matrix from the accumulated multi-channel observation signals;
a past/future signal space matrix generation process of generating a past/future signal space matrix from the accumulated multi-channel observation signals;
a past/future signal space orthogonal basis calculation process of obtaining an orthogonal basis of the past/future signal space matrix;
a projection residual signal matrix extraction process of obtaining, from the current signal space matrix and the orthogonal basis of the past/future signal space, a projection residual signal space matrix E that results when the current signal space matrix is projected onto the past/future signal space;
a projection residual signal matrix correction process of obtaining a redundant projection residual signal space matrix Eb from the rank r of the projection residual signal matrix E, and extracting from the projection residual signal space matrix E the component orthogonal to the redundant projection residual signal space matrix Eb as a corrected projection residual signal matrix;
a transmission path estimation process of performing singular value decomposition of the corrected projection residual signal matrix and taking out the left singular vector corresponding to the maximum singular value to obtain an impulse response of the multi-channel transmission path;
an inverse filter calculation process of obtaining an inverse filter from the estimated transmission path impulse response; and
a process of applying the inverse filter to the multi-channel observation signals and outputting a source signal estimation result.

A sound source separation method for estimating source signals from observation signals in which a plurality of source signals are mixed through a plurality of linear transmission paths, the method comprising:
a multi-channel signal accumulation process of accumulating multi-channel observation signals;
a current signal space matrix generation process of obtaining a current signal space matrix from the accumulated multi-channel observation signals;
a past/future signal space matrix generation process of generating a past/future signal space matrix from the accumulated multi-channel observation signals;
a past/future signal space orthogonal basis calculation process of obtaining an orthogonal basis of the past/future signal space matrix;
a projection residual signal matrix extraction process of obtaining, from the current signal space matrix and the orthogonal basis of the past/future signal space, a projection residual signal space matrix E that results when the current signal space matrix is projected onto the past/future signal space;
a projection residual signal matrix correction process of obtaining a redundant projection residual signal space matrix Eb from the rank r of the projection residual signal matrix E, and extracting from the projection residual signal space matrix E the component orthogonal to the redundant projection residual signal space matrix Eb as a corrected projection residual signal matrix;
a transmission path estimation process of performing singular value decomposition of the corrected projection residual signal matrix and taking out the left singular vector corresponding to the maximum singular value to obtain an impulse response of the multi-channel transmission path;
a dereverberation filter calculation process of obtaining a dereverberation filter from the estimated transmission path impulse response;
a dereverberation filtering process of applying the dereverberation filter obtained in the dereverberation filter calculation process to the multi-channel observation signals; and
a sound source separation process of performing sound source separation processing with the dereverberated signal as input.

A transmission path estimation apparatus that estimates a transmission path from observation signals observed from a source signal through a plurality of linear transmission paths, the apparatus comprising:
multi-channel signal storage means for storing multi-channel observation signals;
current signal space matrix generation means for obtaining a current signal space matrix from the stored multi-channel observation signals;
past/future signal space matrix generation means for generating a past/future signal space matrix from the stored multi-channel observation signals;
past/future signal space orthogonal basis calculation means for obtaining an orthogonal basis of the past/future signal space matrix;
projection residual signal matrix extraction means for obtaining, from the current signal space matrix and the orthogonal basis of the past/future signal space, a projection residual signal space matrix E that results when the current signal space matrix is projected onto the past/future signal space;
projection residual signal matrix correction means for obtaining a redundant projection residual signal space matrix Eb from the rank r of the projection residual signal matrix E, and extracting from the projection residual signal space matrix E the component orthogonal to the redundant projection residual signal space matrix Eb as a corrected projection residual signal matrix; and
transmission path estimation means for performing singular value decomposition of the corrected projection residual signal matrix and taking out the left singular vector corresponding to the maximum singular value to obtain an impulse response of the multi-channel transmission path.

A dereverberation apparatus that estimates a transmission path and a source signal from observation signals observed from the source signal through a plurality of linear transmission paths, the apparatus comprising:
multi-channel signal storage means for storing multi-channel observation signals;
current signal space matrix generation means for obtaining a current signal space matrix from the stored multi-channel observation signals;
past/future signal space matrix generation means for generating a past/future signal space matrix from the stored multi-channel observation signals;
past/future signal space orthogonal basis calculation means for obtaining an orthogonal basis of the past/future signal space matrix;
projection residual signal matrix extraction means for obtaining, from the current signal space matrix and the orthogonal basis of the past/future signal space, a projection residual signal space matrix E that results when the current signal space matrix is projected onto the past/future signal space;
projection residual signal matrix correction means for obtaining a redundant projection residual signal space matrix Eb from the rank r of the projection residual signal matrix E, and extracting from the projection residual signal space matrix E the component orthogonal to the redundant projection residual signal space matrix Eb as a corrected projection residual signal matrix;
transmission path estimation means for performing singular value decomposition of the corrected projection residual signal matrix and taking out the left singular vector corresponding to the maximum singular value to obtain an impulse response of the multi-channel transmission path;
inverse filter calculation means for obtaining an inverse filter from the estimated transmission path impulse response; and
means for applying the inverse filter to the multi-channel observation signals and outputting a source signal estimation result.

A sound source separation apparatus that estimates source signals from observation signals in which a plurality of source signals are mixed through a plurality of linear transmission paths, the apparatus comprising:
multi-channel signal storage means for storing multi-channel observation signals;
current signal space matrix generation means for obtaining a current signal space matrix from the stored multi-channel observation signals;
past/future signal space matrix generation means for generating a past/future signal space matrix from the stored multi-channel observation signals;
past/future signal space orthogonal basis calculation means for obtaining an orthogonal basis of the past/future signal space matrix;
projection residual signal matrix extraction means for obtaining, from the current signal space matrix and the orthogonal basis of the past/future signal space, a projection residual signal space matrix E that results when the current signal space matrix is projected onto the past/future signal space;
projection residual signal matrix correction means for obtaining a redundant projection residual signal space matrix Eb from the rank r of the projection residual signal matrix E, and extracting from the projection residual signal space matrix E the component orthogonal to the redundant projection residual signal space matrix Eb as a corrected projection residual signal matrix;
transmission path estimation means for performing singular value decomposition of the corrected projection residual signal matrix and taking out the left singular vector corresponding to the maximum singular value to obtain an impulse response of the multi-channel transmission path;
dereverberation filter calculation means for obtaining a dereverberation filter from the estimated transmission path impulse response;
dereverberation filtering means for applying the dereverberation filter obtained by the dereverberation filter calculation means to the multi-channel observation signals; and
sound source separation means for performing sound source separation processing with the dereverberated signal as input.

A program written in a computer-readable programming language, the program causing a computer to function as the apparatus according to any one of claims 4 to 6.

A computer-readable recording medium on which the program according to claim 7 is recorded.