JP2008092363A

JP2008092363A - Signal separation apparatus and method

Info

Publication number: JP2008092363A
Application number: JP2006272163A
Authority: JP
Inventors: Atsuo Hiroe (廣江厚夫)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-10-03
Filing date: 2006-10-03
Publication date: 2008-04-17
Anticipated expiration: 2026-10-03
Also published as: JP4946330B2

Abstract

PROBLEM TO BE SOLVED: To provide a signal separation apparatus and method that, when separating a mixed signal containing a plurality of signals into the individual signals, can separate the signals with their correspondence established, without post-processing after separation.
SOLUTION: The signal separation apparatus comprises a plurality of separation systems 1_j, each of which generates separated signals from a separation matrix and an observation signal in the time domain or time-frequency domain in which a plurality of signals are mixed, and a mapping unit 6 that generates signal sets of mutually corresponding separated signals among the separated signals generated by the different separation systems 1_j. The separation matrices are updated so as to maximize the independence between the signal sets, and the separated signals are then generated.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a signal separation apparatus and method for separating signals using independent component analysis (ICA).

Independent component analysis (ICA), a technique that separates and restores the original signals using only their statistical independence when a plurality of original signals have been linearly mixed with unknown coefficients, is attracting attention in the field of signal processing. Applying ICA makes it possible to separate and restore a speech signal even when, for example, the speaker is far from the microphone and the microphone also picks up sounds other than the speaker's voice.

As shown in FIG. 20, assume that N sound sources each emit a different sound and that these sounds are observed by n microphones. Assuming that the signal (observation signal) x_k(t) observed by the k-th microphone (1 ≤ k ≤ n) is a linear combination of the original signals s_j(t), x_k(t) is expressed by formula (1) below. Expressing the observation signals of all microphones in a single equation gives formula (2) below. In formulas (1) and (2), x(t) and s(t) are column vectors whose elements are x_k(t) and s_k(t), respectively, and A is an n × N matrix whose elements are a_ij(t). In the following, N = n.

ICA is formulated as the problem of estimating, from the observation signal x alone, both the original signal s and the matrix that produces s from x (the separation matrix). That is, with W denoting the separation matrix and y the estimated original signal (i.e., the separation result), it is the problem of finding y and W such that y = Wx.
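A minimal numerical sketch of this formulation, assuming an instantaneous (non-convolutive) 2 × 2 mixture with Laplacian sources chosen only for illustration. Because A is known in this toy, W = A⁻¹ is used directly; an actual ICA algorithm has to estimate W from x alone.

```python
import numpy as np

# Toy instantaneous mixture with N = n = 2 (an assumed simplification;
# formulas (1) and (2) of the text are not reproduced here).
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))     # original signals s(t), one row per source
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])          # mixing matrix A (unknown in practice)
x = A @ s                           # observation signals: x(t) = A s(t)

# ICA looks for W such that y = W x recovers s; with A known, W = A^{-1}.
W = np.linalg.inv(A)
y = W @ x                           # separation result y(t) = W x(t)
```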

Patent Document 1: JP 2003-263189 A
Patent Document 2: JP 2006-238409 A
Patent Document 3: JP 2004-145172 A
Non-Patent Document 1: Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino, "Blind sound source separation using multiple microphone pairs with different intervals," Acoustical Society of Japan Spring Meeting, pp. 621-622, March 2002.
Non-Patent Document 2: Shotaro Akaho, Shinji Umeyama, "Multimodal independent component analysis: a method for extracting common features from multiple information sources," IEICE Transactions, Vol. J83-A, No. 6, pp. 669-676, June 2000.

ICA has the problem that which original signal is separated into which output channel is indeterminate. This is called the "inter-channel permutation problem." In time-frequency-domain ICA there is also a problem called the "inter-frequency-bin permutation problem," in which the assignment of original signals to channels is inconsistent from one frequency bin to another; this inter-frequency-bin permutation problem has been solved by the present inventor (see Patent Document 2).
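The inter-channel permutation problem can be seen in a similar toy mixture (again an assumed instantaneous 2 × 2 case with an illustrative mixing matrix): a row-permuted separation matrix PW separates the sources just as well as W, so an independence criterion alone cannot prefer one channel ordering over the other.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))           # two independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])    # hypothetical mixing matrix
x = A @ s

W = np.linalg.inv(A)                      # one valid separation matrix
P = np.array([[0.0, 1.0], [1.0, 0.0]])    # permutation that swaps channels
y_case1 = W @ x         # channel 1 = source 1, channel 2 = source 2
y_case2 = (P @ W) @ x   # channel 1 = source 2, channel 2 = source 1
```

Both outputs consist of the same independent sources; only their channel order differs.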

Regarding the inter-channel permutation problem, consider for example the case in FIG. 20 where two sound sources are picked up by two microphones and separated into two components, i.e., N = 2 and n = 2. In this case, the sources can be assigned to the two output channels in the following two ways:

Case 1: channel 1 = sound source 1, channel 2 = sound source 2
Case 2: channel 1 = sound source 2, channel 2 = sound source 1
Which case the output of a separation system falls into depends on various factors such as the following, so it is difficult to predict in advance:

- Type of sound source
- Positional relationship between the sound sources and the microphones
- ICA algorithm
- Acoustic processing (Fourier transform, etc.), pre-processing, and post-processing
- Initial value of the separation matrix W
Next, consider the case where there are multiple separation systems. For example, in the above environment with two sound sources, suppose two 2-input/2-output separation systems are operated. The second separation system may use microphones at positions different from those of the first (so that the positional relationship between the sound sources and the microphones differs between the two systems), or it may share the microphones and use a different algorithm, initial value, or the like. In this case, the sources can be assigned to the output channels in the following four ways:

Case 1:
Separation system 1: channel 1 = sound source 1, channel 2 = sound source 2
Separation system 2: channel 1 = sound source 1, channel 2 = sound source 2
Case 2:
Separation system 1: channel 1 = sound source 1, channel 2 = sound source 2
Separation system 2: channel 1 = sound source 2, channel 2 = sound source 1
Case 3:
Separation system 1: channel 1 = sound source 2, channel 2 = sound source 1
Separation system 2: channel 1 = sound source 1, channel 2 = sound source 2
Case 4:
Separation system 1: channel 1 = sound source 2, channel 2 = sound source 1
Separation system 2: channel 1 = sound source 2, channel 2 = sound source 1
In the following, the case where the same sound source is output to the same channel in every separation system (Cases 1 and 4) is described as "the channels correspond," and the other cases (Cases 2 and 3) as "the channels do not correspond."
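The four cases and the correspondence criterion can be enumerated mechanically. In this sketch each system's output is written as a tuple mapping channel position to source number; the representation and the function name `channels_correspond` are illustrative assumptions.

```python
from itertools import product

# (1, 2) means channel 1 = source 1 and channel 2 = source 2;
# (2, 1) means channel 1 = source 2 and channel 2 = source 1.
assignments = [(1, 2), (2, 1)]

def channels_correspond(systems):
    """True when every system puts the same source on the same channel."""
    return all(assignment == systems[0] for assignment in systems)

# product() yields (system 1, system 2) pairs in the order of Cases 1 to 4.
cases = list(product(assignments, repeat=2))
labels = [channels_correspond(case) for case in cases]
```

Only Cases 1 and 4 are labeled as corresponding.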

Now consider the case where there are several signal separation systems using the ICA described above (hereinafter simply "separation systems") and their separation results must be associated across output channels, so that only outputs whose channels correspond are produced (only Cases 1 and 4 occur, and Cases 2 and 3 are prevented). Concrete examples include the following variations.

Each separation system handles a different but related type of signal. For example, one separation system separates sounds generated inside the body and the other separates electrical signals generated inside the body, and the sound signal originating from the heart (heart sound) in the output of one system is to be associated with the electrical signal of the heart (electrocardiogram) in the output of the other.

Each separation system handles signals of the same type observed under different conditions. For example, one separation system handles sound signals recorded with widely spaced microphones and the other handles signals recorded with narrowly spaced microphones, and the outputs originating from the same sound source are to be associated between the two systems. The same applies when one system uses several microphones arranged in a horizontal row and the other uses microphones arranged in the orthogonal direction (that is, in a vertical column).

The signal type and the observation conditions are the same, but the ICA initial conditions differ between the separation systems. For example, each separation system operates in the time-frequency domain and the systems differ in the window length or shift width of the Fourier transform, or in the initial value of the separation matrix, and the outputs originating from the same sound source are to be associated between the two systems.

In general, however, ICA suffers from a problem called "permutation ambiguity": which original signal is output to which channel is indeterminate. Simply operating several separation systems independently may therefore yield separation results whose channels do not correspond, i.e., Case 2 or Case 3 above.

Some prior art on ICA partially overlaps with the problem area above. For example, Patent Document 1 and Non-Patent Document 1 propose improving separation performance by preparing a separation system using widely spaced microphones and one using narrowly spaced microphones, and adopting the separation result of the wide-spacing system for low frequencies and that of the narrow-spacing system for high frequencies. This method, however, requires the output channels of the two separation systems to be associated with each other, and the documents do not describe how to do so.

One way to perform the association is reordering by post-processing: the separation systems themselves may produce Case 2 or Case 3, but the outputs are then exchanged between channels according to some measure, finally yielding Case 1 or Case 4. However, the measure used in post-processing depends on the signal type, so no method covers all of the problem areas above. For example, for sound data, channels can be associated to some extent by reusing measures developed for the inter-frequency-bin permutation problem, such as the similarity of temporal envelopes or the estimated direction of the sound source, but these measures are difficult to apply to other signal types. For resolving the inter-frequency-bin permutation using temporal envelopes or source directions, see, for example, Patent Document 3.
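For comparison, the post-processing approach described above (the approach the invention aims to avoid) can be sketched as a brute-force search over channel orderings that maximizes temporal-envelope correlation. The function name, the envelope input format, and the use of Pearson correlation as the similarity measure are assumptions for illustration.

```python
import numpy as np
from itertools import permutations

def align_by_envelope(ref, other):
    """Reorder the channels of `other` to best match `ref`.

    ref, other: arrays of shape (n_channels, T) holding temporal envelopes
    (e.g. frame-wise magnitudes) of each output channel of two systems.
    Returns the permutation tuple p such that other[p[k]] matches ref[k].
    """
    n = ref.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in permutations(range(n)):       # n! candidate orderings
        score = sum(np.corrcoef(ref[k], other[p])[0, 1]
                    for k, p in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm
```

The loop visits all n! orderings per system pair, which is why this approach scales poorly as channels and systems are added.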

Furthermore, even for sound signals, if the microphones of the separation systems are installed at different positions, the source directions estimated by the individual systems differ, which may make association based on source direction difficult.

Moreover, the number of possible associations grows explosively as the numbers of output channels and separation systems increase, so finding the optimal association in post-processing becomes computationally difficult. For example, with 4 output channels and 3 separation systems there are (4!)^3 = 13,824 possible combinations. It is therefore also computationally preferable for the separation systems to output associated results directly rather than associating them in post-processing.
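The count generalizes to (n!)^m for n output channels and m separation systems; with n = m = 2 it gives the four cases listed earlier. A one-line check (the function name is hypothetical):

```python
from math import factorial

def num_assignments(n_channels, n_systems):
    """Number of channel orderings across all systems: (n!)^m."""
    return factorial(n_channels) ** n_systems
```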

Next, consider the case where, instead of performing the association as ICA post-processing, the separation systems directly output associated signals. For example, suppose there are two separation systems that separate sound signals; one yields the separation result "channel 1 = speech, channel 2 = music," and the other likewise yields "channel 1 = speech, channel 2 = music." Here there are two original signals: one is speech and the other is music.

FIG. 21 is a schematic diagram showing the separation results when m separation systems 100_1 to 100_m using ICA are provided. Each separation system outputs n independent components Y as its separation result. In the figure, Y_i^[j] is time-series data (or similar data) and represents the entire output of the i-th channel of the j-th separation system. The value of Y_i^[j] at time t (or the t-th sample) is written Y_i^[j](t). Y_i^[j](t) may be a scalar quantity such as a speech waveform sample or a multidimensional (multivariate) quantity such as a spectrum; since a scalar is a special case of a multidimensional quantity, the following mainly considers the case where Y_i^[j](t) is a multidimensional quantity (vector), in particular a spectrum. The number of dimensions must be the same within a separation system but may differ between separation systems. For example, Y_1^[1](t) and Y_2^[1](t) must have the same dimension, whereas Y_1^[1](t) and Y_1^[2](t) may differ in dimension.

Consider separating the signals so that they are independent between different output channels of the same system, while signals in the same channel of different systems correspond (that is, are dependent). For example, Y_1^[1] to Y_n^[1] are mutually independent, whereas Y_1^[1] to Y_1^[m] are dependent. In other words, if the signals Y_k^[1] to Y_k^[m] of the k-th output channel are grouped into a pair, the signals are separated so that they are dependent within each pair and independent between pairs.

One method proposed for obtaining such pair-wise independent components is "multimodal independent component analysis" (see, for example, Non-Patent Document 2). With two separation systems (m = 2), multimodal ICA separates the signals so as to maximize the independence between outputs within the same separation system (in terms of FIG. 21, Y_1^[1] and Y_2^[1] are independent) while minimizing the independence between the same channel of different separation systems (Y_1^[1] and Y_1^[2] are minimally independent).

However, with multimodal ICA it was difficult to separate signals when, for example, m > 2 as in FIG. 21, or when each Y_i^[j] is a vector, for the following reasons.

First, to combine the two measures "independence within the same separation system (to be maximized)" and "independence between different systems (to be minimized)," it uses a weighted sum of the two, and the weights had to be tuned heuristically.

Second, to compute the "independence between different channels," it approximates entropy with a two-dimensional Gram-Charlier expansion. Because the Gram-Charlier expansion is already complicated in two dimensions, it was difficult to increase the number of separation systems to three or more or to extend Y_i^[j] to multiple dimensions.

The present invention has been proposed in view of these circumstances, and its object is to provide a signal separation apparatus and method that, when using independent component analysis to separate a mixed signal containing a plurality of signals into the individual signals, can separate the signals with their correspondence established, without post-processing after separation.

To achieve the above object, a signal separation apparatus according to the present invention includes: a plurality of separation means, each generating separated signals from a separation matrix and an observation signal in the time domain or time-frequency domain in which a plurality of signals are mixed; signal set generation means for generating signal sets of mutually corresponding separated signals among the separated signals generated by the different separation means; and independence calculation means for calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or score function that takes a signal set as its argument, and for controlling the plurality of separation means so that the independence between the signal sets is maximized. Each separation means generates separated signals from the observation signal and a separation matrix set to initial values, corrects the separation matrix on the basis of the independence between the signal sets calculated by the independence calculation means until the separation matrix substantially converges, and generates the separated signals using the substantially converged separation matrix.
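As a rough illustration of a "multivariate score function that takes a signal set as its argument," the following sketch assumes a spherically symmetric density p(Y) ∝ exp(-||Y||) over the stacked vector Y of one signal set; its score ∂log p/∂Y = -Y/||Y|| couples all members of the set through a shared norm. The density and the name `pair_score` are assumptions for illustration; this section does not state which density the invention uses.

```python
import numpy as np

def pair_score(Y, eps=1e-12):
    """Score function of an assumed spherical density p(Y) ∝ exp(-||Y||).

    Y: array of shape (..., d) whose last axis stacks all members of one
    signal set (e.g. Y_k^[1](t), ..., Y_k^[m](t)) at a single time t.
    Returns d log p / dY = -Y / ||Y||, computed frame by frame.
    """
    norm = np.linalg.norm(Y, axis=-1, keepdims=True)
    return -Y / np.maximum(norm, eps)   # eps avoids division by zero
```

Because the norm is shared, a large value in one member shrinks the score of every other member, which is one way to model dependence within a set while different sets are treated as independent.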

Also, to achieve the above object, a signal separation apparatus according to the present invention that separates, using independent component analysis, an observation signal in the time domain or time-frequency domain in which a plurality of signals are mixed into separated signals includes: separation means for generating separated signals from the observation signal and a separation matrix; signal set generation means for generating signal sets of mutually corresponding separated signals among the separated signals generated by the separation means; and independence calculation means for calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or score function that takes a signal set as its argument, and for controlling the separation means so that the independence between the signal sets is maximized. The separation means generates separated signals from the observation signal and a separation matrix set to initial values, corrects the separation matrix on the basis of the independence between the signal sets calculated by the independence calculation means until the separation matrix substantially converges, and generates the separated signals using the substantially converged separation matrix.

Also, to achieve the above object, a signal separation method according to the present invention, which separates an observation signal in the time domain or time-frequency domain in which a plurality of signals are mixed into separated signals using independent component analysis, includes: a step of generating separated signals from the observation signal and separation matrices by a plurality of separation means; a step of generating signal sets of mutually corresponding separated signals among the separated signals generated by the different separation means; a step of calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or score function that takes a signal set as its argument; and a step of generating separated signals from the observation signal and a separation matrix set to initial values, and correcting the separation matrix until it substantially converges so that the independence between the signal sets calculated using the probability density function or score function is maximized.

Also, to achieve the above object, a signal separation method according to the present invention, which separates an observation signal in the time domain or time-frequency domain in which a plurality of signals are mixed into separated signals using independent component analysis, includes: a step of generating separated signals from the observation signal and a separation matrix; a step of generating signal sets of mutually corresponding separated signals among the separated signals; a step of calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or score function that takes a signal set as its argument; and a step of generating separated signals from the observation signal and a separation matrix set to initial values, and correcting the separation matrix until it substantially converges so that the independence between the signal sets calculated using the probability density function or score function is maximized.

According to the signal separation apparatus and method of the present invention, signal sets of separated signals are generated; the independence between the signal sets, or the change in a measure of that independence, is calculated using a multivariate probability density function or score function that takes a signal set as its argument; the separation matrix is corrected until it substantially converges so that the calculated independence between the signal sets is maximized; and the separated signals are generated using the substantially converged separation matrix. As a result, the signals can be separated with their correspondence established, without post-processing after separation.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. In the following description, a "pair" denotes a set of more than one signal and is not limited to a set of two signals.

FIG. 1 is a block diagram showing the configuration of a signal separation apparatus according to an embodiment of the present invention. This signal separation apparatus has m separation systems 1_1 to 1_m that use independent component analysis (ICA). The separation systems are described in detail later.

The signal integration unit 2 integrates the outputs (= separation results) of the separation systems into several signals as necessary.

The output unit 3 is a device such as a speaker or a display, and outputs speech, music, waveforms, and the like. For example, in a sound signal separation system using time-frequency-domain ICA, the time-frequency-domain signal (= spectrogram) is converted into a time-domain signal (= waveform) and output from a speaker or the like.

The control unit 4 controls the components as a whole; for example, it controls the integration and output of the separated signals produced by the separation systems on the basis of the correspondence between the signals.

The independence calculation unit 5 calculates the independence between signal pairs, each made up of corresponding signals among the signals output from the separation systems 1_1 to 1_m, and controls the separation systems 1_1 to 1_m on the basis of that independence. Instead of calculating the independence itself, it may calculate some quantity corresponding to the change in independence (for example, the score function described later).
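One common way to turn a density over signal sets into a measure of independence between sets is the negative log-likelihood contrast of maximum-likelihood ICA; lower values correspond (up to constants) to more independent sets. The sketch below is not taken from the document: it assumes real-valued, instantaneous separation systems with equal n and T, the spherical density p(Y_k) ∝ exp(-||Y_k||) for each set, and the hypothetical name `independence_cost`.

```python
import numpy as np

def independence_cost(W_list, X_list):
    """Negative log-likelihood contrast (lower = more independent sets).

    W_list[j]: (n, n) separation matrix W^[j] of system j.
    X_list[j]: (n, T) observations X^[j] of system j.
    Assumes each signal set Y_k(t) = (Y_k^[1](t), ..., Y_k^[m](t)) follows
    p(Y_k) ∝ exp(-||Y_k||), independently across k.
    """
    Y = np.stack([W @ X for W, X in zip(W_list, X_list)])  # (m, n, T)
    set_norms = np.linalg.norm(Y, axis=0)                  # (n, T): ||Y_k(t)||
    T = Y.shape[-1]
    logdets = sum(np.linalg.slogdet(W)[1] for W in W_list)
    return set_norms.sum() / T - logdets
```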

The mapping unit 6 generates signal pairs of corresponding signals among the signals output from the separation systems 1_1 to 1_m; that is, when the independence calculation unit 5 calculates the independence between signal pairs, the mapping unit 6 supplies information indicating which signals form each pair, thereby associating the signals. Specifically, pairs are formed on the basis of one or more conditions selected from the spacing and positions of the microphones, the sampling frequency, the window length of the short-time Fourier transform, the shift width of the short-time Fourier transform, the initial value of the separation matrix, and the normalization method of the observation-signal spectrogram. Details are described later.
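A minimal sketch of the pairing performed by the mapping unit, assuming the simplest rule (pairing by output-channel index) and assuming that all systems produce the same number of frames T. With different STFT shift widths, as in one of the variations above, the frames would first have to be aligned, which is not shown. `make_pairs` is a hypothetical name.

```python
import numpy as np

def make_pairs(outputs):
    """Group the k-th channel of every system into one pair (signal set).

    outputs: list of length m; outputs[j] has shape (n, T, d_j), i.e. the
    n output channels of system j, T frames, and a per-system dimension d_j.
    Returns an array of shape (n, T, d_1 + ... + d_m): pair k stacks
    Y_k^[1](t), ..., Y_k^[m](t) into a single vector per frame.
    """
    n = outputs[0].shape[0]
    assert all(o.shape[0] == n for o in outputs), "all systems need n channels"
    return np.stack([np.concatenate([o[k] for o in outputs], axis=-1)
                     for k in range(n)])
```

The per-system dimensions d_j may differ, matching the text's statement that dimensions only need to agree within a separation system.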

Next, the separation systems 1_1 to 1_m are described in detail. FIG. 2 is a block diagram showing the configuration of the j-th separation system. The separation system 1_j has a plurality of sensors 11_j1 to 11_jn and a separation matrix update unit 12_j, and generates separation results (or "components") Y_1^[j] to Y_n^[j] by multiplying the vector made up of the observation signals X_k^[j] (X_k^[j] being the signal observed by the k-th sensor) by the separation matrix W^[j].

Various types of sensors 11_j1 to 11_jn can be used depending on the type of signal handled; for example, microphones can be used for sound signals. The number of sensors may differ from one separation system 1_j to another, and sensors may be shared among several separation systems. The signals X_1^[j] to X_n^[j] observed by the sensors 11_j1 to 11_jn may be scalar or multidimensional (vector) quantities. The number of elements of X_k^[j] (the vector dimension) may also differ between separation systems.

The separation matrix update unit 12_j updates the separation matrix $W^{[j]}$ on the basis of the independence calculated by the independence calculation unit 5.

So that pairs can be formed between separation systems or within a separation system, the separation system 1_j sends the separated components $Y_1^{[j]}, \dots, Y_n^{[j]}$ to the mapping unit 6. The independence calculation unit 5 then calculates the independence between the pairs formed by the mapping unit 6 (or a quantity corresponding to the change in independence). The resulting measure of independence is sent to the separation matrix update unit 12_j, which calculates a new value of the separation matrix $W^{[j]}$. The detailed processing of the independence calculation unit 5 and the mapping unit 6 is described for each of the specific examples below.

Next, an outline of the signal separation processing is described with reference to the flowchart in FIG. 3. First, signals are observed using the sensors 11_j1 to 11_jn (step S11); in the case of ICA for sound, the sound is recorded with microphones. In step S12, the observed signals acquired by the sensors 11_j1 to 11_jn are processed as preprocessing.

Steps S13 to S15 form a learning loop in which the separation processing (step S14) is executed repeatedly until the separation matrix and the separation results converge. Repeating the processing until convergence in this way is hereinafter referred to as "learning".

When learning ends in step S15, the signals are processed again as postprocessing (step S16). The postprocessed signals are then passed to subsequent processing as separation results (step S17); speech recognition is one example of such subsequent processing.

Next, the preprocessing in step S12 is described with reference to the flowchart in FIG. 4. The case described here uses a sound signal as input and time-frequency-domain ICA as the separation processing.

The observed signals, digitized by AD conversion in step S121, are converted into time-frequency-domain signals by a short-time Fourier transform (step S122). Such a time-frequency-domain signal is called a spectrogram. This operation converts each observed signal from a time series of scalars into a time series of vectors (a multivariate signal).
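
As a rough illustration of step S122, the sketch below frames a sampled waveform, applies a window, and takes a DFT of each frame, keeping the M = L/2 + 1 non-negative-frequency bins. It is a minimal, dependency-free sketch under stated assumptions: the function names `hann` and `stft` are hypothetical, the periodic Hann window is only one possible choice, and the naive O(L^2) DFT stands in for the FFT a real system would use.

```python
import cmath
import math


def hann(L):
    """Periodic Hann window of length L (an assumed choice; the text does not fix the window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / L) for n in range(L)]


def stft(x, L, shift):
    """Short-time Fourier transform of a real sequence x.

    Returns a list of frames; each frame holds M = L // 2 + 1 complex bins
    covering 0 .. fs/2. A naive O(L^2) DFT keeps the sketch dependency-free.
    """
    win = hann(L)
    M = L // 2 + 1
    frames = []
    for start in range(0, len(x) - L + 1, shift):
        seg = [x[start + n] * win[n] for n in range(L)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * w * n / L) for n in range(L))
            for w in range(M)
        ])
    return frames


# Usage: a 1 kHz tone sampled at 16 kHz, window length 64, shift 32.
fs = 16000
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]
S = stft(x, 64, 32)
# Each element of S is now a vector: the scalar time series became a vector time series.
```

With L = 64 the bin spacing is fs/L = 250 Hz, so the 1 kHz tone falls in bin 4 of every frame.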

Next, in step S123, the observed-signal spectrogram is normalized as necessary. One possible normalization processes the signal so that, in each frequency bin of the spectrogram, the mean along the time axis is 0 and the variance is 1. Normalization may be performed per separation system, per channel, or per frequency bin.
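
The per-bin normalization mentioned above can be sketched as follows, assuming the spectrogram of one channel is stored as `frames[t][w]` (frame t, bin w) and that the variance of complex values is taken as the mean of |x - mean|^2. The function name and the small `eps` guard against silent bins are illustrative assumptions.

```python
import math


def normalize_per_bin(frames, eps=1e-12):
    """Make every frequency bin zero-mean and unit-variance along the time axis.

    frames[t][w] is a complex spectrogram value; returns a new list of frames.
    """
    T, M = len(frames), len(frames[0])
    out = [[0j] * M for _ in range(T)]
    for w in range(M):
        col = [frames[t][w] for t in range(T)]
        mean = sum(col) / T
        var = sum(abs(v - mean) ** 2 for v in col) / T  # E|x - mean|^2 for complex data
        scale = 1.0 / math.sqrt(var + eps)               # eps avoids division by zero in silent bins
        for t in range(T):
            out[t][w] = (col[t] - mean) * scale
    return out
```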

The normalized observed signals are decorrelated as necessary, as described later (step S124). In step S125, the initial value of the separation matrix is set. The identity matrix may be used as the initial value, or the separation matrix obtained in the previous learning run may be used instead.

Next, the separation processing in step S14 is described with reference to the flowchart in FIG. 5. In outline, the separation processing repeats the operation "update the separation matrix in the direction that increases the independence between pairs" until the separation matrix converges.

First, to calculate the independence between pairs, pairs are formed from components across separation systems or across multiple channels within a separation system (step S141). Next, the independence between the pairs is calculated (step S142), and the correction $\Delta W$ to the separation matrix is calculated so that this independence increases (step S143). Finally, the separation matrix is updated on the basis of the correction (step S144).

In an actual system, steps S142 to S144 may be combined into a single expression; equations (15) and (17), described later, are examples. In that case, the calculation of independence between pairs in step S142 corresponds to evaluating the score functions that appear in equation (17), and the pair formation in step S141 corresponds to deciding "which components each score function takes as arguments", which is done by the mapping unit 6.

Next, the postprocessing in step S16 is described with reference to the flowchart in FIG. 6. In outline, the postprocessing converts the signal (spectrogram) that was transformed into the time-frequency domain during preprocessing back into a time-domain signal (waveform), in the same way as the postprocessing of conventional time-frequency-domain ICA.

The rescaling in step S161 adjusts the scale across frequency bins. The separation results are then converted back into waveforms by the inverse Fourier transform in step S162. Depending on the processing that follows step S16, these steps may be omitted.

Next, the signal separation processing described above is explained in detail, using speech spectra as the example signal.

First, a short-time Fourier transform (STFT) is applied to the signal observed by each microphone to obtain a spectrum for each frame. With sampling frequency Fs, STFT window length L, and M = L/2 + 1, the spectrum obtained by the STFT consists of components at M equally spaced frequencies (frequency bins) running from 0 to Fs/2 inclusive, with spacing Fs/L. These components are denoted $X_k^{[j]}(1,t), \dots, X_k^{[j]}(M,t)$, and the column vector obtained by stacking them vertically is denoted $X_k^{[j]}(t)$.
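
The bin layout described above can be checked numerically. Assuming an even window length L, bin w (counted from 0 here, whereas the text counts from 1) is centred at w·Fs/L, so the M = L/2 + 1 bins run from 0 to Fs/2 with spacing Fs/L. The helper name is hypothetical.

```python
def bin_frequencies(fs, L):
    """Centre frequency in Hz of each of the M = L // 2 + 1 STFT bins (L assumed even)."""
    M = L // 2 + 1
    return [w * fs / L for w in range(M)]


freqs = bin_frequencies(16000, 512)
print(len(freqs), freqs[1], freqs[-1])  # 257 31.25 8000.0
```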

Next, the matrices that generate separation results from observed signals are described. They can be expressed at several levels, from a matrix for a single frequency bin up to one that separates the signals of all separation systems at once.

In the separation system 1_j, the separation results $Y_1^{[j]}(\omega,t), \dots, Y_n^{[j]}(\omega,t)$ for the ω-th frequency bin can be generated by multiplying the observed signals $X_1^{[j]}(\omega,t), \dots, X_n^{[j]}(\omega,t)$ for that bin by the separation matrix $W^{[j]}(\omega)$. This is expressed by equation (3) below.
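
Written out from the description above, the per-bin relation has the following matrix form. This is a reconstruction from the prose, not a copy of the published equation (3), whose layout may differ.

```latex
\begin{bmatrix} Y_1^{[j]}(\omega,t) \\ \vdots \\ Y_n^{[j]}(\omega,t) \end{bmatrix}
= W^{[j]}(\omega)
\begin{bmatrix} X_1^{[j]}(\omega,t) \\ \vdots \\ X_n^{[j]}(\omega,t) \end{bmatrix},
\qquad \omega = 1, \dots, M
```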

Expanding equation (3) for all frequency bins (ω = 1 to M) and rearranging the elements yields equation (4) below. This is equivalent to equations (5) and (6) below and separates, in one step, all observed signals input to a single separation system.

Next, consider an expression that separates the observed signals of all separation systems 1_1 to 1_m simultaneously. It is obtained by expanding equation (4) for all j (j = 1 to m) and rearranging the elements, which yields equation (7) below. Denoting the separation results, separation matrix, and observed signals in equation (7) by Y(t), W, and X(t), respectively, the relation can be written as $Y(t) = W X(t)$, as in equation (8) below. Thus, even for an apparatus with several separation systems 1_1 to 1_m, the simultaneous separation of all observed signals can be expressed by a single equation.
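
The step from the per-bin equation to the single equation Y(t) = WX(t) amounts to placing every per-bin matrix $W^{[j]}(\omega)$ on the diagonal of one large matrix. The sketch below checks this on toy values, assuming an ordering of system first, then frequency bin, then channel; the actual element order of equation (7) is not given in this text, and all names and numbers are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def block_diag(blocks):
    """Place square matrices on the diagonal of one larger zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0j] * size for _ in range(size)]
    off = 0
    for b in blocks:
        n = len(b)
        for r in range(n):
            for c in range(n):
                out[off + r][off + c] = b[r][c]
        off += n
    return out


# Toy sizes: m = 2 systems, M = 2 bins, n = 2 channels; keys are (system j, bin w).
W = {(j, w): [[1 + 0.1 * j, 0.2 * w], [0.3j, 1.0]] for j in range(2) for w in range(2)}
X = {(j, w): [1.0 + w, 2.0 - 1j * j] for j in range(2) for w in range(2)}
keys = sorted(W)  # assumed ordering: system-major, then frequency bin

W_all = block_diag([W[k] for k in keys])
X_all = [v for k in keys for v in X[k]]
Y_all = matvec(W_all, X_all)                               # one multiplication for everything
Y_per_bin = [v for k in keys for v in matvec(W[k], X[k])]  # per-system, per-bin multiplications
```

Both routes give the same result because every off-diagonal block of `W_all` is zero.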

Using a matrix that separates the signals of all separation systems 1_1 to 1_m together in this way makes it possible to introduce the "pair-wise independence" described below.

Next, as one measure of "pair-wise independence", the Kullback-Leibler (KL) information computed from multivariate (multidimensional) probability density functions (PDFs) is described.

As preparation for introducing the Kullback-Leibler information, two kinds of PDF are introduced. The first is a function that takes the separation results (outputs) of all separation systems as arguments, that is, a function giving the joint probability of a set of separation results. It is shown in equation (9) below. In what follows, writing all arguments as a column vector, as on the left of equation (9), and listing all arguments side by side, as in the middle, are treated as equivalent. Because the PDF of equation (9) drops out during the derivation, no specific expression needs to be assigned to it.

The second PDF is a function that takes the elements of the k-th pair as arguments, that is, a function giving the probability that a particular pair occurs. It is shown in equation (10) below. Specific examples of this function are described later.

Now consider the product of equation (10) over all k (k = 1 to n), shown as the right-hand side of equation (11) below. By the definition of independence, the left-hand side of equation (11) (which equals equation (9)) equals the right-hand side only when the pair-wise separation results $Y_1, \dots, Y_n$ are mutually independent. Therefore, if a distance-like measure between the two sides of equation (11) is used, it reaches its minimum (ideally 0) when $Y_1, \dots, Y_n$ are independent.

The KL information is one measure of "distance" between PDFs. The distance between the PDFs on the two sides of equation (11), that is, the KL information, is defined as in equation (12) below, using the joint entropy $H(Y)$ of Y and the entropies $H(Y_1), \dots, H(Y_n)$ of $Y_1, \dots, Y_n$. Using the definition of entropy and the relation Y = WX (equation (5) above), equation (12) is transformed into equation (13) below. Here, $P(Y_k(t))$ in equation (13) is the PDF of equation (10).
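
For reference, the standard relations behind equations (12) and (13) are summarized below; this is the textbook form, and the published equations may differ in notation. The KL information between the joint PDF and the product of the pair PDFs equals the sum of the pair entropies minus the joint entropy. For an invertible real linear map Y = WX, H(Y) = H(X) + log|det W|; when complex spectra are treated as real and imaginary parts, the log-determinant term is doubled. Because W for all systems is block-diagonal, its log-determinant splits into a sum over systems and bins, and H(X) does not depend on W. T (the number of frames) is a symbol introduced here for the sample average.

```latex
\begin{aligned}
I(Y) &= \int P(Y)\,\log\frac{P(Y)}{\prod_{k=1}^{n} P(Y_k)}\,dY
      = \sum_{k=1}^{n} H(Y_k) - H(Y),\\
H(Y) &= H(X) + \log\lvert\det W\rvert,
\qquad \log\lvert\det W\rvert = \sum_{j=1}^{m}\sum_{\omega=1}^{M}\log\bigl\lvert\det W^{[j]}(\omega)\bigr\rvert,\\
H(Y_k) &\approx -\frac{1}{T}\sum_{t=1}^{T}\log P\bigl(Y_k(t)\bigr).
\end{aligned}
```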

To make the pair-wise separation results $Y_1, \dots, Y_n$ mutually independent, it suffices to find, with some algorithm, the separation matrix W that minimizes the KL information $I(Y)$ computed by equation (13). As an example algorithm, update equations based on the natural gradient method are derived below.

To find the W that minimizes $I(Y)$ with the natural gradient method, equations (14) to (16) are repeated until W converges (learning). Equation (14) means that $Y(t) = W X(t)$ is computed for all frames. The symbol η in equation (16) is called the learning rate and is a small positive real number (for example, 0.1). η may be a constant, or it may be changed dynamically according to the progress of learning. For example, η may be set to a small value (about 0.05) in the early stage of learning to keep W from overflowing, and to a larger value (for example, 0.5) once learning begins to converge, to speed up convergence.
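
Equations (14) to (16) are not reproduced in this text. As a hedged sketch, the commonly used natural-gradient form for frequency-domain ICA with multivariate score functions is $\Delta W(\omega) = \{I + \langle \varphi_\omega(Y(t))\,Y(\omega,t)^H \rangle_t\}\,W(\omega)$ followed by $W(\omega) \leftarrow W(\omega) + \eta\,\Delta W(\omega)$, where $\langle\cdot\rangle_t$ is the average over frames and $\varphi_\omega$ holds the score values for bin ω. The code assumes this form for one frequency bin; the function names and the step-size rule (including its threshold) are illustrative assumptions that mirror the 0.05 / 0.5 example above.

```python
def natural_gradient_step(W, Y, Phi, eta):
    """One natural-gradient update for a single frequency bin (assumed standard form).

    W   : n x n complex separation matrix W(omega), as nested lists
    Y   : n x T separated components Y_k(omega, t)
    Phi : n x T score values phi_k(omega, t), e.g. from a spherical-distribution PDF
    Returns (W_new, dW) where dW = (I + <Phi Y^H>_t) W and W_new = W + eta * dW.
    """
    n, T = len(W), len(Y[0])
    # G = I + (1/T) * sum_t Phi[:, t] Y[:, t]^H  (average outer product over frames)
    G = [[(1.0 if r == c else 0.0)
          + sum(Phi[r][t] * Y[c][t].conjugate() for t in range(T)) / T
          for c in range(n)]
         for r in range(n)]
    # dW = G W
    dW = [[sum(G[r][i] * W[i][c] for i in range(n)) for c in range(n)]
          for r in range(n)]
    W_new = [[W[r][c] + eta * dW[r][c] for c in range(n)] for r in range(n)]
    return W_new, dW


def learning_rate(dW, small=0.05, large=0.5, threshold=1.0):
    """Hypothetical step-size rule: small eta while updates are large, larger eta near convergence.

    The threshold is an illustrative assumption, not a value from the text.
    """
    size = max(abs(v) for row in dW for v in row)
    return small if size > threshold else large
```

At a stationary point the frame average of $\varphi Y^H$ equals $-I$, so $\Delta W$ vanishes and W stops changing, which is the kind of convergence that step S15 checks for.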

The concrete form of the partial derivative in equation (15) depends on the type of signal and on how the pairs are formed.

Application examples of the signal separation method described above are explained in detail below.

(Specific example 1)
As specific example 1, cooperative separation with different microphone spacings is described. For example, the apparatus may include widely spaced microphones (mic1 and mic4) and narrowly spaced microphones (mic2 and mic3), as shown in FIG. 7.

Specifically, the observed-signal spectrograms obtained from the signals of the widely spaced microphones are denoted $X_1^{[1]}, X_2^{[1]}$ and their separation results $Y_1^{[1]}, Y_2^{[1]}$; those of the narrowly spaced microphones are likewise denoted $X_1^{[2]}, X_2^{[2]}$ and $Y_1^{[2]}, Y_2^{[2]}$. The aim is to generate separation results in which $Y_1^{[1]}$ corresponds to $Y_1^{[2]}$ and $Y_2^{[1]}$ corresponds to $Y_2^{[2]}$. Although the separation system shown in FIG. 7 has four microphones, the configuration is not limited to this: as shown in FIG. 8, a microphone at the same position may be shared among several separation systems. In that figure, for example, the framed $X_1^{[1]}$ and $X_1^{[2]}$ indicate that the signal observed by a single microphone is used as the observed signal of two systems.

Denoting the separation matrix for the ω-th frequency bin of separation system j by $W^{[j]}(\omega)$ and its increment by $\Delta W^{[j]}(\omega)$, the increment $\Delta W^{[j]}(\omega)$ is calculated by equation (17) below. In this equation, φ is a function whose elements are the logarithmic derivatives of the PDF of the pair $Y_k$; it is called a score function or an activation function.

As a specific example of the PDF, a kind of multivariate (multidimensional) PDF called a spherical distribution, shown in equation (18) below, can be used. It is obtained by substituting the $L_N$ norm of the pair $Y_k(t)$ (equation (19)) into an arbitrary scalar function f(x).

Here, N is an arbitrary positive number; for example, N = 2 is used. The coefficient h normalizes the total probability to 1, but because it vanishes when the score function is derived, its specific value need not be computed. Instead of the norm of $Y_k(t)$, the element-wise weighted expression of equation (20) may be used. Any non-negative function can be used as the scalar function f(x); for example, equation (21) or equation (22) is used. In these equations, K and l are arbitrary positive values.
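
As an illustration of why h drops out, take N = 2 and assume f(x) = exp(-Kx) (one common choice; equations (21) and (22) are not reproduced here). Writing $y_i$ for any real component of the stacked pair vector $Y_k(t)$, the log-derivative is as follows; the constant $\log h$ differentiates to zero.

```latex
\begin{aligned}
P\bigl(Y_k(t)\bigr) &= h\, f\bigl(\lVert Y_k(t)\rVert_2\bigr), \qquad f(x) = e^{-Kx},\\
\log P\bigl(Y_k(t)\bigr) &= \log h - K\,\lVert Y_k(t)\rVert_2,\\
\frac{\partial}{\partial y_i}\log P\bigl(Y_k(t)\bigr) &= -K\,\frac{y_i}{\lVert Y_k(t)\rVert_2}.
\end{aligned}
```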

Other PDFs are also possible: the sum of the per-channel norms may be substituted into f(x) (see equation (23)), or likewise the product of the norms (see equation (24)). Multivariate PDFs other than spherical distributions may also be used.

Specific examples of score functions are shown below. Equation (25) is the score function derived from the PDF of equation (20) and the f(x) of equation (21), and equation (26) is the score function derived from the PDF of equation (20) and equation (22). In these equations, $K_k^{[j]}(\omega)$ corresponds to K in equation (21) or (22), and a different value may be used for each pair, each channel, and each frequency bin. This makes it possible to control the scale of Y and W at convergence.

In this way, all signals that should correspond to one another (that should be paired) are substituted as arguments of the score function, not only across frequency bins but also across separation systems and across multiple channels within a system. When the signals substituted into the same score function form a pair, the KL information of the separation results is lower, that is, the independence of the signals as a whole is higher; as a result, separation results whose channels correspond to each other are produced.
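
A minimal sketch of such a score function, assuming the spherical distribution with N = 2 and f(x) = exp(-Kx), is shown below. Its argument is every entry that shares the score function, i.e. $Y_k^{[j]}(\omega,t)$ for all systems j and bins ω of pair k at one frame; the function name and the `eps` guard are hypothetical.

```python
import math


def spherical_score(pair_values, K=1.0, eps=1e-12):
    """Score values for one pair at one frame, assuming f(x) = exp(-K x) and N = 2.

    pair_values: every complex entry tied to this score function, across
    separation systems and frequency bins. Returns -K * y / ||y||_2 per entry.
    """
    norm = math.sqrt(sum(abs(v) ** 2 for v in pair_values))
    return [-K * v / (norm + eps) for v in pair_values]


# Y_1^[1](w, t) = 3 and Y_1^[2](w, t) = 4j share one norm (5), so they are scored together.
scores = spherical_score([3 + 0j, 4j])
```

Because all entries share one norm, a large value in one system's component changes the score of the other systems' components in the same pair; this coupling is what ties the channels of different systems together.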

Next, the processing of the independence calculation unit 5 and the mapping unit 6 is described with reference to FIG. 9.

To express score functions such as those above, the mapping unit 6 establishes the correspondence between components (separation results) and score-function arguments. When an update equation based on the gradient method, such as equation (17), is used, the independence calculation unit 5 evaluates the score functions.

A separate score function is prepared for each separation system, each channel, and each frequency bin. That is, if the separation results of the entire system are represented by the vector Y(t), the number of score functions equals the number of elements of Y(t).

The score functions are grouped so that those taking the same arguments belong to the same group. For example, all score functions in group 1 take the same arguments. Here, $\varphi_{11}^{[1]}, \dots, \varphi_{1M}^{[1]}$ denote the score functions corresponding to the first channel of separation system 1, and $\varphi_{11}^{[1]}$ denotes the single score function corresponding to the first frequency bin.

The mapping unit 6 extracts corresponding components from each separation system and forms pairs. For example, pair 1 consists of the first channel of each separation system; likewise, pair k consists of the k-th channel of each separation system.

Each pair formed by the mapping unit 6 is supplied, within the independence calculation unit 5, to the score functions that take the same arguments. For example, pair 1 is supplied to the score functions of group 1, and likewise pair k is supplied to group k.

Forming a pair does not necessarily require copying data; references such as pointers may be used instead. For example, group 1 may refer to pair 1, and the component $Y_1^{[1]}$ of pair 1 may in turn refer to the component $Y_1^{[1]}$ of separation system 1.
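
The pairing and grouping of FIG. 9 can be sketched with references instead of copies, as the paragraph above allows. The layout `Y[j][k][w][t]` (system, channel, bin, frame, all counted from 0) and the helper names `build_pairs` and `pair_vector` are hypothetical; each pair stores only (system, channel) keys, and the values are read when the score functions of that group are evaluated.

```python
def build_pairs(num_systems, num_channels):
    """Pair k holds (system, channel) keys for channel k of every system: references, not copies."""
    return {k: [(j, k) for j in range(num_systems)] for k in range(num_channels)}


def pair_vector(Y, pair, t):
    """Collect the score-function arguments of one pair at frame t by following its keys.

    Y[j][k][w][t] is component k of system j in bin w at frame t; systems may
    have different numbers of bins, so the bin count is read per system.
    """
    return [Y[j][k][w][t] for (j, k) in pair for w in range(len(Y[j][k]))]


# Toy data: 2 systems x 2 channels x 3 bins x 1 frame; each value encodes its own indices.
Y = [[[[complex(100 * j + 10 * k + w)] for w in range(3)] for k in range(2)] for j in range(2)]
pairs = build_pairs(2, 2)
group0_args = pair_vector(Y, pairs[0], 0)  # arguments shared by every score function of group 0
```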

(Specific example 2)
Next, separation with different acoustic analyses is described. For example, the separation systems may share a microphone pair but use different acoustic analyses. Acoustic analysis here means sampling, the discrete Fourier transform (DFT), and similar operations. One such case is two separation systems that share two microphones, one sampling at 16 kHz and the other at 48 kHz. Other cases of different acoustic analyses include, at the same sampling frequency, different DFT window lengths (for example, 512 and 2048), the same window length with different shift widths (for example, 128 and 256), and different window types (for example, a Hanning window and a square-root Hanning window).

In such cases as well, separation results with the desired correspondence can be generated using the same equations as in specific example 1. However, when the acoustic analyses differ, the frame intervals may differ too. For example, applying a DFT with a shift width of 128 samples to a signal sampled at 16 kHz gives a frame interval of 8 ms, whereas a shift width of 160 samples gives 10 ms. In such a case, a column vector like $Y_k(t)$ in equation (10) cannot be formed directly, but the frame intervals of the two can be unified with an interpolation method such as linear interpolation. The frame interval can also be adjusted by duplicating frame data as necessary. For example, as shown in FIG. 10, when $Y_1^{[1]}$ has frames at 8 ms intervals and $Y_1^{[2]}$ has frames at 10 ms intervals, duplicating one frame of $Y_1^{[2]}$ out of every four aligns the intervals of $Y_1^{[1]}$ and $Y_1^{[2]}$. Conversely, $Y_1^{[1]}$ may be thinned by dropping one frame out of every five.
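
The two alignment options above, and the linear-interpolation alternative, can be sketched on lists of frames. The helper names are hypothetical; the 8 ms and 10 ms figures come from the example in the text (shift 128 or 160 samples at 16 kHz).

```python
def duplicate_every(frames, period):
    """Repeat one frame after every `period` frames (10 ms -> 8 ms spacing with period=4)."""
    out = []
    for i, f in enumerate(frames):
        out.append(f)
        if (i + 1) % period == 0:
            out.append(f)
    return out


def thin_every(frames, period):
    """Drop one frame out of every `period` frames (8 ms -> 10 ms spacing with period=5)."""
    return [f for i, f in enumerate(frames) if (i + 1) % period != 0]


def linear_resample(frames, src_ms, dst_ms):
    """Linearly interpolate vector frames spaced src_ms apart onto a dst_ms grid (needs >= 2 frames)."""
    span = (len(frames) - 1) * src_ms
    out = []
    for n in range(int(span // dst_ms) + 1):
        pos = n * dst_ms / src_ms            # fractional index into the source frames
        i = min(int(pos), len(frames) - 2)
        a = pos - i                          # interpolation weight toward frame i + 1
        out.append([(1 - a) * x + a * y for x, y in zip(frames[i], frames[i + 1])])
    return out
```

Over any 40 ms span the 8 ms stream has five frames and the 10 ms stream has four, which is why one duplicate per four frames, or one drop per five, equalizes the counts.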

Specific examples 1 and 2 presented different microphone spacings and different acoustic analyses as separate examples, but the two can of course be combined. For example, even when a separation system with wide microphone spacing and one with narrow microphone spacing use different acoustic analyses, separation results with the desired correspondence can be generated by the same method as in specific example 2.

(Specific example 3)
Next, a simplified way of implementing the present invention is described. It realizes "pair-wise separation" by reusing a system that performs per-channel separation, such as that of Patent Document 2; in other words, it realizes "pair-wise separation" when only one separation system exists in FIG. 21.

FIG. 11 is a schematic diagram showing the processing for this simplified signal separation. In the observed signals $X_1^{[1]}, \dots, X_n^{[m]}$, the superscript (1 to m) identifies the separation system and the subscript (1 to n) identifies the signal pair. First, the observed signals are rearranged: the pair $X_k^{[1]}, \dots, X_k^{[m]}$, arranged horizontally in the figure, is rearranged vertically and denoted $X'_k$. Next, each $X'_k$ is regarded as a single spectrogram, and $X'_1, \dots, X'_n$ are separated into components $Y'_1, \dots, Y'_n$ that are independent from channel to channel. This separation requires only one separation system. The separation results $Y'_1, \dots, Y'_n$ are arranged in the same way as $X'_1, \dots, X'_n$. Applying to the separation results the inverse of the rearrangement applied to the observed signals yields the desired separation results.

Similarly, when there are two separation systems, each with M frequency bins and two channels, as in the example of FIG. 7, the signal separation can be implemented simply by treating them as one system with, for example, 2M frequency bins and two channels.
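
The rearrangement of specific example 3 can be sketched as a pair of inverse operations: concatenate the bins of channel k from every system into one longer spectrogram, run any single-system per-channel separator on the result, then split the output back. The storage layout `X[j][k][t][w]`, the assumption that all systems already share the same frame count, and the helper names are illustrative; the per-channel separator itself is not implemented here.

```python
def stack_pairs(X):
    """Rearrange X[j][k][t][w] (system j, channel k, frame t, bin w) into Xp[k][t].

    Xp[k][t] concatenates the bins of channel k from systems 0..m-1, so each
    pair becomes one spectrogram with more bins. All systems are assumed to
    have the same number of frames (align them first if they do not).
    """
    m, n, T = len(X), len(X[0]), len(X[0][0])
    sizes = [len(X[j][0][0]) for j in range(m)]  # bins per system, needed to undo the stacking
    Xp = [[[v for j in range(m) for v in X[j][k][t]] for t in range(T)] for k in range(n)]
    return Xp, sizes


def unstack_pairs(Yp, sizes):
    """Inverse of stack_pairs: split each stacked frame back into per-system bins."""
    n, T = len(Yp), len(Yp[0])
    Y, off = [], 0
    for size in sizes:
        Y.append([[Yp[k][t][off:off + size] for t in range(T)] for k in range(n)])
        off += size
    return Y


# FIG. 7 style: 2 systems, 2 channels, 3 frames, M = 4 bins -> stacked frames have 2M = 8 bins.
X = [[[[complex(1000 * j + 100 * k + 10 * t + w) for w in range(4)] for t in range(3)]
      for k in range(2)] for j in range(2)]
Xp, sizes = stack_pairs(X)
# A single per-channel separator would map Xp -> Yp here; identity is used as a stand-in.
Yp = Xp
Y = unstack_pairs(Yp, sizes)
```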

(Specific example 4)
The number of channels may also differ between separation systems. FIG. 12 shows an example of signal correspondence between a three-input, three-output separation system 1 and a two-input, two-output separation system 2. In this configuration, the first and second channels are each associated within a pair. For separation system 2, equations (17) to (26) are applied with n = 2, m = 2, and j = 2. For separation system 1, the score function corresponding to the third channel is modified.

The score functions used in separation system 1 are expressed by equation (27) below. Only the third channel's score function takes different arguments, because separation system 2 has no corresponding component. The third channel therefore uses a score function different from equations (25) and (26), for example equation (29), derived from equation (28). In this equation, j ≠ 2 under the summation sign denotes the sum over all separation systems other than the second. In the example of FIG. 12 there are two separation systems, so j ≠ 2 means the same as j = 1.

In equation (27), if equation (29) were also used for the score function of the second channel (that is, if the second element from the top on the right-hand side of equation (27) were replaced by equation (29) with k = 2), the second pair would no longer be associated: it would be undetermined whether the component corresponding to $Y_2^{[2]}$ appears in $Y_2^{[1]}$ or in $Y_3^{[1]}$. Put differently, making the number of score-function arguments differ from channel to channel allows the presence or absence of correspondence to be controlled for each channel.

That is, components that should be associated are included in the arguments of the score function, while components that need not or cannot be associated are excluded from them; in this way, the correspondence can be controlled for each channel.
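
The inclusion and exclusion rule above can be sketched as a pair builder for systems with different channel counts (indices counted from 0). For the FIG. 12 layout, the third pair holds only system 1's third channel, matching the j ≠ 2 sum of equation (29). The `unlinked_channels` option is a hypothetical illustration of excluding a channel from correspondence; the helper name is also hypothetical.

```python
def build_pairs_varying(channel_counts, unlinked_channels=()):
    """Form pairs when systems have different channel counts (0-based indices).

    Pair k contains channel k of every system that has that channel. Channels listed
    in `unlinked_channels` (a hypothetical option) get one single-member pair per
    system, i.e. each component is scored only on itself and is not tied to the others.
    """
    pairs = []
    for k in range(max(channel_counts)):
        members = [(j, k) for j, n in enumerate(channel_counts) if k < n]
        if k in unlinked_channels:
            pairs.extend([member] for member in members)
        else:
            pairs.append(members)
    return pairs


# FIG. 12 layout: system 1 has 3 channels, system 2 has 2 (systems are 0 and 1 here).
print(build_pairs_varying([3, 2]))
# [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(0, 2)]]
```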

(Specific example 5)
The separation systems may also include one whose output is fixed. One application is echo cancellation in a system equipped with a microphone and a loudspeaker, where the sound originating from the loudspeaker is removed from the sound recorded by the microphone. In what follows, a system with a fixed output is also referred to as a "separation system".

FIG. 13 is a schematic diagram for explaining signal separation when there are two signal sources, one being speech and the other the sound (music) emitted from a loudspeaker.

Here, mixed sound from an arbitrary number of sound sources is recorded with multiple microphones, at least as many as there are sources. In addition, the sound output from the loudspeaker is fed directly to one of the separation systems through a line-out terminal or the like. Let X_1^[2] denote this directly input signal. When there are several loudspeakers, the signal of each is assumed to be input directly, unmixed with the other sources, and these signals are denoted X_2^[2], X_3^[2], and so on.

Two types of separation system are prepared. One separates the signals observed by the microphones into independent components. The other outputs the directly input signal X_1^[2] unchanged. This can be regarded as a special case of Specific Example 4 in which the separation matrix of the second separation system is fixed to the identity matrix, so the same equations as in Specific Example 4 can be used.

That is, for separation system 1 the separation matrix is updated using equations (17) and (27) to (29), as in Specific Example 4. For separation system 2 the separation matrix is not updated, and Y_k^[2] = X_k^[2].

As a result, the signal corresponding to each channel of separation system 2 is separated into the matching channel of separation system 1, as if drawn toward it. The remaining channels of separation system 1 output the sound with the loudspeaker-derived component cancelled. In the example of FIG. 13, the component corresponding to the loudspeaker sound is output to Y_1^[1], and the component corresponding to the human voice is output to Y_2^[1].
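One learning step for this configuration can be sketched in Python/NumPy. The sketch makes two assumptions. First, it takes equation (17) to be the usual natural-gradient rule ΔW = η(I + E_t[φ(Y)Y^H])W. Second, it uses a spherical score function φ = −K·Y / ‖argument‖₂. The function name `step_with_fixed_reference` and the `pairs` mapping are hypothetical. Separation system 2 has no matrix to learn. Its output is the line-in signal itself, and it affects learning only through the score-function norm of the paired channel.

```python
import numpy as np

def step_with_fixed_reference(W1, X1, X2, pairs, eta=0.1):
    """One update of system 1; system 2 is fixed so that Y^[2] = X^[2].

    W1: (M, n, n) separation matrices of system 1, one per frequency bin.
    X1: (M, n, T) microphone spectrogram; X2: (M, r, T) line-in (loudspeaker) spectrogram.
    pairs: {channel of system 1: channel of system 2} for the associated channels.
    """
    M, n, T = X1.shape
    Y1 = np.einsum('mij,mjt->mit', W1, X1)   # Y^[1](w,t) = W^[1](w) X^[1](w,t)
    Y2 = X2                                  # fixed system: identity matrix, never updated
    K = np.sqrt(M)
    phi = np.empty_like(Y1)
    for k in range(n):
        energy = np.sum(np.abs(Y1[:, k, :]) ** 2, axis=0)   # per-frame energy over all bins
        if k in pairs:                                      # associated channel: add reference
            energy = energy + np.sum(np.abs(Y2[:, pairs[k], :]) ** 2, axis=0)
        phi[:, k, :] = -K * Y1[:, k, :] / np.sqrt(energy)
    eye = np.eye(n)
    dW = np.stack([(eye + phi[m] @ Y1[m].conj().T / T) @ W1[m] for m in range(M)])
    return W1 + eta * dW, phi

# Synthetic usage: two microphones, one line-in channel paired with output channel 0.
rng = np.random.default_rng(0)
M, n, T = 5, 2, 50
X1 = rng.standard_normal((M, n, T)) + 1j * rng.standard_normal((M, n, T))
X2 = rng.standard_normal((M, 1, T)) + 1j * rng.standard_normal((M, 1, T))
W1 = np.tile(np.eye(n, dtype=complex), (M, 1, 1))
for _ in range(10):
    W1, phi = step_with_fixed_reference(W1, X1, X2, pairs={0: 0})
```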

(Specific Example 6)
The specific examples above assumed that signals are not mixed across separation systems (that is, the signals within a pair are not mixed). However, the present invention is also applicable when the signals within a pair are mixed.

FIG. 14 is a schematic diagram for explaining a process in which three sound sources are recorded with three microphones and separated into three components. Here, two of the sources are assumed to emit highly correlated signals, as when two radios broadcast the same program or when the sources are the left and right channels of a stereo loudspeaker pair. Consequently, when the sound recorded by the three microphones is separated into three components, two of the components are mutually dependent. Separation is therefore performed so that, for example, Y_1^[1] and Y_2^[1] form a mutually dependent pair, and that pair is independent of Y_3^[1] alone.

That the pair (Y_1^[1], Y_2^[1]) is independent of Y_3^[1] alone can be written in terms of PDFs as equation (30). Therefore, if the distance between the two sides of equation (30) is expressed as a KL divergence (Kullback-Leibler information) and the separation matrix W is updated by an algorithm that minimizes it, the separation result described above is obtained.
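The independence statement and the quantity to be minimized are written out below. This is our reading of the text only, since the original equation (30) is not reproduced here. Y_k^[1](t) denotes the vector of all M frequency bins of channel k at frame t, and the expectation is approximated by an average over frames:

```latex
Y_k^{[1]}(t) = \big[\,Y_k^{[1]}(1,t),\ \ldots,\ Y_k^{[1]}(M,t)\,\big]^{T}

p\big(Y_1^{[1]}(t),\,Y_2^{[1]}(t),\,Y_3^{[1]}(t)\big)
  = p\big(Y_1^{[1]}(t),\,Y_2^{[1]}(t)\big)\; p\big(Y_3^{[1]}(t)\big)

\mathrm{KL} = E_t\Big[\log p\big(Y_1^{[1]}(t),Y_2^{[1]}(t),Y_3^{[1]}(t)\big)
  - \log p\big(Y_1^{[1]}(t),Y_2^{[1]}(t)\big) - \log p\big(Y_3^{[1]}(t)\big)\Big]
```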

The separation matrix may be updated with equation (17), but equation (31) is used as the score function. That is, the components that are to be given a dependency are included in the argument of the score function.

To express such a score function, the mapping unit 6 and the independence calculation unit 5 perform processing different from that shown in FIG. 9. FIG. 15 is a schematic diagram for explaining the processing of the independence calculation unit 5 and the mapping unit 6.

The independence calculation unit 5 groups score functions that take the same argument. Here, channels 1 and 2 form a pair, so the score functions for these channels constitute group 1. The score function for channel 3 alone constitutes group 2.

The mapping unit 6 forms pair 1 from channels 1 and 2 and supplies it to the score functions in group 1 of the independence calculation unit 5. It also forms a second pair from channel 3 alone and supplies it to the score function in group 2 of the independence calculation unit 5.
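The division of labour between the two units can be sketched in Python/NumPy. The mapping is given as a hypothetical `pair_of` dictionary from channel to pair label, and channels with the same label are grouped. Within each group, every score function takes the whole group as its argument. Here that is expressed through one shared L2 norm, a spherical form assumed to stand in for equation (31).

```python
import numpy as np
from collections import defaultdict

def build_groups(pair_of):
    """Mapping-unit step: collect the channels that share a pair label."""
    groups = defaultdict(list)
    for ch, label in sorted(pair_of.items()):
        groups[label].append(ch)
    return list(groups.values())

def grouped_scores(Y, groups, K):
    """Independence-calculation step for one frame of one separation system.

    Y: complex array (n channels, M bins). Each score in a group uses the whole
    group as its argument, here via one shared L2 norm (assumed form).
    """
    phi = np.empty_like(Y)
    for g in groups:
        norm = np.sqrt(np.sum(np.abs(Y[g]) ** 2))   # over every bin of every channel in g
        phi[g] = -K * Y[g] / norm
    return phi

groups = build_groups({0: "pair1", 1: "pair1", 2: "pair2"})   # channels 1-2 paired, 3 alone
rng = np.random.default_rng(1)
Y = rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))
phi = grouped_scores(Y, groups, K=np.sqrt(6))
```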

(Specific Example 7)
In conventional ICA, algorithms are known that converge quickly under the constraint that the separation matrix W is orthonormal, and these algorithms can also be applied. The following description uses the gradient method with an orthonormal constraint as an example of an algorithm based on that constraint. Other algorithms, such as Newton's method and the fixed-point method, are also applicable.

An orthonormal matrix is a matrix that satisfies equation (32) below. Given the sparsity of W, this equation is equivalent to equation (33), which states that the separation matrix for each separation system and each frequency bin is orthonormal.

To apply the orthonormal constraint to the separation matrix, the observed signal must first be decorrelated. Decorrelation, also called whitening or sphering, is here the process of making the correlation coefficients between channels zero. Because equation (32) is equivalent to equation (33), decorrelation can be performed separately for each separation system and each frequency bin. No "simultaneous decorrelation across all separation systems" is needed. The decorrelation process is described with the following equations.

Σ^[j](ω) is the covariance matrix of the observed signal in separation system j at frequency bin ω (equation (34)). Σ^[j](ω) can be expressed as equation (35) using its eigenvalues λ_k^[j](ω) and eigenvectors p_k^[j](ω). Let Λ^[j](ω) be the diagonal matrix of the eigenvalues λ_k^[j](ω) (equation (38)) and P^[j](ω) the matrix of the eigenvectors (equation (37)). Applying the transformation of equation (36) to X^[j](ω,t) makes the components of the result X'^[j](ω,t) mutually uncorrelated, that is, E_t[X'^[j](ω,t) X'^[j](ω,t)^H] = I.
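This per-system, per-bin decorrelation can be sketched in NumPy, assuming that equation (36) is the standard transform X' = Λ^(−1/2) P^H X. The function name `whiten` is ours. In practice it would be applied independently to every combination of separation system j and frequency bin ω.

```python
import numpy as np

def whiten(X):
    """Decorrelate one separation system at one frequency bin.

    X: complex array (n channels, T frames). Returns X' = Lambda^{-1/2} P^H X and
    the whitening matrix V, so that E_t[X' X'^H] = I.
    """
    T = X.shape[1]
    Sigma = X @ X.conj().T / T            # covariance matrix E_t[X X^H]
    lam, P = np.linalg.eigh(Sigma)        # Sigma = P diag(lam) P^H
    V = np.diag(lam ** -0.5) @ P.conj().T
    return V @ X, V

# Synthetic check: a random complex mixture of three Laplacian sources.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
S = rng.laplace(size=(3, 2000)) + 1j * rng.laplace(size=(3, 2000))
X = A @ S
Xw, V = whiten(X)
C = Xw @ Xw.conj().T / X.shape[1]         # should be the identity matrix
```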

Learning is then performed with equation (40) instead of equation (17). Equation (40) is obtained by applying the orthonormal constraint to equation (17), and its score function φ is the same as the one used in equation (17).
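The orthonormally constrained update can be sketched in NumPy for a single frequency bin. The form of equation (40) is not reproduced here, so the sketch assumes the usual skew-Hermitian gradient ΔW = η(E_t[φ(Y)Y^H] − E_t[Yφ(Y)^H])W. It also assumes the single-bin score φ(y) = −y/|y|. The symmetric re-orthonormalization after each additive step is our own choice, not taken from the source.

```python
import numpy as np

def orthonormal_step(W, X, score, eta=0.05):
    """One gradient step that keeps W unitary (X is assumed to be whitened).

    Assumed form of eq. (40): dW = eta * (E[phi(Y) Y^H] - E[Y phi(Y)^H]) W.
    """
    T = X.shape[1]
    Y = W @ X
    phi = score(Y)
    G = (phi @ Y.conj().T - Y @ phi.conj().T) / T   # skew-Hermitian: G^H = -G
    W = W + eta * G @ W
    # The additive step keeps W unitary only to first order, so restore it
    # with W <- (W W^H)^{-1/2} W (symmetric orthonormalization).
    lam, P = np.linalg.eigh(W @ W.conj().T)
    return P @ np.diag(lam ** -0.5) @ P.conj().T @ W, G

def laplace_score(Y):
    return -Y / np.maximum(np.abs(Y), 1e-12)        # assumed single-bin score -y/|y|

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 1000)) + 1j * rng.standard_normal((3, 1000))  # roughly white stand-in
W = np.eye(3, dtype=complex)
for _ in range(20):
    W, G = orthonormal_step(W, X, laplace_score)
```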

Some ICA algorithms incorporate both decorrelation and separation into the expression for ΔW. This approach is called EASI (Equivariant Adaptive Separation via Independence). Applying EASI to equation (40) yields equation (41). With equation (41), the term I − Y^[j](ω,t) Y^[j](ω,t)^H performs the decorrelation, so no prior decorrelation is required.

For details of EASI, see Aapo Hyvärinen et al., Detailed Independent Component Analysis: A New World of Signal Analysis (in Japanese), Tokyo Denki University Press, Section 12.5, "Equivariant adaptive separation via independence", pp. 269-272.
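A batch EASI iteration can be sketched in NumPy for one frequency bin. The form assumed for equation (41), ΔW = η(I − E_t[YY^H] + E_t[φ(Y)Y^H] − E_t[Yφ(Y)^H])W, follows the usual EASI structure but is our assumption. So are the score φ(y) = −y/|y|, the step size, and the synthetic mixture. The Hermitian part I − E_t[YY^H] drives the outputs toward decorrelation, so the observations are used without prior whitening.

```python
import numpy as np

def easi_step(W, X, score, eta=0.05):
    """One batch EASI step; X does not need to be whitened beforehand.

    Assumed form of eq. (41):
    dW = eta * (I - E[Y Y^H] + E[phi(Y) Y^H] - E[Y phi(Y)^H]) W.
    """
    n, T = X.shape
    Y = W @ X
    phi = score(Y)
    H = np.eye(n) - Y @ Y.conj().T / T              # Hermitian part: decorrelation
    S = (phi @ Y.conj().T - Y @ phi.conj().T) / T   # skew-Hermitian part: separation
    return W + eta * (H + S) @ W

def laplace_score(Y):
    return -Y / np.maximum(np.abs(Y), 1e-12)

rng = np.random.default_rng(3)
n, T = 3, 5000
S_src = rng.laplace(size=(n, T)) + 1j * rng.laplace(size=(n, T))
S_src /= np.sqrt(np.mean(np.abs(S_src) ** 2, axis=1, keepdims=True))   # unit-power sources
A = np.eye(n) + 0.3 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
X = A @ S_src                     # unwhitened observation
W = np.eye(n, dtype=complex)
for _ in range(300):
    W = easi_step(W, X, laplace_score)
Y = W @ X
```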

(Specific Example 8)
When several separation systems with different microphone spacings are used and specific frequency ranges are later extracted and combined, some frequency bins go unused. Removing these unused frequency bins from X_k^[j] in advance reduces the amount of ICA computation.

(Specific Example 9)
In Specific Example 2, systems with different frame intervals were integrated by aligning the coarser-interval side (Y_k^[2]) to the finer-interval side (Y_k^[1]), as shown in FIG. 10. As another example, learning may use only the frames whose times coincide.

For example, in FIG. 10 the frame times coincide again 40 ms after the first frame. That is, times coincide every 5 frames on the Y_k^[1] side (frames 1, 6, 11, 16, ...) and every 4 frames on the Y_k^[2] side (frames 1, 5, 9, 13, ...). Only the frames with coincident times are therefore extracted from the observed signal, and the overall separation matrix is learned from that data. The separation matrix obtained at this stage already gives correspondence between the outputs of the systems. Because it was learned from thinned-out data, however, it is less accurate than one learned from all the data. It is therefore used as the initial value for further learning in each separation system, this time using all frames.
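The frame bookkeeping for the first stage can be sketched in Python. The sketch assumes that frame i of each system starts at (i − 1) × frame interval and that both systems start at t = 0. The 8 ms and 10 ms intervals are those of FIG. 10, and the function name is ours.

```python
from math import gcd

def coincident_frames(shift_a_ms, shift_b_ms, duration_ms):
    """1-based indices of the frames whose start times coincide in both systems.

    Assumes frame i of each system starts at (i - 1) * shift and both start at t = 0.
    """
    period = shift_a_ms * shift_b_ms // gcd(shift_a_ms, shift_b_ms)  # least common multiple
    times = range(0, duration_ms, period)
    return [t // shift_a_ms + 1 for t in times], [t // shift_b_ms + 1 for t in times]

frames_1, frames_2 = coincident_frames(8, 10, 160)
# Stage 1: learn one joint separation matrix from these frames only.
# Stage 2: use it as the initial value and continue learning each system on all frames.
```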

(Specific Example 10)
Next, cooperative separation when the microphone installation positions differ between separation systems is described with reference to FIG. 16.

Before describing cooperative separation, the relationship between ICA and beamforming is explained. The separation matrix obtained by time-frequency-domain ICA is known to be equivalent to the null-steering (null beamformer) type among the various beamforming methods that use a microphone array. For the equivalence between the two, see, for example, the following paper.

Shoko Araki, Shoji Makino, Ryo Mukai, and Hiroshi Saruwatari, "Relationship between frequency-domain blind source separation and frequency-domain adaptive beamformers," Proceedings of the 2001 Autumn Meeting of the Acoustical Society of Japan, 2-6-12, pp. 613-614 (in Japanese).

Accordingly, the ICA separation result can be interpreted as follows. For example, when the i-th sound source is output on channel k, the microphone array has formed nulls (directions of low sensitivity) toward the sources other than the i-th. The sounds of those other sources are suppressed, and as a result only the sound of the i-th source remains.

Because the sharpness of the nulls depends on where the microphones are placed, ICA separation performance also depends on the microphone arrangement. For example, when nulls are formed with microphones arranged in a straight line, a narrow (sharp) null can be formed in directions perpendicular to the array, but only a wide null can be formed along the extension of the array. Therefore, ICA with linearly arranged microphones can easily separate, with high accuracy, multiple sources located in directions perpendicular to the array. It has difficulty, however, separating with high accuracy multiple sources located along the extension of the array.
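Both points can be illustrated with one row of a two-microphone separation matrix at 1 kHz with 7.5 cm spacing. This NumPy sketch assumes a free-field plane-wave model, a speed of sound of 340 m/s, and a simple delay-and-subtract null. These are illustrative assumptions, not the method of the cited paper. The row cancels its null direction exactly, and the low-gain region is noticeably wider when the null is steered along the array axis (endfire, 90°) than perpendicular to it (broadside, 0°).

```python
import numpy as np

C_SOUND = 340.0  # assumed speed of sound (m/s)

def steering(theta_deg, mic_x, freq):
    """Plane-wave steering vectors for mics on a line; theta is measured from broadside."""
    delay = np.outer(np.sin(np.deg2rad(theta_deg)), mic_x) / C_SOUND   # (angles, mics)
    return np.exp(-2j * np.pi * freq * delay)

def null_row(theta_null_deg, mic_x, freq):
    """Two-mic separation-matrix row whose output cancels a source at theta_null."""
    a = steering([theta_null_deg], mic_x, freq)[0]
    return np.array([a[1], -a[0]])   # row @ a(theta_null) = a1*a0 - a0*a1 = 0

def null_width(theta_null_deg, mic_x, freq, thresh=0.2):
    """Angular extent (degrees) over which the row's gain stays below thresh."""
    grid = np.linspace(-90.0, 90.0, 3601)
    gain = np.abs(steering(grid, mic_x, freq) @ null_row(theta_null_deg, mic_x, freq))
    return np.count_nonzero(gain < thresh) * (grid[1] - grid[0])

mic_x = np.array([0.0, 0.075])        # 7.5 cm spacing
w = null_row(30.0, mic_x, 1000.0)
width_broadside = null_width(0.0, mic_x, 1000.0)
width_endfire = null_width(90.0, mic_x, 1000.0)
```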

Consider the microphone arrangement shown in FIG. 16. There are two separation systems: separation system 1 uses microphones arranged horizontally, and separation system 2 uses microphones arranged vertically. (In this figure the central microphone is shared by both systems, but separate microphones may be provided instead. Each system has five microphones, but any number not less than the number of sound sources may be used.) Four sound sources are present, and they sound simultaneously.

For the microphones of separation system 1, sources 1 and 2 lie in directions roughly perpendicular to the array, so the sound from each can be masked precisely by a sharp null. Separation system 1 can therefore separate sources 1 and 2 with high accuracy. Sources 3 and 4, on the other hand, lie roughly along the extension of the array. It is easy to mask both of them together with one wide null, but difficult to mask one while leaving the other. In summary, the separation result of separation system 1 has the following characteristics.

Channel mainly carrying sound source 1:
Separated with high accuracy, because source 2 is masked by a sharp null and sources 3 and 4 are masked together by a wide null.

Channel mainly carrying sound source 2:
Separated with high accuracy, because source 1 is masked by a sharp null and sources 3 and 4 are masked together by a wide null.

Channel mainly carrying sound source 3:
Source 4 still seems to be audible. Sources 1 and 2 are each masked by a sharp null, whereas source 4 can be masked only by a wide null, which also suppresses source 3 to some extent.

Channel mainly carrying sound source 4:
Source 3 still seems to be audible. Sources 1 and 2 are each masked by a sharp null, whereas source 3 can be masked only by a wide null, which also suppresses source 4 to some extent.

The microphone array of separation system 2, on the other hand, is perpendicular to that of separation system 1, so the positional relationship between the sources and the array is the reverse of that for separation system 1. The separation result of separation system 2 therefore has the following characteristics.

Channel mainly carrying sound source 1:
Source 2 still seems to be audible.

Channel mainly carrying sound source 2:
Source 1 still seems to be audible.

Channel mainly carrying sound source 3:
Separated with high accuracy.

Channel mainly carrying sound source 4:
Separated with high accuracy.

In summary, when the arrays and the sources are positioned as in FIG. 16, it is better to adopt the results of separation system 1 for sources 1 and 2 and those of separation system 2 for sources 3 and 4. By preparing several separation systems with different microphone arrangements in this way, it becomes possible to adopt, from each system's results, only the channels corresponding to sources in the directions its microphone arrangement handles well.

To select channels across separation systems in this way, the channels must first be associated across the systems. Using the same method and equations as in Specific Example 1 yields outputs whose channels are associated across the separation systems. For example, if sound source 4 is output on the first channel of separation system 1, it is also output on the first channel of separation system 2.
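Once the channels are associated, selection reduces to choosing, for each channel index, the system that separates it best. This NumPy sketch assumes a per-channel quality score is available (for example an SIR estimate). The scores, the function name, and the assumption that channels 1 to 4 carry sources 1 to 4 in the FIG. 16 arrangement are all hypothetical. Without association, channel k of one system would not correspond to channel k of another, and this indexing would mix different sources.

```python
import numpy as np

def select_per_channel(Y, quality):
    """Pick, for each associated channel, the output of the best separation system.

    Y: (m systems, n channels, M bins, T frames); channel k carries the same source
       in every system, which is what the association provides.
    quality: (m, n) per-channel quality scores (hypothetical, e.g. estimated SIR).
    """
    best = np.argmax(quality, axis=0)              # best system index for each channel
    return Y[best, np.arange(Y.shape[1])], best

# Hypothetical FIG. 16 scenario: system 1 (index 0) is better for sources 1-2,
# system 2 (index 1) for sources 3-4.
quality = np.array([[20.0, 18.0, 5.0, 6.0],
                    [4.0, 6.0, 19.0, 21.0]])
Y = np.stack([np.full((4, 2, 3), 1.0), np.full((4, 2, 3), 2.0)])   # (m=2, n=4, M=2, T=3)
Y_sel, best = select_per_channel(Y, quality)
```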

As described above, using score functions that take the same argument for the output channels that are to be associated makes it possible to generate, across multiple separation systems, separation results whose output channels are associated with each other. Furthermore, it is possible to generate a separation result in which "the signals within a pair are mutually dependent, but the signals in different pairs are independent of each other."

FIG. 17 is a schematic diagram for explaining a concrete example of cooperative separation between narrowly spaced and widely spaced microphones. As shown in FIG. 17(A), the narrowly spaced microphones 1 to 3 are installed at equal intervals of 7.5 cm. The widely spaced microphones 1, 3, and 4 are installed at intervals of 15 cm and 30 cm. That is, the number of channels is n = 3 and the number of separation systems is m = 2. Because some microphones are shared by the two systems, the whole system uses four microphones rather than six.

As shown in FIG. 17(B), three loudspeakers were placed 1 m from the center of the microphone group. Source 1 was music from 45 degrees to the front right, source 2 was speech from directly in front, and source 3 was music from directly to the left.

These sounds were recorded with the four microphones and sampled at 16 kHz. An FFT with a window length of 512 and a shift width of 128 was applied to the signals to generate spectrograms of the observed signals (six in total, three for each separation system). That is, M = (512/2) + 1 = 257. To compute the SIR (signal-to-interference ratio) described later, each source signal was played alone and observed with the microphones, and the waveforms were then mixed on a computer.

The observed-signal spectrograms generated in this way were separated using equations (17) and (25). In equation (25), all coefficients a were set to 1, K was common to all channels and set to sqrt(M) ≈ 16.03, and N = 2. The number of iterations was 300.
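The experimental bookkeeping can be reproduced in a few lines of Python. Only the frame step and the bin spacing are derived quantities; everything else is taken from the description above.

```python
import math

fs = 16000               # sampling frequency (Hz)
win, shift = 512, 128    # FFT window length and shift width (samples)
M = win // 2 + 1         # frequency bins per frame
K = math.sqrt(M)         # common K used in eq. (25)
frame_step_ms = 1000 * shift / fs   # time between successive frames
bin_spacing_hz = fs / win           # spacing between frequency bins

narrow_mics, wide_mics = {1, 2, 3}, {1, 3, 4}
n_channels, n_systems = 3, 2
n_spectrograms = n_channels * n_systems          # one per channel per system
n_physical_mics = len(narrow_mics | wide_mics)   # shared microphones counted once
```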

FIGS. 18 and 19 show the separation results obtained from the narrowly spaced and the widely spaced microphones, respectively. The spectrograms in FIGS. 18(A) and 19(A) show that corresponding signals are output on the same channel in both systems. That is, Y_1^[1] and Y_1^[2] are the components corresponding to source 3 (music), Y_2^[1] and Y_2^[2] correspond to source 2 (speech), and Y_3^[1] and Y_3^[2] correspond to source 1 (music). If the two separation systems (one with narrowly spaced microphones and one with widely spaced microphones) were operated separately, as in the conventional approach, there would be no guarantee that their separation results would have corresponding channels. Generating separation results that correspond between systems in this way is therefore an advantage of the present invention.

The right-hand side of FIGS. 18 and 19 shows the SIR for each frequency bin. The SIR is a measure of the power ratios at which the three original signals are contained, and in these figures it is expressed in decibels. If, in a given frequency bin of the graph, the ratio of only one original signal is high (protrudes to the right), separation has succeeded in that bin. Conversely, if all three are contained in roughly equal proportions, separation has failed. Comparing FIG. 18(B) with FIG. 19(B) shows that the bins separated with high accuracy differ between the two. Which frequency bins are easy (or hard) to separate depends to some extent on the microphone spacing. It is therefore also possible to extract, from the result (spectrogram) of each separation system, the components of the frequency bins separated with high accuracy and then combine them, producing a separation result of even higher accuracy.
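The bin-wise combination can be sketched in NumPy for one associated output channel. The SIR values below are hypothetical, and the rule "keep the system with the higher SIR in each bin" is our illustrative choice. The SIR itself can be computed only when per-source recordings are available, as in this experiment, so a deployed system would need some other per-bin criterion. The merge is meaningful only because channel k refers to the same source in both systems.

```python
import numpy as np

def combine_by_bin(Y_a, Y_b, sir_a, sir_b):
    """Merge one associated output channel from two separation systems bin by bin.

    Y_a, Y_b: complex spectrograms (M bins, T frames) of the same channel k.
    sir_a, sir_b: per-bin SIR in dB, shape (M,). Each bin is taken from the system
    with the higher SIR (ties go to system a).
    """
    take_a = sir_a >= sir_b
    return np.where(take_a[:, None], Y_a, Y_b), take_a

M, T = 4, 3
Y_narrow = np.ones((M, T), dtype=complex)
Y_wide = 2 * np.ones((M, T), dtype=complex)
sir_narrow = np.array([3.0, 12.0, 15.0, 4.0])   # hypothetical values
sir_wide = np.array([10.0, 5.0, 6.0, 9.0])      # hypothetical values
Y_merged, from_narrow = combine_by_bin(Y_narrow, Y_wide, sir_narrow, sir_wide)
```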

We now explain the theoretical differences among the present invention, JP 2006-238409 A, and time-frequency-domain ICA predating JP 2006-238409 A (hereinafter "conventional time-frequency-domain ICA"), focusing on the KL divergence. The KL divergence is a distance-like measure between PDFs, and the three approaches differ in which PDFs the distance is computed between. Below, the number of channels is n = 3 and the number of separation systems is m = 1, but these numbers may be arbitrary. Consider the following four PDFs.

Equation (42) is the PDF that takes all output elements as arguments and represents their joint probability. Equation (43) expresses that the pair (Y_1^[1], Y_2^[1]) is independent of Y_3^[1] alone, and is the one used in Specific Example 6. Equation (44) expresses independence at the level of whole spectrograms. Equation (45) expresses that all arguments are independent.
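Under our reading of the descriptions above (the original equations (42) to (45) are not reproduced in this text), and with Y_k^[1](t) denoting the vector of all M frequency bins of channel k at frame t, the four PDFs would be:

```latex
Y_k^{[1]}(t) = \big[\,Y_k^{[1]}(1,t),\ \ldots,\ Y_k^{[1]}(M,t)\,\big]^{T}

\text{(42)}\quad p\big(Y_1^{[1]}(t),\,Y_2^{[1]}(t),\,Y_3^{[1]}(t)\big)

\text{(43)}\quad p\big(Y_1^{[1]}(t),\,Y_2^{[1]}(t)\big)\; p\big(Y_3^{[1]}(t)\big)

\text{(44)}\quad \prod_{k=1}^{3} p\big(Y_k^{[1]}(t)\big)

\text{(45)}\quad \prod_{k=1}^{3}\prod_{\omega=1}^{M} p\big(Y_k^{[1]}(\omega,t)\big)
```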

Conventional time-frequency-domain ICA was equivalent to minimizing the KL divergence between equations (45) and (42). Equation (45) contains the superfluous condition that "frequency bins within the same channel are also independent." As a result, the separation matrix and separation result that minimize this KL divergence suffered from different original signals appearing in different frequency bins of the same channel, that is, from the permutation problem between frequency bins.

In contrast, the technique described in JP 2006-238409 A minimizes the KL divergence between equations (44) and (42). Equation (44) expresses the condition "different channels are independent, but frequency bins within the same channel are not independent (they are mutually dependent)." The separation result obtained by minimizing this KL divergence is therefore free of the permutation problem between frequency bins.

Here, regard equation (42) as a "PDF that is not factorized at all" and equation (45) as a "PDF completely factorized element by element." Equation (44) can then be regarded as a PDF at an intermediate stage between the two, one in which the factorization stops at the "factorized per channel" stage.

The present invention, by contrast, can be regarded as stopping the factorization of the PDF at the "factorized per pair" stage. For example, equation (43), used in Specific Example 6, can be regarded as a PDF intermediate between equations (42) and (44).

Pairs can be formed in any way, and the number of elements may differ from pair to pair, so any PDF lying between equations (42) and (45) (hereinafter an "intermediate-stage PDF") can be expressed. In other words, every form of ICA that minimizes the KL divergence between an intermediate-stage PDF and equation (42) falls within the scope of the present invention.
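The nesting of these factorizations can be checked numerically with a Gaussian stand-in. For a Gaussian, the KL divergence between the joint density and the product of its group marginals has the closed form ½(Σ_g log det Σ_gg − log det Σ). ICA itself relies on non-Gaussian densities, so this NumPy sketch (real-valued, three channels by two bins, random covariance, all illustrative) only shows the ordering. The coarser the factorization, the smaller the KL divergence, and the unfactorized PDF (42) gives zero.

```python
import numpy as np

def kl_to_factorization(Sigma, groups):
    """KL( N(0, Sigma) || prod_g N(0, Sigma[g, g]) ) for a real zero-mean Gaussian."""
    logdet_joint = np.linalg.slogdet(Sigma)[1]
    logdet_parts = sum(np.linalg.slogdet(Sigma[np.ix_(g, g)])[1] for g in groups)
    return 0.5 * (logdet_parts - logdet_joint)

# Variables ordered as (channel, bin): index = 2 * channel + bin, 3 channels x 2 bins.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
Sigma = B @ B.T + 0.5 * np.eye(6)

factorizations = {
    "eq42_joint": [[0, 1, 2, 3, 4, 5]],
    "eq43_pairwise": [[0, 1, 2, 3], [4, 5]],        # (Y1, Y2) independent of Y3
    "eq44_channelwise": [[0, 1], [2, 3], [4, 5]],
    "eq45_elementwise": [[i] for i in range(6)],
}
kl = {name: kl_to_factorization(Sigma, g) for name, g in factorizations.items()}
```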

FIG. 1 is a block diagram showing the configuration of a signal separation apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the j-th separation system.
FIG. 3 is a flowchart outlining the signal separation process.
FIG. 4 is a flowchart explaining the pre-processing in the signal separation process.
FIG. 5 is a flowchart explaining the separation processing in the signal separation process.
FIG. 6 is a flowchart explaining the post-processing in the signal separation process.
FIG. 7 is a schematic diagram for explaining the separation process when both widely spaced and narrowly spaced microphones are provided.
FIG. 8 is a schematic diagram showing a case in which a microphone at the same position is shared among multiple separation systems.
FIG. 9 is a schematic diagram for explaining the processing of the independence calculation unit and the mapping unit.
FIG. 10 is a schematic diagram for explaining the processing when Y_1^[1] consists of frames at 8 ms intervals and Y_1^[2] of frames at 10 ms intervals.
FIG. 11 is a schematic diagram showing processing for achieving signal separation in a simplified manner.
FIG. 12 is a diagram showing an example of signal association between a 3-input, 3-output separation system and a 2-input, 2-output separation system.
FIG. 13 is a schematic diagram for explaining signal separation when there are two signal sources, one being speech and the other the sound (music) emitted from a loudspeaker.
FIG. 14 is a schematic diagram for explaining a process in which three sound sources are recorded with three microphones and separated into three components.
FIG. 15 is a schematic diagram for explaining the processing of the independence calculation unit and the mapping unit.
FIG. 16 is a schematic diagram for explaining cooperative separation when the microphone installation positions differ between separation systems.
FIG. 17 is a schematic diagram for explaining a concrete example of cooperative separation between narrowly spaced and widely spaced microphones.
FIG. 18 is a diagram showing the separation result obtained from the narrowly spaced microphones.
FIG. 19 is a diagram showing the separation result obtained from the widely spaced microphones.
FIG. 20 is a diagram showing a situation in which original signals output from N sound sources are observed with n microphones.
FIG. 21 is a schematic diagram showing signal separation results when m separation systems using ICA are provided.

Explanation of symbols

1_1 to 1_m: separation systems; 2: signal integration unit; 3: output unit; 4: control unit; 5: independence calculation unit; 6: mapping unit

Claims

1. A signal separation device that uses independent component analysis to separate a time-domain or time-frequency-domain observation signal, in which a plurality of signals are mixed, into separated signals, the device comprising:
a plurality of separation means, each generating separated signals from the observation signal and a separation matrix;
signal set generation means for generating signal sets, each consisting of mutually corresponding separated signals among the plurality of separated signals generated by different separation means; and
independence calculation means for calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or a score function that takes a signal set as its argument, and for controlling the plurality of separation means so that the independence between the signal sets is maximized,
wherein each separation means generates separated signals from the observation signal and a separation matrix set to initial values, corrects the separation matrix on the basis of the independence between the signal sets calculated by the independence calculation means until the separation matrix substantially converges, and generates the separated signals using the substantially converged separation matrix.
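Claim 1 does not fix a particular density model or update rule. Purely as an illustration, the NumPy sketch below makes several assumptions that are not taken from the claims. It uses real-valued time-domain observations, with one n-channel recording per separation means. It models the density of each signal set as spherical, with score φ = -Y_k / ||Y_k|| for set k. It updates each matrix with the standard natural-gradient ICA rule W ← W + η(I + E[φ(y) y^T])W, and it treats "substantially converged" as "the largest update fell below a tolerance." The names `set_score` and `cooperative_ica` are hypothetical. A time-frequency implementation would instead run per frequency bin with complex values.

```python
import numpy as np


def set_score(Y):
    """Score function of an assumed spherical multivariate density.

    Y has shape (m, n, T): m separation means, n outputs each, T samples.
    Signal set k is Y[:, k, :], i.e. the k-th output of every separation means.
    Dividing by the norm of the whole set couples the members of each set.
    """
    norm = np.sqrt(np.sum(Y ** 2, axis=0, keepdims=True))  # shape (1, n, T)
    return -Y / np.maximum(norm, 1e-12)


def cooperative_ica(X, n_iter=500, eta=0.1, tol=1e-6, W0=None):
    """Jointly update the separation matrices of m separation means.

    X: observations, shape (m, n, T). Returns W (m, n, n) and Y (m, n, T).
    """
    m, n, T = X.shape
    # Initial values: identity matrices unless W0 is supplied.
    W = np.tile(np.eye(n), (m, 1, 1)) if W0 is None else np.array(W0, dtype=float)
    I = np.eye(n)
    for _ in range(n_iter):
        Y = np.einsum("jab,jbt->jat", W, X)  # y^[j] = W^[j] x^[j]
        Phi = set_score(Y)
        largest_step = 0.0
        for j in range(m):
            # Assumed natural-gradient step: dW = eta (I + E[phi y^T]) W
            dW = eta * (I + Phi[j] @ Y[j].T / T) @ W[j]
            W[j] += dW
            largest_step = max(largest_step, np.abs(dW).max())
        if largest_step < tol:  # a tiny update is taken as "substantially converged"
            break
    return W, np.einsum("jab,jbt->jat", W, X)


# Usage: the same two sources observed through two different mixings,
# e.g. two microphone arrangements (illustrative data only).
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))
A = np.array([[[1.0, 0.6], [0.4, 1.0]],
              [[1.0, 0.3], [0.7, 1.0]]])
X = np.einsum("jab,bt->jat", A, S)
W, Y = cooperative_ica(X)
```

Because the score of each output depends on the norm of the whole signal set, the k-th outputs of the different separation means are pushed toward the same source. This coupling is the point of forming signal sets rather than treating each separation means separately.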

2. The signal separation device according to claim 1, wherein the Kullback-Leibler information (Kullback-Leibler divergence) calculated from the multivariate probability density function is used as the measure of independence between the signal sets.
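For reference, the Kullback-Leibler measure named in claim 2 can be written out as follows. The notation is ours, not the patent's, and it assumes real-valued signals. Here y^{[j]}(t) = W^{[j]} x^{[j]}(t) is the output of the j-th of m separation means, and \mathbf{Y}_k = [y_k^{[1]}, \dots, y_k^{[m]}]^T is the k-th of n signal sets, with multivariate density p_k. The divergence between the joint density and the product of the per-set densities is zero exactly when the signal sets are mutually independent:

$$
\mathrm{KL} = \int p(\mathbf{Y}_1,\dots,\mathbf{Y}_n)\,\log\frac{p(\mathbf{Y}_1,\dots,\mathbf{Y}_n)}{\prod_{k=1}^{n} p_k(\mathbf{Y}_k)}\,d\mathbf{Y}_1\cdots d\mathbf{Y}_n
= -\sum_{k=1}^{n} E\!\left[\log p_k(\mathbf{Y}_k)\right] - \sum_{j=1}^{m}\log\left|\det W^{[j]}\right| + \text{const}
$$

The constant is the joint differential entropy of the stacked observations, which is assumed to exist and does not depend on the separation matrices. Differentiating the first term with respect to W^{[j]} introduces the score function \varphi_k = \partial \log p_k(\mathbf{Y}_k) / \partial \mathbf{Y}_k. This shows how the score function of claim 1 relates to this measure.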

3. The signal separation device according to claim 1, wherein the signal set generation means generates the signal sets of corresponding separated signals from among the separated signals generated by a plurality of separation means that differ in at least one condition selected from: microphone spacing or installation position, sampling frequency, short-time Fourier transform window length, short-time Fourier transform shift width, initial value of the separation matrix, and normalization method of the observation-signal spectrogram.
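Claim 3 allows the separation means to differ in short-time Fourier transform shift width, as in the figure described above where Y_1^[1] uses 8 ms frames and Y_1^[2] uses 10 ms frames. Before such outputs can form a signal set, their frames have to be paired in time. The claims do not say how this pairing is done. The sketch below assumes a nearest-frame-start rule for a single frequency bin, and the name `align_frames` is hypothetical. Differing window lengths would also change the frequency grid, which this sketch does not handle.

```python
import numpy as np


def align_frames(n_frames_a, shift_a_ms, n_frames_b, shift_b_ms):
    """Pair every frame of stream B with the stream-A frame whose start
    time is closest (an assumed rule; ties go to the earlier A frame).

    Frame i of each stream is assumed to start at i * shift milliseconds.
    """
    t_a = np.arange(n_frames_a) * shift_a_ms
    t_b = np.arange(n_frames_b) * shift_b_ms
    # |t_b - t_a| for every (B frame, A frame) pair; argmin picks the nearest
    return np.abs(t_b[:, None] - t_a[None, :]).argmin(axis=1)


# Stand-in per-frame values of one frequency bin for two separation means.
Y1_a = np.arange(10.0)           # 10 frames at an 8 ms shift
Y1_b = np.arange(8.0) * 10.0     # 8 frames at a 10 ms shift
idx = align_frames(len(Y1_a), 8, len(Y1_b), 10)
signal_set = np.stack([Y1_a[idx], Y1_b])  # shape (2, 8): one pair per B frame
```

Here `idx` is `[0, 1, 2, 4, 5, 6, 7, 9]`. Some 8 ms frames are skipped because the 10 ms stream has fewer frames. The frames at 20 ms and 60 ms are equidistant from two 8 ms frames, so they resolve to the earlier one.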

4. A signal separation device that uses independent component analysis to separate a time-domain or time-frequency-domain observation signal, in which a plurality of signals are mixed, into separated signals, the device comprising:
separation means for generating separated signals from the observation signal and a separation matrix;
signal set generation means for generating signal sets of corresponding separated signals between the channels of the separation means; and
independence calculation means for calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or a score function that takes a signal set as its argument, and for controlling the separation means so that the independence between the signal sets is maximized,
wherein the separation means generates separated signals from the observation signal and a separation matrix set to initial values, corrects the separation matrix on the basis of the independence between the signal sets calculated by the independence calculation means until the separation matrix substantially converges, and generates the separated signals using the substantially converged separation matrix.

5. A signal separation method that uses independent component analysis to separate a time-domain or time-frequency-domain observation signal, in which a plurality of signals are mixed, into separated signals, the method comprising the steps of:
generating separated signals from the observation signal and a separation matrix by each of a plurality of separation means;
generating signal sets of corresponding separated signals from among the separated signals generated by different separation means;
calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or a score function that takes a signal set as its argument; and
generating separated signals from the observation signal and a separation matrix set to initial values, and correcting the separation matrix until it converges so that the independence between the signal sets calculated using the score function is maximized.

6. The signal separation method according to claim 5, wherein the Kullback-Leibler information (Kullback-Leibler divergence) calculated from the score function is used as the measure of independence between the signal sets.

7. The signal separation method according to claim 5, wherein, in the step of generating the signal sets, the signal sets of corresponding separated signals are generated from among the separated signals generated by a plurality of separation means that differ in at least one condition selected from: microphone spacing or installation position, sampling frequency, short-time Fourier transform window length, short-time Fourier transform shift width, initial value of the separation matrix, and normalization method of the observation-signal spectrogram.

8. A signal separation method that uses independent component analysis to separate a time-domain or time-frequency-domain observation signal, in which a plurality of signals are mixed, into separated signals, the method comprising the steps of:
generating separated signals from the observation signal and a separation matrix;
generating signal sets of corresponding separated signals between the channels;
calculating the independence between the signal sets, or the change in a measure of that independence, using a multivariate probability density function or a score function that takes a signal set as its argument; and
generating separated signals from the observation signal and a separation matrix set to initial values, and correcting the separation matrix until it converges so that the independence between the signal sets calculated using the score function is maximized.