JP5387442B2

JP5387442B2 - Signal processing device

Info

Publication number: JP5387442B2
Application number: JP2010038295A
Authority: JP
Inventors: 多伸近藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-02-24
Filing date: 2010-02-24
Publication date: 2014-01-15
Anticipated expiration: 2030-02-24
Also published as: JP2011176535A

Description

The present invention relates to a technique for emphasizing (separating or extracting) the sound of a specific sound source within a mixture of sounds produced by different sound sources.

Sound source separation techniques have been proposed that separate the sound of each source by applying source separation to several observation signals. The observation signals are obtained by picking up a mixture of several sounds (speech and noise) with several sound pickup devices. The separation matrix (inverse mixing matrix) used for source separation is computed for each frequency by a learning process (iterative updating), based for example on frequency-domain independent component analysis (FDICA: Frequency-Domain Independent Component Analysis).
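The per-frequency learning process can be illustrated with the natural-gradient update commonly used in FDICA. The following is a minimal sketch in plain Python for one frequency bin and two channels. The update rule W ← W + μ(I − E[φ(y)y^H])W, the nonlinearity φ(y) = y/|y|, the step size `mu` and the helper names (`ica_step`, `learn`, `crosstalk`) are assumptions for illustration. They are not the learning rule specified in this document.

```python
# A 2x2 complex matrix is represented as a nested list [[a, b], [c, d]].

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]


def ica_step(w, frames, mu=0.1):
    """One natural-gradient update W <- W + mu * (I - E[phi(y) y^H]) W for one bin.

    frames: learning data of this bin, a list of (x1, x2) complex pairs.
    phi(y) = y / |y| is an assumed nonlinearity suited to super-Gaussian sources.
    """
    n = len(frames)
    corr = [[0j, 0j], [0j, 0j]]  # running estimate of E[phi(y) y^H]
    for x1, x2 in frames:
        y = (w[0][0] * x1 + w[0][1] * x2, w[1][0] * x1 + w[1][1] * x2)
        phi = tuple(v / abs(v) if abs(v) > 1e-12 else 0j for v in y)
        for i in range(2):
            for j in range(2):
                corr[i][j] += phi[i] * y[j].conjugate() / n
    grad = [[(1.0 if i == j else 0.0) - corr[i][j] for j in range(2)] for i in range(2)]
    dw = matmul2(grad, w)
    return [[w[i][j] + mu * dw[i][j] for j in range(2)] for i in range(2)]


def learn(w, frames, iterations, mu=0.1):
    """Repeat the update a fixed number of times (the iteration count)."""
    for _ in range(iterations):
        w = ica_step(w, frames, mu)
    return w


def crosstalk(w, mixing):
    """For each output row of W*A, ratio of the weaker to the stronger source gain."""
    c = matmul2(w, mixing)
    return [min(abs(r[0]), abs(r[1])) / max(abs(r[0]), abs(r[1])) for r in c]
```

`crosstalk` only makes sense in a simulation where the mixing matrix A is known. ICA leaves the order and scale of the outputs undetermined, so the check is permutation-invariant and scale-invariant.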

Non-Patent Document 1 discloses a technique that learns separation matrices only for frequencies picked at a fixed interval (every predetermined number of bins). It then fills in the separation matrices of the remaining, non-selected frequencies using the learned matrices. Null beamforming (NBF: Null Beam Former) generates the separation matrices for the non-selected frequencies. Each such matrix is set so that a null (blind spot) of sound pickup points toward the source directions estimated from the learned separation matrices. Compared with running ICA learning on every frequency from the start, the technique of Non-Patent Document 1 reduces the amount of computation.
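The null-beamformer interpolation of Non-Patent Document 1 can be sketched for two microphones under a far-field model. The separation matrix is the inverse of the matrix whose columns are the steering vectors toward the two estimated directions. Each output row therefore passes one direction with unit gain and places a null toward the other. The speed of sound, the microphone spacing `spacing`, the steering-vector sign convention and the function names are assumptions of this sketch.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s (assumed constant)


def steering_vector(theta, freq, spacing):
    """Far-field response of a two-microphone pair to a source at angle theta (rad)."""
    delay = spacing * math.sin(theta) / SPEED_OF_SOUND
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq * delay)]


def inverse2(m):
    """Inverse of a 2x2 matrix (raises ZeroDivisionError if singular)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def null_beamformer(theta1, theta2, freq, spacing):
    """Row 0 passes theta1 and nulls theta2; row 1 does the opposite."""
    a1 = steering_vector(theta1, freq, spacing)
    a2 = steering_vector(theta2, freq, spacing)
    mixing = [[a1[0], a2[0]], [a1[1], a2[1]]]  # columns are the steering vectors
    return inverse2(mixing)
```

The matrix becomes singular when the two directions coincide, or when spatial aliasing makes their steering vectors equal.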

Non-Patent Document 1: Osako and three others, "A fast blind source separation method using frequency band interpolation by a null beamformer", Proceedings of the Acoustical Society of Japan Meeting, Acoustical Society of Japan, March 2007, pp. 549-550.

However, when separation matrices generated by null beamforming are used for the non-selected frequencies, as in Non-Patent Document 1, separation accuracy at those frequencies may be insufficient. In view of these circumstances, an object of the present invention is to reduce the computation required to generate separation matrices while keeping sound source separation highly accurate.

To solve the above problem, a signal processing device according to the present invention comprises the following means:
- signal separation means that generates one separated signal per sound source. It applies a separation matrix for each of a plurality of frequencies to a plurality of observation signals. The observation signals are obtained by picking up, with a plurality of sound pickup devices, a mixture of sounds produced by different sound sources;
- frequency selection means that sorts the frequencies into first frequencies and second frequencies;
- first learning means that generates the separation matrix of each first frequency by a primary learning process, using learning data corresponding to the first-frequency components of the observation signals;
- direction estimation means that estimates the direction of each sound source from the separation matrices generated by the first learning means;
- initial matrix setting means that generates an initial separation matrix such that a null or a beam of sound pickup points in each direction estimated by the direction estimation means;
- second learning means that generates the separation matrix of each second frequency by a secondary learning process. This process uses learning data corresponding to the second-frequency components of the observation signals. It starts from the initial separation matrix generated by the initial matrix setting means and runs fewer iterations than the primary learning process.
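The direction estimation and initial-matrix steps in the configuration above can be sketched for two microphones under the same far-field assumptions. W⁻¹ estimates the mixing matrix up to column scaling and order. The phase difference between the two entries of each column therefore gives sin θ. A null-beamformer matrix toward those angles then serves as the initial value at a second frequency. This is a minimal sketch. The formula, constants and names (`estimate_directions`, `initial_matrix`) are assumptions. The sketch ignores spatial aliasing: the phase difference must stay within ±π.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s (assumed constant)


def steering_vector(theta, freq, spacing):
    """Far-field response of a two-microphone pair to a source at angle theta (rad)."""
    delay = spacing * math.sin(theta) / SPEED_OF_SOUND
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq * delay)]


def inverse2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def estimate_directions(w, freq, spacing):
    """One angle per column of W^-1, the estimated mixing matrix of a first frequency."""
    a = inverse2(w)
    angles = []
    for col in range(2):
        # Column scale cancels in the ratio; its phase is -2*pi*f*d*sin(theta)/c.
        phase = cmath.phase(a[1][col] / a[0][col])
        s = -phase * SPEED_OF_SOUND / (2 * math.pi * freq * spacing)
        angles.append(math.asin(max(-1.0, min(1.0, s))))
    return angles


def initial_matrix(angles, freq, spacing):
    """Null-beamformer initial matrix for a second frequency."""
    a1 = steering_vector(angles[0], freq, spacing)
    a2 = steering_vector(angles[1], freq, spacing)
    return inverse2([[a1[0], a2[0]], [a1[1], a2[1]]])
```

In the claimed configuration, the angles estimated at first frequencies would be reused at each second frequency. Secondary learning would then start from `initial_matrix(...)` instead of an arbitrary matrix.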

In this configuration, the primary learning process generates a separation matrix for each frequency sorted as a first frequency. For each frequency sorted as a second frequency, a secondary learning process generates the separation matrix. It runs fewer iterations than the primary process and starts from an initial separation matrix derived from the matrices the primary process produced. The computational load is therefore lower than when the primary learning process runs on every frequency. A comparison configuration would apply to the second frequencies, without any secondary learning, separation matrices set to form a null or beam toward the source directions estimated from the primary-learned matrices. Compared with it, the present configuration can generate separation matrices that separate sources more accurately.
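The saving claimed here can be checked with simple arithmetic if cost is counted as the number of per-frequency update iterations. The bin count, the number of first frequencies and the iteration counts below are hypothetical values. The cost of direction estimation and initial-matrix setting is ignored.

```python
def learning_cost(k_total, k_first, n_primary, n_secondary):
    """Count per-bin update iterations: primary on first bins, secondary on the rest."""
    return k_first * n_primary + (k_total - k_first) * n_secondary


if __name__ == "__main__":
    full = learning_cost(512, 512, 100, 10)   # primary learning on every bin
    mixed = learning_cost(512, 128, 100, 10)  # 128 first bins, 384 second bins
    print(full, mixed, mixed / full)          # 51200 16640 0.325
```

Under these assumed numbers, the mixed scheme needs about a third of the update iterations of full primary learning.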

In an environment with poor sound pickup conditions, skipping the secondary learning process sometimes yields a more accurate separation matrix for a second frequency. A signal processing device according to a preferred aspect of the present invention takes this tendency into account. It includes condition determination means that judges, for each frequency, whether the sound pickup condition is good or poor. For a frequency judged to have a good pickup condition, the second learning means generates the separation matrix by the secondary learning process, starting from the initial separation matrix. For a frequency judged to have a poor pickup condition, the initial separation matrix is adopted as the separation matrix unchanged. Secondary learning is skipped where the pickup condition is poor. This aspect therefore generates more accurate separation matrices than running secondary learning on every second frequency regardless of the pickup condition. A specific example of this aspect is described later as the second embodiment.
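The decision rule of this aspect reduces to a simple branch per second frequency. The sketch below assumes the initial matrices and the good/poor judgments are already available. It takes the secondary learning routine as a caller-supplied function. All names are hypothetical.

```python
from typing import Callable, Dict, List

Matrix = List[List[complex]]


def second_frequency_matrices(
    second_bins: List[int],
    initial: Dict[int, Matrix],
    condition_good: Dict[int, bool],
    learn: Callable[[Matrix, int], Matrix],
) -> Dict[int, Matrix]:
    """Refine the initial matrix only where the pickup condition is judged good."""
    result: Dict[int, Matrix] = {}
    for k in second_bins:
        if condition_good[k]:
            # Secondary learning, started from the null/beam initial matrix.
            result[k] = learn(initial[k], k)
        else:
            # Poor condition: adopt the initial matrix as the separation matrix.
            result[k] = initial[k]
    return result
```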

When the observation-signal component at a second frequency contains the sound of only one source, the separation matrix should preferably leave that component largely unchanged by source separation. A configuration may therefore combine two means. Condition determination means judges, for each frequency, whether the sound pickup condition is good or poor (that is, whether one source or several sources are present). Matrix setting means (for example, the matrix setting unit 76 in FIG. 14) handles each second frequency judged poor. It sets the separation matrix so that sound arriving from the source direction estimated from the observation signals is emphasized. In this aspect, stopping both the initial-matrix generation and the secondary learning process at those poor second frequencies is particularly preferable, because it reduces the computation required to generate separation matrices. A specific example of this aspect is described later as the third embodiment.
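For a frequency where only one source is judged present, delay-and-sum weights can build a matrix that emphasizes the estimated source direction. This minimal sketch computes one such weight row, w = a(θ)^H / M. The row gives unit gain toward θ and lower gain in other directions. The far-field steering model, the constants and the names are assumptions. This passage does not specify how the full separation matrix is assembled from such rows.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s (assumed constant)


def steering_vector(theta, freq, spacing):
    """Far-field response of a two-microphone pair to a source at angle theta (rad)."""
    delay = spacing * math.sin(theta) / SPEED_OF_SOUND
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq * delay)]


def delay_and_sum_row(theta, freq, spacing):
    """Weights a(theta)^H / M: align the channels for direction theta and average them."""
    a = steering_vector(theta, freq, spacing)
    return [v.conjugate() / len(a) for v in a]


def response(row, theta, freq, spacing):
    """Complex gain of a weight row for a source at angle theta."""
    a = steering_vector(theta, freq, spacing)
    return sum(w * v for w, v in zip(row, a))
```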

A signal processing device according to a preferred aspect of the present invention includes significance index calculation means. From the observation signals, it calculates for each frequency a significance index value. The value indicates how significant (meaningful) the learning process using that frequency's learning data is. The frequency selection means sorts the frequencies into first and second frequencies according to each frequency's significance index value. This aspect can generate more accurate separation matrices than a sorting that ignores learning significance. An example of such a sorting picks every predetermined-numbered frequency of the frequency sequence as a first frequency and assigns the rest to the second frequencies. Specifically, the condition determination means judges the pickup condition to be poor when the intensities of the sounds produced by the different sources differ greatly. It judges the condition good when the intensities differ little.

In an aspect that includes the significance index calculation means, the condition determination means should preferably judge each frequency's pickup condition from the significance index value calculated for that frequency. This reduces the computation needed to generate separation matrices, compared with computing a separate pickup-condition index. Specifically, consider the observation vectors whose elements are the intensities of the respective observation signals at each frequency. The determinant of their covariance matrix serves as an index of the significance of the learning process. It also varies with the intensity difference between the sounds of the different sources, that is, with the pickup condition. A configuration that uses this determinant both to select frequencies and to judge the pickup condition is therefore preferable.
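The determinant-based significance index can be sketched as follows. The 2×2 covariance matrix R = (1/N) Σ X X^H of the learning data is computed for each frequency. Its determinant ranks the frequencies and, compared against a threshold, judges the pickup condition. A near-zero determinant means the observation vectors lie along essentially one direction, that is, one dominant source. The ranking rule, the `threshold` parameter and the names are assumptions for illustration.

```python
def covariance(frames):
    """R = (1/N) sum X X^H over the learning data (list of (x1, x2)) of one bin."""
    n = len(frames)
    r = [[0j, 0j], [0j, 0j]]
    for x in frames:
        for i in range(2):
            for j in range(2):
                r[i][j] += x[i] * x[j].conjugate() / n
    return r


def det_index(frames):
    """Determinant of the covariance matrix (real and >= 0 for a Hermitian PSD matrix)."""
    r = covariance(frames)
    return (r[0][0] * r[1][1] - r[0][1] * r[1][0]).real


def select_first_frequencies(learning_data, count):
    """Return the `count` bins with the largest determinant index."""
    scores = {k: det_index(frames) for k, frames in learning_data.items()}
    return sorted(scores, key=scores.get, reverse=True)[:count]


def condition_is_good(frames, threshold):
    """Reuse the same index to judge the pickup condition of one bin."""
    return det_index(frames) >= threshold
```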

Learning by independent component analysis is equivalent to identifying as many independent bases as there are sound sources. Consider the observation vectors whose elements are the intensities of the respective observation signals at each frequency. The total number of their bases is therefore well suited as an index of how significant learning with the learning data is. Accordingly, in a preferred aspect of the present invention, the significance index calculation means calculates an index value of the total number of bases in the distribution of these observation vectors. The frequency selection means selects frequencies whose index value indicates a large total number of bases as first frequencies.

Examples of an index value of the total number of bases include the determinant and the condition number of the covariance matrix of the observation vectors. Accordingly, in a preferred aspect of the present invention, the significance index calculation means includes two means. Covariance matrix calculation means calculates, for each frequency, the covariance matrix of the observation vectors whose elements are the intensities of that frequency's components in the observation signals. Index calculation means (for example, the determinant calculation unit 424 in FIG. 6) calculates the significance index value from each frequency's covariance matrix. The index calculation means may, for example, base the significance index value on the determinant or the condition number of the covariance matrix. A larger trace (power) of the observation-vector covariance matrix tends to mean that the distribution region (basis) of the observation vectors is identified more clearly for each sound source. The significance index calculation means may therefore also calculate the significance index value from the trace of the covariance matrix of the observation signals.
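For the two-microphone case, the condition number and trace of the covariance matrix follow from its two eigenvalues. A 2×2 Hermitian matrix has a closed form for these. The sketch assumes the covariance matrix has already been computed, for example as R = (1/N) Σ X X^H. A condition number near 1 presumably indicates two comparably strong bases, and a very large one a single dominant basis. The names are hypothetical.

```python
import math


def hermitian_eigenvalues(r):
    """Eigenvalues of a 2x2 Hermitian matrix, largest first."""
    tr = (r[0][0] + r[1][1]).real
    det = (r[0][0] * r[1][1] - r[0][1] * r[1][0]).real
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))  # clamp tiny negative rounding
    return tr / 2 + disc, tr / 2 - disc


def condition_number(r):
    """Ratio of largest to smallest eigenvalue (infinite for a rank-deficient matrix)."""
    lam_max, lam_min = hermitian_eigenvalues(r)
    return math.inf if lam_min <= 0 else lam_max / lam_min


def trace_index(r):
    """Trace of the covariance matrix, i.e. the total power of the two channels."""
    return (r[0][0] + r[1][1]).real
```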

The definition and calculation method of the significance index value are arbitrary. For example, the lower the kurtosis of the histogram of an observation signal's intensity, the more sound sources the signal tends to contain. It is therefore preferable for the significance index calculation means to base the significance index value on that kurtosis, and for the frequency selection means to select low-kurtosis frequencies as first frequencies. Likewise, learning with the learning data tends to be more significant the more independent (the less correlated) the observation signals are. It is therefore preferable for the significance index calculation means to base the significance index value on the independence among the observation signals. The frequency selection means then selects frequencies with high independence as first frequencies. Examples of an index of independence among the observation signals include the cross-correlation and the mutual information.
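The alternative indices mentioned here take only a few lines to compute. The sketch below gives the (non-excess) kurtosis of a real sequence and a normalized cross-correlation between two complex channels. Which quantity the kurtosis is taken over is an assumption, for example |x1(m,fk)| across the frames of the learning data. Mutual information is not shown.

```python
import math


def kurtosis(values):
    """Non-excess kurtosis E[(u - mean)^4] / var^2 of a real sequence."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (var * var)


def normalized_cross_correlation(x1, x2):
    """|E[x1 x2*]| / sqrt(E|x1|^2 E|x2|^2): 0 means uncorrelated, 1 fully coherent."""
    n = len(x1)
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2)) / n
    p1 = sum(abs(a) ** 2 for a in x1) / n
    p2 = sum(abs(b) ** 2 for b in x2) / n
    return abs(cross) / math.sqrt(p1 * p2)
```

A frequency with low kurtosis or low cross-correlation would be ranked higher for primary learning.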

The signal processing device according to each of the above aspects may be implemented by hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to audio processing. It may also be implemented by a general-purpose processor such as a CPU (Central Processing Unit) working with a program. A program according to the present invention causes a computer to execute the following processes:
- signal separation processing that generates one separated signal per sound source. It applies a separation matrix for each of a plurality of frequencies to a plurality of observation signals. The observation signals are obtained by picking up, with a plurality of sound pickup devices, a mixture of sounds produced by different sound sources;
- frequency selection processing that sorts the frequencies into first and second frequencies;
- first processing that generates the separation matrix of each first frequency by a primary learning process, using learning data corresponding to the first-frequency components of the observation signals;
- direction estimation processing that estimates the direction of each sound source from the separation matrices generated by the first processing;
- initial matrix setting processing that generates an initial separation matrix such that a null or beam of sound pickup points in each estimated direction;
- second processing that generates the separation matrix of each second frequency by a secondary learning process. This process uses learning data corresponding to the second-frequency components of the observation signals. It starts from the initial separation matrix and runs fewer iterations than the primary learning process.
This program provides the same operation and effects as the signal processing device according to the present invention. The program may be provided to users stored on a computer-readable recording medium and installed on a computer. It may also be distributed from a server device over a communication network and installed on a computer.

FIG. 1 is a block diagram of a signal processing device according to the first embodiment.
FIG. 2 is an explanatory diagram of observation vectors and learning data.
FIG. 3 is a block diagram of a signal separation unit.
FIG. 4 is a block diagram of a separation matrix generation unit.
FIG. 5 is an explanatory diagram of the operation of the separation matrix generation unit.
FIG. 6 is a block diagram of a significance index calculation unit.
FIG. 7 is a conceptual diagram showing the relationship between the determinant of the covariance matrix of observation vectors and the number of bases.
FIG. 8 is a graph showing the relationship between the number of first frequencies and the number of iterations of the learning process.
FIG. 9 is a graph showing the relationship between the number of first frequencies and the noise suppression ratio.
FIG. 10 is a graph showing the relationship between the number of first frequencies and cepstral distortion.
FIG. 11 is a graph showing the relationship between the number of first frequencies and the noise suppression ratio.
FIG. 12 is a graph showing the relationship between the number of first frequencies and cepstral distortion.
FIG. 13 is a block diagram of the separation matrix generation unit in the second embodiment.
FIG. 14 is a block diagram of the separation matrix generation unit in the third embodiment.
FIG. 15 is a conceptual diagram showing the relationship between the trace of the covariance matrix and the distribution range of observation vectors.
FIG. 16 is a graph showing the relationship between pre-correction kurtosis and a weight value.

<A: First Embodiment>
FIG. 1 is a block diagram of a signal processing device 100 according to the first embodiment. Two sound pickup devices, M1 and M2, are placed apart from each other in a plane PL and connected to the signal processing device 100. Sound sources S1 and S2 are located at different positions around the pickup devices M1 and M2. The sound source S1 lies at an angle θ1 from the normal Ln of the plane PL. The sound source S2 lies at an angle θ2 (θ2 ≠ θ1) from the normal Ln. The angles θ1 and θ2 are unknown. The number of sound pickup devices M (M1, M2) and the number of sound sources S (S1, S2) may be changed arbitrarily.

The mixture of the sound SV1 produced by the sound source S1 and the sound SV2 produced by the sound source S2 reaches both pickup devices M1 and M2. The pickup device M1 generates an observation signal V1(t), and the pickup device M2 generates an observation signal V2(t). Each of V1(t) and V2(t) is an acoustic signal representing the time waveform of the mixture of SV1 and SV2.

The signal processing device 100 generates separated signals Y1(t) and Y2(t) by applying sound source separation (filtering) to the observation signals V1(t) and V2(t). In the separated signal Y1(t), the sound SV1 from source S1 is emphasized and the sound SV2 from source S2 is suppressed. In the separated signal Y2(t), SV2 is emphasized and SV1 is suppressed. That is, the sounds SV1 and SV2 are separated from each other (sound source separation).

The separated signals Y1(t) and Y2(t) are supplied to a sound emitting device (not shown) such as a loudspeaker or headphones and reproduced as sound. A configuration that reproduces only one of Y1(t) and Y2(t) may also be adopted, for example one that discards Y2(t) as noise. For convenience, the drawing omits the A/D converters that digitize V1(t) and V2(t) and the D/A converters that convert Y1(t) and Y2(t) back to analog.

As shown in FIG. 1, the signal processing device 100 is implemented as a computer system including an arithmetic processing unit 12 and a storage device 14. The storage device 14 stores various data and a program for generating Y1(t) and Y2(t) from V1(t) and V2(t). Any known recording medium, such as a semiconductor or magnetic recording medium, may serve as the storage device 14. A combination of several types of recording media may also be used.

By executing the program stored in the storage device 14, the arithmetic processing unit 12 functions as a frequency analysis unit 22, a signal separation unit 24, a signal synthesis unit 26 and a separation matrix generation unit 28. An electronic circuit (DSP) dedicated to sound source separation may instead implement each element of FIG. 1. The elements of FIG. 1 may also be distributed over several integrated circuits.

The frequency analysis unit 22 generates, for each of a plurality of frames on the time axis, a frequency spectrum (complex spectrum) Q1 of the observation signal V1(t) and a frequency spectrum (complex spectrum) Q2 of the observation signal V2(t). As shown in FIG. 2, Q1 is the sequence of component values x1(m,f1) to x1(m,fK) at the K frequencies (in practice, frequency bands) f1 to fK set on the frequency axis. Likewise, Q2 is the sequence of component values x2(m,f1) to x2(m,fK) at the K frequencies f1 to fK. The symbol m denotes the frame number, that is, one of the discrete time points set on the time axis. Any known technique, for example the short-time Fourier transform, may be used to compute Q1 and Q2.
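The frame-wise analysis can be sketched with a short-time Fourier transform. This minimal version uses a periodic Hann window, a hop of half a frame and a direct DFT that keeps bins 0 to N/2. The window, frame length and hop are assumptions chosen for illustration. A real implementation would use an FFT.

```python
import cmath
import math


def stft(signal, frame_len=32, hop=16):
    """Return spectra[m][k] for frames m and bins k = 0 .. frame_len // 2."""
    # Periodic Hann window; at 50 percent overlap the shifted windows sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
                for k in range(frame_len // 2 + 1)]
        spectra.append(bins)
    return spectra
```

Applying `stft` to V1(t) and V2(t) separately yields Q1 and Q2, with `spectra[m][k]` playing the role of x1(m,fk) or x2(m,fk).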

The frequency spectra Q1 and Q2 generated by the frequency analysis unit 22 are supplied to the signal separation unit 24 of FIG. 1. The signal separation unit 24 performs sound source separation separately for each of the K frequencies f1 to fK (k = 1 to K). It operates on the frequency-fk component of V1(t) (component value x1(m,fk)) and the frequency-fk component of V2(t) (component value x2(m,fk)). The result is the frequency spectrum R1 of the separated signal Y1(t) and the frequency spectrum R2 of the separated signal Y2(t). R1 is the sequence of component values y1(m,f1) to y1(m,fK), and R2 is the sequence of component values y2(m,f1) to y2(m,fK).

FIG. 3 is a block diagram of the signal separation unit 24. As shown in FIG. 3, the signal separation unit 24 consists of K processing units P1 to PK, one for each frequency fk (f1 to fK). The processing unit Pk for frequency fk contains two filters. Filter 32 generates the component value y1(m,fk) of the separated signal Y1(t) from x1(m,fk) and x2(m,fk). Filter 34 generates the component value y2(m,fk) of the separated signal Y2(t) from the same two component values.

The filters 32 and 34 perform delay-and-sum (DS) beamforming. Specifically, as defined by Equation (1A) below, the filter 32 of the processing unit Pk includes a delay element 321 that applies to the component value x1(m,fk) a delay corresponding to the coefficient w11(fk), a delay element 323 that applies to the component value x2(m,fk) a delay corresponding to the coefficient w12(fk), and an adder 325 that generates the component value y1(m,fk) by summing the outputs of the delay elements 321 and 323. Similarly, as defined by Equation (1B) below, the filter 34 includes a delay element 341 that applies to x1(m,fk) a delay corresponding to the coefficient w21(fk), a delay element 343 that applies to x2(m,fk) a delay corresponding to the coefficient w22(fk), and an adder 345 that generates y2(m,fk) by summing the outputs of the delay elements 341 and 343. Because a delay in the frequency domain amounts to multiplication by a complex coefficient, each filter reduces to a weighted sum. Null-steering (null beamformer) beamforming may also be applied in the processing unit Pk.

$$y_1(m,f_k) = w_{11}(f_k)\,x_1(m,f_k) + w_{12}(f_k)\,x_2(m,f_k) \tag{1A}$$

$$y_2(m,f_k) = w_{21}(f_k)\,x_1(m,f_k) + w_{22}(f_k)\,x_2(m,f_k) \tag{1B}$$
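Equations (1A) and (1B) amount to a 2-by-2 matrix-vector product in each frequency bin. The sketch below is a minimal illustration under the assumption that coefficients and component values are plain Python complex numbers; the function names are hypothetical.

```python
def separate_bin(w, x):
    """Apply Equations (1A)/(1B) in one bin f_k for one frame m.

    w: 2x2 separation matrix [[w11, w12], [w21, w22]] (complex)
    x: observation vector [x1, x2] (complex)
    Returns [y1, y2].
    """
    (w11, w12), (w21, w22) = w
    x1, x2 = x
    y1 = w11 * x1 + w12 * x2  # filter 32, Equation (1A)
    y2 = w21 * x1 + w22 * x2  # filter 34, Equation (1B)
    return [y1, y2]


def separate_frame(ws, xs):
    """Run the K processing units P1..PK on one frame (one matrix per bin)."""
    return [separate_bin(w, x) for w, x in zip(ws, xs)]
```

For example, an identity matrix passes the observation through unchanged, and swapping the rows of W swaps the outputs.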

The signal synthesis unit 26 in FIG. 1 converts the frequency spectrum R1 (y1(m,f1) to y1(m,fK)) generated by the signal separation unit 24 for each frame into a time-domain acoustic signal by an inverse Fourier transform, and generates the separated signal Y1(t) by joining successive frames. Similarly, the signal synthesis unit 26 generates the time-domain separated signal Y2(t) from the frequency spectrum R2 (y2(m,f1) to y2(m,fK)) generated by the signal separation unit 24.
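A minimal sketch of this synthesis step, assuming each frame's spectrum holds the one-sided bins 0..n/2 of a real frame of length n (K = 513 would be consistent with n = 1024, but the frame length, hop size and window are assumptions not stated here). The naive O(n^2) inverse DFT is for illustration only; a practical system would use an FFT and an analysis/synthesis window suited to overlap-add.

```python
import cmath


def inverse_real_dft(half_spectrum, n):
    """Naive inverse DFT of a one-sided spectrum for a real frame of length n.

    half_spectrum holds bins 0..n//2; the missing bins are rebuilt from
    Hermitian symmetry X[n-k] = conj(X[k]) before the inverse transform.
    """
    if len(half_spectrum) != n // 2 + 1:
        raise ValueError("expected n//2 + 1 bins")
    full = list(half_spectrum) + [complex(half_spectrum[n - k]).conjugate()
                                  for k in range(n // 2 + 1, n)]
    return [sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def overlap_add(frames, hop):
    """Join successive time-domain frames, summing samples where they overlap."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for t, value in enumerate(frame):
            out[i * hop + t] += value
    return out
```

With hop equal to the frame length, overlap_add reduces to plain concatenation of the frames.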

As shown in FIG. 1, the frequency spectra Q1 and Q2 generated by the frequency analysis unit 22 are supplied to the signal separation unit 24 and are also stored in the storage device 14 as observation vectors X(m,f1) to X(m,fK). As shown in FIG. 2, the observation vector X(m,fk) is the vector whose elements are the component values x1(m,fk) and x2(m,fk), i.e., X(m,fk) = (x1(m,fk), x2(m,fk))^T, where the superscript T denotes matrix transposition. As also shown in FIG. 2, the observation vectors X(m,f1) to X(m,fK) stored in the storage device 14 are divided into learning data D(f1) to D(fK) for each unit interval TU consisting of a predetermined number (for example, 50) of frames. That is, the learning data D(fk) is the time series of the observation vectors X(m,fk) of frequency fk computed for the frames within the unit interval TU.
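The division into learning data can be sketched as follows, assuming the stored observations are indexed as observations[m][k] = X(m, f_k). The function name is hypothetical, and dropping trailing frames that do not fill a whole unit interval is an assumption; the text does not say how a partial interval is handled.

```python
def split_learning_data(observations, frames_per_unit=50):
    """Group per-frame observation vectors into unit intervals TU.

    observations[m][k] is X(m, f_k) = (x1, x2) for frame m and bin k.
    Returns a list over unit intervals; entry [u][k] is D(f_k) for interval u,
    i.e. the time series of X(m, f_k) over the frames of that interval.
    Trailing frames that do not fill a whole interval are dropped (assumption).
    """
    units = []
    n_units = len(observations) // frames_per_unit
    for u in range(n_units):
        block = observations[u * frames_per_unit:(u + 1) * frames_per_unit]
        n_bins = len(block[0])
        units.append([[frame[k] for frame in block] for k in range(n_bins)])
    return units
```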

The separation matrix generation unit 28 in FIGS. 1 and 3 generates the separation matrices W(f1) to W(fK) that the signal separation unit 24 applies for sound source separation. As shown in FIG. 3, the separation matrix W(fk) of frequency fk is a 2-by-2 matrix whose elements are the coefficients w11(fk) and w12(fk) applied to the filter 32 of the processing unit Pk and the coefficients w21(fk) and w22(fk) applied to the filter 34. The separation matrix W(fk) is generated successively for each unit interval TU by a learning process (iterative updating) that uses the learning data D(fk) stored in the storage device 14. FIG. 4 is a block diagram of the separation matrix generation unit 28, and FIG. 5 illustrates its operation. As shown in FIG. 4, the separation matrix generation unit 28 includes a significance index calculation unit 42, a frequency selection unit 44, a first processing unit 50, and a second processing unit 60.

For each of the K frequencies f1 to fK, the significance index calculation unit 42 calculates a significance index value Z(fk) (Z(f1) to Z(fK)) that measures how significant a learning process using the learning data D(fk) of frequency fk would be. The significance index value Z(fk) indicates the degree to which the accuracy of sound source separation by the separation matrix W(fk) improves as a result of the learning process using the learning data D(fk). As shown in FIG. 5, the frequency selection unit 44 classifies each of the K frequencies f1 to fK as either a first frequency fA or a second frequency fB according to its significance index value Z(fk). A first frequency fA is a frequency at which a learning process using the learning data D(fk) is more significant than at a second frequency fB.

The significance index calculation unit 42 of the first embodiment calculates, for each frequency fk, the determinant z1(fk) of the covariance matrix Rxx(fk) of the learning data D(fk) (the observation vectors X(m,fk)) as the significance index value Z(fk). As shown in FIG. 6, it includes a covariance matrix calculation unit 422 and a determinant calculation unit 424.

The covariance matrix calculation unit 422 calculates the covariance matrix Rxx(fk) (Rxx(f1) to Rxx(fK)) of the learning data D(fk) for each of the K frequencies f1 to fK. The covariance matrix Rxx(fk) of frequency fk is a matrix whose elements are the covariances of the observation vectors X(m,fk) in the learning data D(fk); it is calculated, for example, by Equation (2) below.

$$R_{xx}(f_k) = E\left[X(m,f_k)\,X(m,f_k)^H\right] \tag{2}$$

The superscript H denotes the conjugate (Hermitian) transpose, and E[ ] denotes the average (or sum) over the frames in the unit interval TU, i.e., over the whole of the learning data D(fk). The covariance matrix Rxx(fk) is therefore a 2-by-2 square matrix generated for each unit interval TU (that is, for each set of learning data D(fk)).
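Equation (2) can be sketched for the two-channel case as below. This is an illustrative assumption-laden version: it uses the average rather than the sum (the text allows either) and, following Equation (2) literally, does not subtract a mean. The function name is hypothetical.

```python
def covariance_2x2(learning_data):
    """Equation (2): R_xx = E[X X^H], averaged over the frames of one TU.

    learning_data: list of observation vectors (x1, x2), complex.
    Returns [[r11, r12], [r21, r22]] with r_ij = mean(x_i * conj(x_j)).
    """
    n = len(learning_data)
    r = [[0j, 0j], [0j, 0j]]
    for x in learning_data:
        for i in range(2):
            for j in range(2):
                r[i][j] += x[i] * complex(x[j]).conjugate()
    return [[r[i][j] / n for j in range(2)] for i in range(2)]
```

The result is Hermitian by construction: r21 is the complex conjugate of r12.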

The determinant calculation unit 424 in FIG. 6 calculates the determinant z1(fk) (z1(f1) to z1(fK)) of each of the K covariance matrices Rxx(f1) to Rxx(fK) calculated by the covariance matrix calculation unit 422. Any known method may be used to calculate the determinant z1(fk); for example, the following method, which uses the singular value decomposition of the covariance matrix Rxx(fk), is suitable. For convenience, the covariance matrix Rxx(fk) is generalized below to J rows and J columns (J = 2 in this embodiment).

The covariance matrix Rxx(fk) is decomposed by singular value decomposition as in Equation (3) below. In Equation (3), F is a J-by-J unitary matrix (the complex counterpart of an orthogonal matrix; 2-by-2 in this embodiment), and D is a J-by-J singular value matrix (a diagonal matrix) whose elements other than the diagonal components (d1, ..., dJ) are zero.

$$R_{xx}(f_k) = F D F^H \tag{3}$$

Given the singular value decomposition of Equation (3), the determinant z1(fk) of the covariance matrix Rxx(fk) is expressed by Equation (4) below. The derivation uses two relations: the product of the conjugate transpose F^H of F and F itself is the J-th order identity matrix (F^H F = I), and the determinant det(AB) of a matrix product AB equals det(BA). Taking A = F and B = DF^H gives det(FDF^H) = det(DF^H F) = det(D).

$$\begin{aligned} z_1(f_k) &= \det\left(R_{xx}(f_k)\right) \\ &= \det\left(F D F^H\right) \\ &= \det(D) \\ &= d_1 \cdot d_2 \cdots d_J \end{aligned} \tag{4}$$

As Equation (4) shows, the determinant z1(fk) of the covariance matrix Rxx(fk) equals the product of the J diagonal components (d1, ..., dJ) of the singular value matrix D obtained by the singular value decomposition of Rxx(fk). The determinant calculation unit 424 in FIG. 6 calculates the determinants z1(f1) to z1(fK) by evaluating Equation (4) for each of the K frequencies f1 to fK.
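For J = 2 the singular values can be written in closed form. Because Rxx(fk) is Hermitian and positive semidefinite, its SVD in Equation (3) coincides with its eigendecomposition, so d1 and d2 are its real, non-negative eigenvalues. The sketch below assumes that property and uses hypothetical function names; the product d1·d2 agrees with the direct formula r11·r22 − |r12|².

```python
import math


def singular_values_2x2_hermitian(r):
    """Singular values d1 >= d2 of a 2x2 Hermitian positive semidefinite matrix.

    For such a matrix they equal the eigenvalues:
    (r11 + r22)/2 +/- sqrt(((r11 - r22)/2)^2 + |r12|^2).
    """
    a = r[0][0].real
    d = r[1][1].real
    b = abs(r[0][1])
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean + radius, mean - radius


def determinant_via_svd(r):
    """Equation (4) for J = 2: z1 = d1 * d2."""
    d1, d2 = singular_values_2x2_hermitian(r)
    return d1 * d2
```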

FIG. 7 is a scatter plot of the observation vectors X(m,fk) within a unit interval TU. The horizontal axis represents the component value x1(m,fk) and the vertical axis the component value x2(m,fk). Part (A) of FIG. 7 is a scatter plot for a case where the determinant z1(fk) is large, and part (B) for a case where it is small. When z1(fk) is large, as in part (A), the axes (bases) of the regions over which the observation vectors X(m,fk) are distributed are clearly distinguishable for each sound source S. Specifically, a region A1, in which observation vectors X(m,fk) dominated by the sound SV1 from the sound source S1 are distributed along an axis α1, is clearly distinguished from a region A2, in which observation vectors dominated by the sound SV2 from the sound source S2 are distributed along an axis α2. When z1(fk) is small, on the other hand, the number of distribution regions (axes) of the observation vectors X(m,fk) that can be clearly distinguished in the scatter plot is smaller than the actual number of sound sources S. In part (B) of FIG. 7, for example, there is no clear region A2 (axis α2) corresponding to the sound SV2 from the sound source S2.

As these tendencies show, the determinant z1(fk) of the covariance matrix Rxx(fk) serves as an index of the number of bases (axes of the regions over which the observation vectors are distributed) in the distribution of the observation vectors X(m,fk) making up the learning data D(fk): the larger z1(fk) is at a frequency fk, the more bases tend to be present. At a frequency fk where z1(fk) is zero, only one independent basis is present. Because the independent component analysis used in the learning process for the separation matrix W(fk) is equivalent to identifying as many independent bases as there are sound sources S, the learning data D(fk) at those of the K frequencies f1 to fK where z1(fk) is small can be said to have low learning significance (the degree to which the learning process improves the accuracy of sound source separation).
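The link between the determinant and the number of bases can be checked on a toy example. The sketch below builds a hypothetical instantaneous two-microphone mixture (real-valued mixing vectors and ±1 source sequences chosen purely for illustration, not taken from the text): with one active source the observations lie on a single axis and the determinant is zero, while two independent sources give a clearly positive determinant.

```python
def covariance_determinant(observations):
    """det(E[X X^H]) for two-channel observations, averaged over frames."""
    n = len(observations)
    r11 = sum(abs(x1) ** 2 for x1, _ in observations) / n
    r22 = sum(abs(x2) ** 2 for _, x2 in observations) / n
    r12 = sum(x1 * complex(x2).conjugate() for x1, x2 in observations) / n
    return r11 * r22 - abs(r12) ** 2


def mix(sources, mixing_vectors):
    """Hypothetical instantaneous mixture: X(m) = sum_i s_i(m) * a_i."""
    frames = len(sources[0])
    return [tuple(sum(s[m] * a[ch] for s, a in zip(sources, mixing_vectors))
                  for ch in range(2))
            for m in range(frames)]


s1 = [1, -1, 1, -1]
s2 = [1, 1, -1, -1]  # uncorrelated with s1 over these frames
one_source = mix([s1], [(1.0, 0.5)])
two_sources = mix([s1, s2], [(1.0, 0.5), (0.3, 1.0)])
```

Here covariance_determinant(one_source) is 0 (a single axis, as in part (B) of FIG. 7) and covariance_determinant(two_sources) is about 0.72.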

Using this property of the determinant z1(fk), the frequency selection unit 44 in FIG. 4 selects, from among the K frequencies f1 to fK, one or more frequencies fk with a large determinant z1(fk) as first frequencies fA, and the remaining frequencies fk with a small determinant z1(fk) as second frequencies fB. Specifically, the frequency selection unit 44 selects as first frequencies fA either a predetermined number of frequencies fk ranked highest in descending order of the determinants z1(f1) to z1(fK), or the one or more frequencies fk whose determinant z1(fk) exceeds a predetermined threshold, and selects all frequencies fk other than the first frequencies fA as second frequencies fB. The frequency selection unit 44 performs this selection successively, for example, for each unit interval TU.
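Both selection rules described above (top-N by descending determinant, or a fixed threshold) fit in a few lines. This is a sketch with a hypothetical function name; the values of top_n and threshold are illustrative, since the text gives no specific numbers.

```python
def select_frequencies(z, top_n=None, threshold=None):
    """Split bin indices into first frequencies fA and second frequencies fB.

    z: significance index values Z(f_k), here the determinants z1(f_k).
    Give exactly one of top_n (keep the top_n largest) or threshold
    (keep bins whose value exceeds it).
    Returns (first, second) as sorted lists of bin indices.
    """
    if (top_n is None) == (threshold is None):
        raise ValueError("give exactly one of top_n or threshold")
    if top_n is not None:
        ranked = sorted(range(len(z)), key=lambda k: z[k], reverse=True)
        first = set(ranked[:top_n])
    else:
        first = {k for k, value in enumerate(z) if value > threshold}
    second = [k for k in range(len(z)) if k not in first]
    return sorted(first), second
```

For example, with z = [0.1, 5.0, 0.0, 2.0], both top_n=2 and threshold=1.0 select bins 1 and 3 as first frequencies.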

The first processing unit 50 and the second processing unit 60 in FIG. 4 generate, for each frequency fk, the separation matrix W(fk) (W(f1) to W(fK)) used by the signal separation unit 24. The first processing unit 50 generates the separation matrix W(fk) (hereinafter sometimes denoted "separation matrix WA(fk)") for each frequency fk that the frequency selection unit 44 selected as a first frequency fA, and the second processing unit 60 generates the separation matrix W(fk) (hereinafter sometimes denoted "separation matrix WB(fk)") for each frequency fk selected as a second frequency fB.

As shown in FIG. 4, the first processing unit 50 includes an initial matrix setting unit 52, a first learning unit 54, and a correction processing unit 56. The initial matrix setting unit 52 sets the initial value WA^[0](fk) (hereinafter "initial separation matrix") of the learning process that generates the separation matrix WA(fk). The initial separation matrix WA^[0](fk) may be set by any method; for example, the identity matrix may be used. Setting the initial separation matrix WA^[0](fk) independently of the observation vectors X(m,fk) in this way has the advantage that no prior information about the sound sources S1 and S2 is required.

As shown in FIG. 5, the first learning unit 54 in FIG. 4 generates the separation matrix WA(fk) of each frequency fk selected as a first frequency fA by successive updates (hereinafter "primary learning process") starting from the initial separation matrix WA^[0](fk) set by the initial matrix setting unit 52. Any known technique may be used for the primary learning process; a suitable example is Equation (5), which computes the separation matrix W^[n+1](fk) after the (n+1)-th update from the immediately preceding separation matrix W^[n](fk) (the initial separation matrix WA^[0](fk) when computing W^[1](fk)).

$$W^{[n+1]}(f_k) = W^{[n]}(f_k) - \eta\,\operatorname{off\text{-}diag}\left(E\left[\varphi\left(Y^{[n]}(m,f_k)\right)Y^{[n]}(m,f_k)^H\right]\right)W^{[n]}(f_k) \tag{5}$$

In Equation (5), η is a predetermined constant (step size), and off-diag( ) is an operator that replaces the diagonal components of a matrix with zero. φ( ) denotes a nonlinear function; for example, the hyperbolic tangent (tanh) may be used as φ( ). Y^[n](m,fk) in Equation (5) is the vector whose elements are the component values y1(m,fk) and y2(m,fk) computed by Equations (1A) and (1B) with the current separation matrix W^[n](fk) applied, i.e., Y^[n](m,fk) = (y1(m,fk), y2(m,fk))^T. The first learning unit 54 adopts the separation matrix W^[NA](fk) obtained after repeating the computation of Equation (5) NA times as the separation matrix WA(fk). However, if the separation matrix W^[n+1](fk) computed by Equation (5) is judged to have converged, the first learning unit 54 ends the primary learning process before the number of iterations reaches NA and adopts the separation matrix W^[n+1](fk) at that point as the separation matrix WA(fk).
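The update of Equation (5) for one 2-by-2 bin can be sketched as follows. Several details are assumptions not fixed by the text: φ applies tanh separately to the real and imaginary parts (a common choice for complex-valued FDICA), E[ ] is a frame average, and convergence is declared when the largest element change falls below a tolerance. The same routine would serve the secondary learning process with a different initial matrix and a smaller iteration cap.

```python
import math


def phi(y):
    """Nonlinearity: tanh on real and imaginary parts separately (assumption)."""
    return complex(math.tanh(y.real), math.tanh(y.imag))


def matmul2(a, b):
    """2x2 matrix product."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def update_once(w, learning_data, eta=0.1):
    """One iteration of Equation (5) for a single bin (2x2 case)."""
    n = len(learning_data)
    # Y^[n](m, f_k) = W^[n] X(m, f_k), per Equations (1A)/(1B)
    ys = [[w[i][0] * x[0] + w[i][1] * x[1] for i in range(2)]
          for x in learning_data]
    # E[phi(Y) Y^H] averaged over the frames of the unit interval
    c = [[sum(phi(y[i]) * y[j].conjugate() for y in ys) / n for j in range(2)]
         for i in range(2)]
    # off-diag(): zero the diagonal components
    off = [[0j if i == j else c[i][j] for j in range(2)] for i in range(2)]
    step = matmul2(off, w)
    return [[w[i][j] - eta * step[i][j] for j in range(2)] for i in range(2)]


def learn(w0, learning_data, max_iter, eta=0.1, tol=1e-6):
    """Repeat Equation (5) up to max_iter (N_A or N_B) times.

    Stops early when the largest element change is below tol
    (hypothetical convergence test). Returns (W, iterations_used).
    """
    w = w0
    for it in range(1, max_iter + 1):
        w_next = update_once(w, learning_data, eta)
        delta = max(abs(w_next[i][j] - w[i][j])
                    for i in range(2) for j in range(2))
        w = w_next
        if delta < tol:
            return w, it
    return w, max_iter
```

If the outputs are already mutually decorrelated under φ (for instance, one channel is silent), the off-diagonal term vanishes and learn stops after the first iteration.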

The separation matrix WA(fk) computed by independent component analysis (the primary learning process) suffers from two problems: the amplitude of each signal after sound source separation is indeterminate (the scaling problem), and the correspondence between the separated signals and the sound sources can change from one frequency fk to another (the permutation problem). The correction processing unit 56 in FIG. 4 corrects each separation matrix WA(fk) generated by the first learning unit 54 for the frequencies fk selected as first frequencies fA so that the scaling and permutation problems are resolved.

Any known technique may be used to resolve these problems. For example, the scaling problem is resolved by multiplying the separation matrix WA(fk) by the diagonal matrix made up of the diagonal components of its inverse, and the permutation problem is resolved by interchanging the rows of WA(fk) so that the source directions estimated from WA(fk) are consistent across frequencies. Each separation matrix WA(fk) corrected by the correction processing unit 56 is applied in the processing unit Pk, among the K processing units P1 to PK of the signal separation unit 24, for each frequency fk selected as a first frequency fA. The resolution of the scaling and permutation problems is also described in detail in Saruwatari et al., "Blind Source Separation Combining Independent Component Analysis and Beamforming," EURASIP Journal on Applied Signal Processing, Vol. 2003, No. 11, pp. 1135-1146, 2003 (hereinafter "Non-Patent Document 2").
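The scaling correction described above, W ← diag(W⁻¹)·W, is sketched below for the 2-by-2 case, along with the row interchange used for permutation. Deciding *when* to swap (by comparing estimated directions across bins) is left out. The function names are hypothetical.

```python
def inverse_2x2(w):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = w
    det = a * d - b * c
    if det == 0:
        raise ValueError("separation matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def fix_scaling(w):
    """Scaling correction: W <- diag(W^-1) W.

    Output row i is rescaled by the i-th diagonal element of W^-1 (the
    estimated mixing matrix), so each separated signal appears at the scale
    it has in the corresponding observation channel.
    """
    inv = inverse_2x2(w)
    return [[inv[i][i] * w[i][j] for j in range(2)] for i in range(2)]


def swap_rows(w):
    """Permutation correction for two outputs: interchange the rows of W."""
    return [list(w[1]), list(w[0])]
```

After the correction, the diagonal of the inverse of the corrected matrix is all ones, which is one way to check that the scaling has been fixed.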

The second processing unit 60 in FIG. 4 generates the separation matrix WB(fk) of each frequency fk selected by the frequency selection unit 44 as a second frequency fB, using the separation matrices WA(fk) generated by the first processing unit 50. As shown in FIG. 4, the second processing unit 60 includes a direction estimation unit 62, an initial matrix setting unit 64, and a second learning unit 66.

The direction estimation unit 62 estimates the direction θ1 of the sound source S1 and the direction θ2 of the sound source S2 from the separation matrices WA(fk) generated by the first processing unit 50. Any known technique (for example, the method disclosed in Non-Patent Document 2) may be used to estimate θ1 and θ2; the following method, for example, is suitable. First, as shown in FIG. 5, the direction estimation unit 62 estimates a direction θ1(fk) and a direction θ2(fk) from the separation matrix WA(fk) for each frequency fk selected as a first frequency fA; for example, θ1(fk) is determined from the coefficients w11(fk) and w12(fk) of WA(fk), and θ2(fk) from the coefficients w21(fk) and w22(fk). Second, as shown in FIG. 5, the direction estimation unit 62 calculates the direction θ1 of the sound source S1 and the direction θ2 of the sound source S2 from the per-frequency directions θ1(fk) and θ2(fk) of the first frequencies fA; for example, a representative value (mean or median) of the directions θ1(fk) is taken as θ1, and a representative value of the directions θ2(fk) as θ2.
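The text does not say how a direction is read from a pair of coefficients. One common reading (an assumption here) is the null direction of each row of W: for a row (w_a, w_b), the null satisfies w_a + w_b·exp(−jωd·sinθ/c) = 0. The sketch further assumes two microphones spaced d apart, a far-field source, c = 340 m/s, and no spatial aliasing. Which source each row's null corresponds to is left to the caller. The median is one of the representative values the text mentions.

```python
import cmath
import math
import statistics

SPEED_OF_SOUND = 340.0  # m/s (assumed)


def null_direction(w_row, freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Angle (radians from broadside) of the null of one row (w_a, w_b).

    Model (assumption): x2 = x1 * exp(-j*omega*d*sin(theta)/c).
    Valid only without spatial aliasing (omega*d/c < pi).
    """
    w_a, w_b = w_row
    omega = 2 * math.pi * freq_hz
    phase = cmath.phase(-w_a / w_b)  # equals -omega*d*sin(theta)/c
    s = -phase * c / (omega * spacing_m)
    return math.asin(max(-1.0, min(1.0, s)))  # clamp against rounding


def representative_direction(per_bin_angles):
    """Second step: representative value over the first frequencies (median)."""
    return statistics.median(per_bin_angles)
```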

As shown in FIG. 5, the initial matrix setting unit 64 in FIG. 4 sets an initial separation matrix WB^[0](fk) for the separation matrix WB(fk) according to the directions θ1 and θ2 estimated by the direction estimation unit 62. The initial separation matrix WB^[0](fk) is generated using, for example, the null-steering beamforming disclosed in Non-Patent Document 2. Specifically, the initial matrix setting unit 64 generates an initial separation matrix WB^[0](fk) whose elements are coefficients w11(fk) and w12(fk), calculated so that a pickup null (a region of low pickup sensitivity) is formed in the direction θ2 estimated by the direction estimation unit 62, and coefficients w21(fk) and w22(fk), calculated so that a pickup null is formed in the estimated direction θ1. The initial separation matrix WB^[0](fk) is generated individually for each frequency fk selected by the frequency selection unit 44 as a second frequency fB.
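One standard way to obtain such null-steering coefficients, sketched here under the same assumed two-microphone far-field model, is to invert the steering matrix A = [a(θ1) a(θ2)]. Then W·A = I, so row 1 passes θ1 with unit gain and has a null toward θ2, and row 2 does the reverse. Whether Non-Patent Document 2 uses exactly this normalization is not stated here; the function names are hypothetical. The matrix becomes ill-conditioned when θ1 and θ2 are close or the frequency is low.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s (assumed)


def steering_vector(theta, freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Far-field response of mics at positions 0 and spacing_m (assumed model)."""
    omega = 2 * math.pi * freq_hz
    return [1.0 + 0j, cmath.exp(-1j * omega * spacing_m * math.sin(theta) / c)]


def null_beamformer_init(theta1, theta2, freq_hz, spacing_m):
    """Initial matrix WB^[0](f_k) = A^-1 with A = [a(theta1) a(theta2)]."""
    a1 = steering_vector(theta1, freq_hz, spacing_m)
    a2 = steering_vector(theta2, freq_hz, spacing_m)
    # A = [[a1[0], a2[0]], [a1[1], a2[1]]]
    det = a1[0] * a2[1] - a2[0] * a1[1]
    if abs(det) < 1e-12:
        raise ValueError("steering matrix singular (directions too close?)")
    return [[a2[1] / det, -a2[0] / det],   # null toward theta2
            [-a1[1] / det, a1[0] / det]]   # null toward theta1
```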

In the above example the initial separation matrix WB^[0](fk) is generated by null-steering beamforming, but beamforming that forms a region of high pickup sensitivity (a beam), such as delay-and-sum beamforming, may also be used. That is, the coefficients w11(fk) and w12(fk) of WB^[0](fk) are set so that a pickup beam is directed toward θ1, and the coefficients w21(fk) and w22(fk) so that a pickup beam is directed toward θ2.
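A delay-and-sum version of the initial matrix can be sketched by setting each row to the conjugated steering vector divided by the number of microphones, so that each row phase-aligns the channels for its own direction and has unit gain there. The array model (two microphones spaced d apart, far field, c = 340 m/s) and the function name are assumptions for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s (assumed)


def delay_and_sum_init(theta1, theta2, freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Initial matrix with row i = a(theta_i)^H / 2 (J = 2 microphones).

    Row i phase-aligns both channels for a source at theta_i, so
    row_i . a(theta_i) = 1, i.e. a pickup beam toward theta_i.
    """
    omega = 2 * math.pi * freq_hz

    def steering(theta):
        return [1.0 + 0j,
                cmath.exp(-1j * omega * spacing_m * math.sin(theta) / c)]

    rows = []
    for theta in (theta1, theta2):
        a = steering(theta)
        rows.append([a[0].conjugate() / 2, a[1].conjugate() / 2])
    return rows
```

Unlike the null-steering matrix, this initialization does not place explicit nulls, so it trades interference suppression for robustness when the estimated directions are close.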

As shown in FIG. 5, the second learning unit 66 in FIG. 4 generates the separation matrix WB(fk) of each frequency fk selected as a second frequency fB by successive updates (hereinafter "secondary learning process") starting from the initial separation matrix WB^[0](fk) set by the initial matrix setting unit 64. Any known technique may be used for the secondary learning process, but, as with the primary learning process, the computation of Equation (5) is suitable. That is, starting from the initial separation matrix WB^[0](fk), the second learning unit 66 repeats the computation of Equation (5) using the vectors Y^[n](m,fk) calculated by Equations (1A) and (1B) from the learning data D(fk) of each frequency fk selected as a second frequency fB.

The number of iterations NB of Equation (5) performed by the second learning unit 66 is set smaller than the number of iterations NA of the first learning unit 54 (NB < NA). The second learning unit 66 adopts the separation matrix W^[NB](fk) obtained after repeating Equation (5) NB times as the separation matrix WB(fk) of the second frequency fB. As with the first learning unit 54, if the separation matrix W^[n+1](fk) converges, the second learning unit 66 ends the secondary learning process before the number of iterations reaches NB and adopts the separation matrix W^[n+1](fk) at that point as WB(fk). The separation matrix WB(fk) generated by this secondary learning process is applied in the processing unit Pk, among the K processing units P1 to PK of the signal separation unit 24, for each frequency fk selected as a second frequency fB. The primary and secondary learning processes may also use different computations.

Because the separation matrix WB(fk) is computed from an initial separation matrix WB^[0](fk) that reflects the directions θ1 and θ2, the scaling and permutation problems described above are less likely to arise than for the separation matrix WA(fk), which is generated without prior information. Accordingly, no correction for the scaling or permutation problems is applied to the separation matrix WB(fk) generated by the second learning unit 66. A configuration in which the correction processing unit 56 also corrects WB(fk) may, however, be adopted.

As described above, in this embodiment the separation matrix WA(fk) of each frequency fk selected as a first frequency fA is generated by a primary learning process of NA iterations, and the separation matrix WB(fk) of each frequency fk selected as a second frequency fB is generated by a secondary learning process of NB iterations (NB < NA) that starts from an initial separation matrix WB^[0](fk) derived from the separation matrices WA(fk). The amount of computation performed by the arithmetic processing device 12 (separation matrix generation unit 28) is therefore smaller than in a configuration in which Equation (5) is repeated NA times for all K frequencies f1 to fK. Moreover, because the separation matrix WB(fk) of each second frequency fB is generated by a secondary learning process that starts from the initial separation matrix WB^[0](fk), separation matrices WB(fk) capable of more accurate sound source separation can be obtained than with the configuration of Patent Document 1, which uses the initial separation matrix WB^[0](fk) directly as WB(fk) for sound source separation (that is, omits the secondary learning process).

FIG. 8 is a graph showing the relationship between the number of frequencies fk selected as first frequencies fA out of K (K = 513) frequencies (horizontal axis) and the total number of computations of Equation (5) (hereinafter the "total learning count"). In the first embodiment, the total learning count is the sum of the number of primary learning iterations and the number of secondary learning iterations. FIG. 8 assumes that the number of iterations NA of the primary learning process is 500, that the number of iterations NB of the secondary learning process is 100, and that learning stops when convergence of the separation matrix W(fk) is detected.
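As a rough check on the scale of the savings, the worst-case total learning count (no early stopping) follows directly from K = 513, NA = 500 and NB = 100. This is only an upper bound computed from the stated parameters; the measured counts in FIG. 8 can be lower because learning stops at convergence. The function name is hypothetical.

```python
def worst_case_learning_count(num_first, k_total=513, n_a=500, n_b=100):
    """Upper bound on the total learning count without early stopping.

    num_first bins receive up to N_A primary iterations; the remaining
    k_total - num_first bins receive up to N_B secondary iterations.
    """
    return num_first * n_a + (k_total - num_first) * n_b
```

Running the primary learning process on all 513 bins gives at most 256,500 iterations, whereas selecting 100 first frequencies gives at most 100 × 500 + 413 × 100 = 91,300.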

FIG. 8 also shows, as Comparative Example 1 (REF1) and Comparative Example 2 (REF2), cases in which the primary learning process with NA (500) iterations is executed for all (513) frequencies fk, ending the learning process when the separation matrix W(fk) converges. Comparative Example 1 uses the identity matrix as the initial separation matrix W^[0](fk) of the primary learning process, and Comparative Example 2 uses, as W^[0](fk), a separation matrix generated by null-steering beamforming from the known directions θ1 and θ2. Neither comparative example executes the secondary learning process. FIG. 8 further shows the total learning count for each of several amplitude ratios RA (RA = 1, 0.87, 0.71, 0.5, 0.32) between the sound SV1 generated by the sound source S1 and the sound SV2 generated by the sound source S2. The conditions for each case on the horizontal axis and the amplitude ratios RA are the same in FIGS. 9 to 12 described later.

As FIG. 8 shows, the first embodiment executes the primary learning process only for selected frequencies. It therefore needs a far smaller total learning count to generate the separation matrices W(fk) than Comparative Examples 1 and 2, which execute the primary learning process for every frequency fk. The fewer first frequencies fA that undergo the primary learning process, the larger this gap becomes. In other words, the first embodiment reduces the computation needed to generate the separation matrices W(f1) to W(fK) relative to Comparative Examples 1 and 2. The same tendency holds regardless of the amplitude ratio RA between the sound SV1 and the sound SV2.
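The worst-case arithmetic behind FIG. 8 can be sketched as follows. This is an upper bound only: it assumes no bin converges early, whereas the figure counts iterations with early stopping. The function names are hypothetical.

```python
def total_learning_upper_bound(n_first: int, k: int = 513,
                               na: int = 500, nb: int = 100) -> int:
    """Worst-case number of Equation (5) updates in the first embodiment.

    n_first bins are first frequencies fA (primary learning, na iterations
    each); the remaining k - n_first bins are second frequencies fB
    (secondary learning, nb iterations each). Early stopping is ignored.
    """
    if not 0 <= n_first <= k:
        raise ValueError("n_first must lie in [0, k]")
    return n_first * na + (k - n_first) * nb


def comparative_upper_bound(k: int = 513, na: int = 500) -> int:
    """Worst case for REF1/REF2: primary learning on all k bins, no secondary."""
    return k * na


if __name__ == "__main__":
    for n_first in (128, 256, 384):
        print(n_first, total_learning_upper_bound(n_first), comparative_upper_bound())
```

With 256 first frequencies, the worst case is 153,700 updates, against 256,500 for the comparative examples. The bound falls as n_first shrinks, which is consistent with the trend described for FIG. 8.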

FIG. 9 is a graph of the noise reduction rate (NRR) for the present embodiment and each comparative example. FIG. 10 is a graph of cepstral distortion for the same cases.

The NRR is the difference SNR_OUT − SNR_IN, where:
- SNR_OUT is the intensity ratio of the sound SV1 to the sound SV2 in the separated signal Y1(t);
- SNR_IN is the intensity ratio of the sound SV1 to the sound SV2 in the observation signal V1(t).

A higher NRR (toward the top of FIG. 9) therefore indicates more accurate sound source separation.

Cepstral distortion measures the difference between the cepstrum of the sound SV1 and that of the separated signal Y1(t). A smaller cepstral distortion (toward the top of FIG. 10) means that source separation changes the waveform (spectral envelope) less, that is, that the sound SV1 is separated more faithfully.
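Below is a minimal sketch of the NRR computation. It assumes a simulation in which the SV1 and SV2 contributions to V1(t) and to Y1(t) are available separately, for example by passing each source's image through the separation matrix on its own. The helper names are hypothetical, and power is taken as the mean squared magnitude.

```python
import math


def mean_power(samples) -> float:
    """Mean squared magnitude of a (real or complex) sample sequence."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def snr_db(target, interference) -> float:
    """Intensity ratio of the target (SV1) to the interferer (SV2), in dB."""
    return 10.0 * math.log10(mean_power(target) / mean_power(interference))


def noise_reduction_rate(sv1_in, sv2_in, sv1_out, sv2_out) -> float:
    """NRR = SNR_OUT - SNR_IN, in dB.

    sv1_in, sv2_in: SV1 and SV2 components of the observation signal V1(t)
    sv1_out, sv2_out: SV1 and SV2 components of the separated signal Y1(t)
    """
    return snr_db(sv1_out, sv2_out) - snr_db(sv1_in, sv2_in)
```

For example, if the interferer is attenuated by a factor of 10 in amplitude while the target is unchanged, the NRR is 20 dB.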

FIGS. 9 and 10 cover the range where the amplitudes of the sounds SV1 and SV2 do not differ excessively (RA = 1, 0.87, 0.71). In that range, reducing the number of first frequencies fA subject to the primary learning process (and hence the computation) causes almost no drop in NRR and almost no increase in cepstral distortion compared with Comparative Examples 1 and 2. When the number of first frequencies fA is 256 or 384, both NRR and cepstral distortion are actually better than in Comparative Examples 1 and 2. Thus the first embodiment achieves accurate sound source separation while reducing the computation required to generate the separation matrices W(fk).

FIGS. 11 and 12 are graphs of the NRR (FIG. 11) and the cepstral distortion (FIG. 12) when the secondary learning process is omitted from the first embodiment (hereinafter "Comparative Example 3"). In Comparative Example 3 (REF3), as in the technique of Non-Patent Document 1, the initial separation matrix WB^[0](fk) set by the initial matrix setting unit 64 is applied directly to sound source separation as the separation matrix WB(fk) of the second frequency fB. All conditions other than the omission of the secondary learning process are the same as for the first embodiment shown in FIGS. 8 to 10.

As FIG. 11 shows, the NRR under Comparative Example 3 appears to improve as the number of first frequencies fA decreases. FIG. 12, however, shows that the cepstral distortion increases as the number of first frequencies fA decreases. The apparent NRR gain in FIG. 11 for small numbers of first frequencies fA therefore comes from the waveform of the separated signal Y1(t) deviating from that of the original sound SV1. It does not mean that separation accuracy is being maintained at a high level.

In contrast, FIGS. 9 and 10 show that the first embodiment, which executes the secondary learning process for the second frequencies fB, keeps the NRR high while sufficiently suppressing cepstral distortion. The first embodiment is therefore preferable to Comparative Example 3 for maintaining NRR and reducing cepstral distortion at the same time, that is, for achieving accurate sound source separation.

Moreover, consider the range where the amplitude ratio RA between the sounds SV1 and SV2 is high (RA = 1, 0.87, 0.71). Comparing the cepstral-distortion values in FIGS. 10 and 12 over that range shows that the first embodiment has lower cepstral distortion than Comparative Example 3. The first embodiment is thus also preferable for faithfully separating the sounds SV1 and SV2.

<B: Second Embodiment>
A second embodiment of the present invention will now be described. In each of the following examples, elements whose operation and function are the same as in the first embodiment carry the same reference numerals as above, and their detailed descriptions are omitted where appropriate.

In the first embodiment, the secondary learning process is executed for every frequency fk selected as a second frequency fB. Depending on the sound collection conditions of the sound collection devices M1 and M2, however, sound source separation can sometimes be more accurate without the secondary learning process.

For example, FIGS. 9 and 10 show that the first embodiment separates less accurately than Comparative Examples 1 and 2 when the intensities (amplitude or power) of the sounds SV1 and SV2 differ widely (RA = 0.5, 0.32). Comparing FIG. 9 with FIG. 11 and FIG. 10 with FIG. 12 shows that Comparative Example 3, which does not execute the secondary learning process, separates about as accurately as Comparative Examples 1 and 2 even when the amplitude ratio RA is low.

It follows that separation is more accurate without the secondary learning process when the sound collection conditions are poor. Here, poor conditions means a large difference in intensity between the sounds SV1 and SV2 to be separated (hereinafter "sound source intensity difference"). In view of this tendency, the second embodiment switches the secondary learning process on or off according to whether the sound collection conditions are good or poor.

The sound collection conditions for the sound SV1 emitted by the sound source S1 and the sound SV2 emitted by the sound source S2 are examined below. Let S(m,fk) = (s1(m,fk), s2(m,fk))^T be the vector whose elements are the component value s1(m,fk) of the sound SV1 and the component value s2(m,fk) of the sound SV2. The observation vector X(m,fk) of the observation signals V1(t) and V2(t) is then expressed by Equation (6) below. The matrix A(fk) is a mixing matrix representing the acoustic characteristics imparted along the paths from each of the sound sources S1 and S2 to each of the sound collection devices M1 and M2.
X(m,fk) = A(fk)S(m,fk) …… (6)

Considering Equations (2) and (6), the covariance matrix Rxx(fk) of the observation vector X(m,fk) is expressed by Equation (7) below. It depends on the mixing matrix A(fk) and on the covariance matrix Rss(fk) = E[S(m,fk)S(m,fk)^H] of the vector S(m,fk).
Rxx(fk) = E[X(m,fk)X(m,fk)^H]
= E[A(fk)S(m,fk){A(fk)S(m,fk)}^H]
= E[A(fk)S(m,fk)S(m,fk)^H A(fk)^H]
= A(fk)E[S(m,fk)S(m,fk)^H]A(fk)^H
= A(fk)Rss(fk)A(fk)^H …… (7)

The covariance matrix Rss(fk), on the other hand, has the eigenvalue decomposition of Equation (8) below.
Rss(fk) = Q(fk)Λ(fk)Q(fk)^H …… (8)
The matrix Q(fk) in Equation (8) is orthonormal (unitary), so Q(fk)Q(fk)^H is the identity matrix and det(Q(fk)Q(fk)^H) = 1. The determinant det(Rss(fk)) of the covariance matrix Rss(fk) therefore equals the determinant of the diagonal matrix Λ(fk): det(Rss(fk)) = det(Λ(fk)).

Transforming Equation (7) with this result gives the determinant det(Rxx(fk)) of the covariance matrix Rxx(fk) in Equation (9) below. The symbol Π denotes the product over i, so Πλi(fk) = λ1(fk)·λ2(fk).
det(Rxx(fk)) = det(A(fk)Rss(fk)A(fk)^H)
= det(A(fk))det(Rss(fk))det(A(fk)^H)
= |det(A(fk))|²det(Λ(fk))
= |det(A(fk))|²Πλi(fk) …… (9)
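Equations (7) and (9) can be checked numerically. The plain-Python sketch below:
- draws a random 2x2 mixing matrix A and random source frames (both purely illustrative);
- forms sample covariances over the frames m;
- confirms det(Rxx) = |det A|²·det(Rss), where det(Rss) is the product of its eigenvalues λ1λ2.

The identity holds exactly for sample covariances too, up to rounding. Weakening one source (a larger sound source intensity difference) shrinks det(Rxx).

```python
import random


def covariance(vectors):
    """Sample estimate of E[v v^H]: element (i, j) = mean of v_i * conj(v_j)."""
    n, dim = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j].conjugate() for v in vectors) / n
             for j in range(dim)] for i in range(dim)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def mix(a, s):
    """Equation (6) for one frame: X(m,fk) = A(fk) S(m,fk)."""
    return [a[0][0] * s[0] + a[0][1] * s[1],
            a[1][0] * s[0] + a[1][1] * s[1]]


def check_equation_9(seed=0, frames=200, weak_source_gain=0.3):
    """Return (det(Rxx), |det A|^2 * det(Rss)) for random illustrative data."""
    rng = random.Random(seed)

    def rand_c():
        return complex(rng.gauss(0, 1), rng.gauss(0, 1))

    a = [[rand_c(), rand_c()], [rand_c(), rand_c()]]
    # The second source is scaled down to mimic a sound source intensity difference.
    s = [[rand_c(), weak_source_gain * rand_c()] for _ in range(frames)]
    x = [mix(a, frame) for frame in s]
    lhs = det2(covariance(x))                      # det(Rxx)
    rhs = abs(det2(a)) ** 2 * det2(covariance(s))  # |det A|^2 * det(Rss)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = check_equation_9()
    print(abs(lhs - rhs) / abs(rhs))  # relative error at rounding level
```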

The determinant det(A(fk)) in Equation (9) is the scaling factor of the linear map defined by the mixing matrix A(fk). The more the propagation of the sounds SV1 and SV2 to the sound collection devices M1 and M2 is obstructed (hereinafter "propagation inhibition degree"), that is, the poorer the sound collection conditions, the smaller the magnitude of det(A(fk)) in Equation (9).

The symbols λi(fk) in Equation (9) are the diagonal elements of Λ(fk), that is, the eigenvalues of the covariance matrix Rss(fk):
- λ1(fk) corresponds to the intensity (power) of the frequency-fk component of the sound SV1;
- λ2(fk) corresponds to the intensity (power) of the frequency-fk component of the sound SV2.

The larger the sound source intensity difference (the poorer the sound collection conditions), the smaller the product Πλi(fk) in Equation (9). For a given total power, λ1(fk)·λ2(fk) is largest when the two powers are equal.

It follows from the above that the poorer the sound collection conditions, the smaller the determinant z1(fk) = det(Rxx(fk)) of the covariance matrix Rxx(fk) tends to be. Poorer conditions here means a larger propagation inhibition degree or a larger sound source intensity difference. In view of this tendency, the second embodiment uses the determinant z1(fk) to judge whether the sound collection conditions are good or poor.

FIG. 13 is a block diagram of the separation matrix generation unit 28A in the second embodiment. The separation matrix generation unit 28A is the separation matrix generation unit 28 of the first embodiment with a condition determination unit 72 added. The condition determination unit 72 judges, for each frequency fk selected as a second frequency fB, whether the sound collection conditions are good or poor.

For this judgment, the condition determination unit 72 reuses the determinant z1(fk) that the significant index calculation unit 42 already computed so that the frequency selection unit 44 could select the frequencies fk. The rule is:
- if z1(fk) exceeds a predetermined threshold, the sound collection conditions at frequency fk are judged good (the propagation inhibition degree and the sound source intensity difference are small);
- if z1(fk) falls below the threshold, they are judged poor (the propagation inhibition degree or the sound source intensity difference is large).

The second learning unit 66 in FIG. 13 decides, for each frequency fk, whether to execute the secondary learning process according to the judgment of the condition determination unit 72. Among the frequencies fk selected as second frequencies fB:
- If z1(fk) is judged large (good sound collection conditions), the second learning unit 66 generates the separation matrix WB(fk) by the secondary learning process starting from the initial separation matrix WB^[0](fk), as in the first embodiment.
- If z1(fk) is judged small (poor sound collection conditions), the second learning unit 66 skips the secondary learning process. It adopts the initial separation matrix WB^[0](fk) set by the initial matrix setting unit 64 as the separation matrix WB(fk).

Learning data D(fk) with a small determinant z1(fk) (poor conditions) is therefore not used to generate the separation matrix WB(fk).
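The per-bin decision of the second embodiment can be sketched as below. The following are all hypothetical:
- the threshold value;
- the callables standing in for the initial matrix setting unit 64 (null beamforming) and for the Equation (5) secondary learning;
- the function names.

The sketch only shows the control flow. A bin whose z1(fk) exactly equals the threshold is treated as poor here, a case the text leaves open.

```python
from typing import Callable, Dict, Iterable, TypeVar

Matrix = TypeVar("Matrix")


def second_frequency_matrices(
    second_bins: Iterable[int],
    z1: Dict[int, float],
    threshold: float,
    initial_matrix: Callable[[int], Matrix],
    secondary_learning: Callable[[int, Matrix], Matrix],
) -> Dict[int, Matrix]:
    """Return WB(fk) for every second frequency fB.

    initial_matrix(k)          -> WB^[0](fk), e.g. from null beamforming
    secondary_learning(k, w0)  -> WB(fk) refined from w0 using learning data D(fk)
    """
    result = {}
    for k in second_bins:
        w0 = initial_matrix(k)
        if z1[k] > threshold:
            # Good sound collection condition: refine by secondary learning.
            result[k] = secondary_learning(k, w0)
        else:
            # Poor condition: skip learning and keep the initial matrix,
            # so D(fk) is never used for this bin.
            result[k] = w0
    return result
```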

The second embodiment achieves the same effects as the first embodiment. In addition, it switches the secondary learning process on or off according to the sound collection conditions. Compared with executing the secondary learning process for every frequency fk selected as a second frequency fB regardless of conditions, this reduces the computation for generating the separation matrices W(fk) while maintaining separation accuracy.

<C: Third Embodiment>
In the second embodiment, some frequencies fk selected as second frequencies fB are judged to have poor sound collection conditions. For those frequencies, the initial separation matrix WB^[0](fk), generated by the initial matrix setting unit 64 through null beamforming, is used as the separation matrix WB(fk). The method of setting WB(fk) for such frequencies may be changed as appropriate, as illustrated below as the third embodiment.

Suppose the frequency-fk component of the observation signal V1(t) or V2(t) contains only the sound SV of a single sound source S. Rxx(fk) is then close to rank-deficient, so z1(fk) is small and the condition determination unit 72 judges the sound collection conditions at frequency fk to be poor. Yet if only one source's sound is present at frequency fk, there is little need for sound source separation to change the frequency-fk component.

A configuration may therefore be adopted that does the following for each frequency fk judged poor (containing only one sound source S):
- skip the secondary learning process by the second learning unit 66;
- also skip the generation of the initial separation matrix WB^[0](fk) by the initial matrix setting unit 64;
- set the separation matrix WB(fk) so that separation does not change the frequency-fk component excessively.

A specific configuration is described below.

FIG. 14 is a block diagram of the separation matrix generation unit 28B in the third embodiment. The separation matrix generation unit 28B is the separation matrix generation unit 28A of the second embodiment with a direction estimation unit 74 and a matrix setting unit 76 added.

The direction estimation unit 74 works on each frequency fk that was selected as a second frequency fB and judged to have poor sound collection conditions by the condition determination unit 72. For each such frequency, it uses the learning data D(fk) to estimate a sound source direction θe(fk), that is, the direction of the single source emitting the sound that contains the frequency-fk component. Any known technique may be used to estimate θe(fk).

The matrix setting unit 76 generates a separation matrix WB(fk) that separates the sound arriving from θe(fk) out of the observation signals V1(t) and V2(t). For example, the matrix setting unit 76 generates WB(fk) by executing the following steps for each frequency fk judged to have poor sound collection conditions.

First, the matrix setting unit 76 generates an extraction matrix C(fk) defined by the following formula. In it, the symbol d denotes the spacing between the sound collection devices M1 and M2, and the symbol c denotes the speed of sound. The symbol τ therefore corresponds to the difference between the times at which sound arriving from the sound source direction θe(fk) reaches M1 and M2. When applied as delay-and-sum beamforming, the first row of C(fk) emphasizes sound arriving from θe(fk).

Second, the matrix setting unit 76 generates the separation matrix WB(fk) by permuting the rows of the extraction matrix C(fk). The permutation depends on how the direction θe(fk) estimated by the direction estimation unit 74 relates to the directions θ1 and θ2, which the direction estimation unit 62 estimated from the separation matrices WA(fk). Specifically, the rows are arranged so that the frequency-fk component is emphasized:
- in the separated signal Y1(t) when θe(fk) is close to θ1;
- in the separated signal Y2(t) when θe(fk) is close to θ2.

For example, suppose the separation matrix WA(fk) behaves as follows:
- its first row (w11(fk), w12(fk)) emphasizes the sound SV1 from direction θ1, forming a null toward θ2;
- its second row (w21(fk), w22(fk)) emphasizes the sound SV2 from direction θ2, forming a null toward θ1.

When θe(fk) is close to θ1, the matrix setting unit 76 adopts C(fk) as is as WB(fk). At the frequency fk judged poor, the component value y1(m,fk) of the separated signal Y1(t) then emphasizes the sound arriving from θe(fk), and the component value y2(m,fk) of the separated signal Y2(t) is zero.

When θe(fk) is close to θ2, the matrix setting unit 76 instead adopts C(fk) with its first and second rows swapped as WB(fk). At that frequency, y1(m,fk) of Y1(t) is then zero, and y2(m,fk) of Y2(t) emphasizes the sound arriving from θe(fk).
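The following sketch illustrates the two steps performed by the matrix setting unit 76. The formula for C(fk) is not reproduced in this text, so the delay-and-sum form used here is an assumption:
- τ = d·sin θe / c, for a far-field plane wave that reaches M2 τ seconds after M1;
- a first row of (1/2)[1, e^{j2πfτ}], which gives unit gain toward θe;
- a zero second row, consistent with y2(m,fk) being set to zero.

Angles are in radians. The speed of sound and all names are illustrative.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; illustrative value


def extraction_matrix(theta_e, freq_hz, d, c=SPEED_OF_SOUND):
    """Hypothetical delay-and-sum extraction matrix C(fk) steered to theta_e."""
    tau = d * math.sin(theta_e) / c                  # M1-to-M2 arrival-time difference
    steer = cmath.exp(2j * math.pi * freq_hz * tau)  # undoes M2's extra delay
    return [[0.5, 0.5 * steer],
            [0.0, 0.0]]


def poor_bin_separation_matrix(theta_e, theta1, theta2, freq_hz, d):
    """WB(fk): route the single-source component to Y1 or to Y2."""
    c_mat = extraction_matrix(theta_e, freq_hz, d)
    if abs(theta_e - theta1) <= abs(theta_e - theta2):
        return c_mat                 # emphasized in Y1, zero in Y2
    return [c_mat[1], c_mat[0]]      # rows swapped: emphasized in Y2


def apply(w, x):
    """y(m,fk) = WB(fk) x(m,fk) for one frame."""
    return [w[0][0] * x[0] + w[0][1] * x[1],
            w[1][0] * x[0] + w[1][1] * x[1]]
```

Under the stated plane-wave assumption, the first row returns the M1 component of a source at θe unchanged, and the zero row silences the other output.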

The third embodiment achieves the same effects as the first and second embodiments. Moreover, for frequencies fk with poor sound collection conditions, it skips not only the secondary learning process by the second learning unit 66 but also the generation of the initial separation matrix WB^[0](fk) by the initial matrix setting unit 64. It therefore further reduces the computation required to generate the separation matrices WB(fk) compared with the second embodiment.

The contents of the extraction matrix C(fk) are not limited to the above example. For example, with the extraction matrix C(fk) illustrated below, the signal separation unit 24 outputs the frequency-fk component of the observation signal V1(t) or V2(t) unchanged as the frequency-fk component of the separated signal Y1(t) or Y2(t).

<D: Fourth Embodiment (Examples of the Significant Index Value)>
In each of the above embodiments, the frequency selection unit 44 selects frequencies using the significant index value Z(fk). This value is not limited to the determinant z1(fk) of the covariance matrix Rxx(fk). Specifically, the numerical values (statistics) illustrated in the following aspects may be adopted as the significant index value Z(fk).

<D-1: First Aspect (Condition Number z2(fk))>
The learning data D(fk) is made up of observation vectors X(m,fk). The condition number z2(fk) of their covariance matrix Rxx(fk) is defined by Equation (10) below, where the operator ‖A‖ denotes a matrix norm of the matrix A. The condition number z2(fk) is small when Rxx(fk) is regular (invertible) and far from singular. It grows large as Rxx(fk) approaches a matrix with no inverse.
z2(fk) = ‖Rxx(fk)‖·‖Rxx(fk)^-1‖ …… (10)

The covariance matrix Rxx(fk) has the eigenvalue decomposition of Equation (11A) below. The matrix U is the eigenmatrix (whose columns are the eigenvectors), and the matrix Σ is the diagonal matrix of eigenvalues. The inverse of Rxx(fk) is then given by Equation (11B), which follows from Equation (11A).
Rxx(fk) = UΣU^H …… (11A)
Rxx(fk)^-1 = UΣ^-1U^H …… (11B)

If Σ has a zero diagonal element, Σ^-1 in Equation (11B) diverges, so Rxx(fk) has no inverse and the condition number z2(fk) of Equation (10) is large. More generally, an element of Σ (an eigenvalue of Rxx(fk)) close to zero means that the observation vectors X(m,fk) are distributed over few bases.

The fewer the bases of the observation vectors X(m,fk), the larger the condition number z2(fk) of Rxx(fk); the more bases, the smaller z2(fk). In other words, z2(fk) serves, like the determinant z1(fk), as an index of the total number of bases of the observation vectors X(m,fk).
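As a sketch of how z1(fk) and z2(fk) respond to the number of bases, the code below computes both from two-channel observation frames at one frequency. It assumes the spectral (2-) norm in Equation (10). For a Hermitian positive semidefinite matrix, the condition number in that norm is λmax/λmin. The closed-form 2x2 eigenvalues and the tolerance for treating λmin as zero are implementation choices, not taken from the text.

```python
import math


def covariance_2ch(x1, x2):
    """Entries of Rxx(fk) = E[X X^H] from component values x1(m,fk), x2(m,fk)."""
    n = len(x1)
    r11 = sum(abs(a) ** 2 for a in x1) / n
    r22 = sum(abs(b) ** 2 for b in x2) / n
    r12 = sum(a * complex(b).conjugate() for a, b in zip(x1, x2)) / n
    return r11, r12, r22


def determinant_index(x1, x2):
    """z1(fk) = det(Rxx(fk)) = r11*r22 - |r12|^2 (real for a Hermitian matrix)."""
    r11, r12, r22 = covariance_2ch(x1, x2)
    return r11 * r22 - abs(r12) ** 2


def condition_number_index(x1, x2, tol=1e-12):
    """z2(fk) = lambda_max / lambda_min of Rxx(fk); inf if (near-)singular."""
    r11, r12, r22 = covariance_2ch(x1, x2)
    mean = (r11 + r22) / 2
    radius = math.hypot((r11 - r22) / 2, abs(r12))
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= tol * lam_max:
        return math.inf
    return lam_max / lam_min
```

If x2 is a fixed multiple of x1 (a single basis), z1 is 0 and z2 is infinite. If the two channels are uncorrelated with equal power, both indices equal 1.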

In view of this tendency, the first aspect uses the condition number z2(fk) of the covariance matrix Rxx(fk) as the significant index value Z(fk). The significant index calculation unit 42 evaluates Equation (10) for the covariance matrix Rxx(fk) of each of the K frequencies f1 to fK, giving the condition numbers z2(f1) to z2(fK).

The frequency selection unit 44 selects as first frequencies fA one or more frequencies fk whose condition number z2(fk) is small. Examples are a predetermined number of frequencies at the top of an ascending ordering, or the frequencies whose z2(fk) falls below a threshold. The remaining frequencies fk become second frequencies fB.
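The selection rule of the first aspect (ascending order, by a fixed count or by a threshold) can be sketched as below. The function names are hypothetical. For an index where larger values are better, the scores would simply be negated.

```python
import math


def select_by_count(scores, n_first):
    """Bins with the n_first smallest scores become fA; the rest become fB."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    return sorted(order[:n_first]), sorted(order[n_first:])


def select_by_threshold(scores, threshold):
    """Bins whose score falls below threshold become fA; the rest become fB."""
    first = [k for k, z in enumerate(scores) if z < threshold]
    second = [k for k, z in enumerate(scores) if z >= threshold]
    return first, second


if __name__ == "__main__":
    z2 = [5.0, 1.0, math.inf, 2.0]
    print(select_by_count(z2, 2))        # ([1, 3], [0, 2])
    print(select_by_threshold(z2, 3.0))  # ([1, 3], [0, 2])
```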

<D-2: Second Aspect (Cross-Correlation z3(fk), Mutual Information z4(fk))>
The learning process of independent component analysis updates the separation matrix W(fk) so that the signals after sound source separation become statistically independent. Learning W(fk) from the learning data D(fk) can therefore be regarded as more significant at frequencies fk where the observation signals V1(t) and V2(t) are less statistically correlated. The second aspect accordingly uses, as the significant index value Z(fk), an index of the mutual independence of V1(t) and V2(t), for example the cross-correlation z3(fk).

The cross-correlation z3(fk) between the frequency-fk component of the observation signal V1(t) and the frequency-fk component of the observation signal V2(t) is expressed by Equation (12) below, where σ1 denotes the standard deviation of the intensity x1(m,fk) within the unit interval TU and σ2 denotes the standard deviation of the intensity x2(m,fk) within the unit interval TU.

$$z_3(f_k) = \frac{E\left[\{x_1(m,f_k) - E(x_1(m,f_k))\}\{x_2(m,f_k) - E(x_2(m,f_k))\}\right]}{\sigma_1 \sigma_2} \tag{12}$$

As Equation (12) shows, the higher the independence (the lower the correlation) between the observation signals V1(t) and V2(t) at a frequency fk, the smaller the magnitude of the cross-correlation z3(fk). In view of this tendency, in the second mode the significant index calculation unit 42 evaluates Equation (12) for each of the K frequencies f1 to fK to obtain the cross-correlations z3(f1) to z3(fK), and the frequency selection unit 44 selects as first frequencies fA one or more of the K frequencies whose cross-correlation z3(fk) is low (for example, the frequencies fk ranked highest in ascending order, or those below a threshold), and assigns the remaining frequencies fk to the second frequencies fB.
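A minimal sketch of Equation (12) evaluated for all bins at once, followed by the ascending selection. It assumes real-valued intensities stored as arrays of shape (M, K) (frames, frequency bins), with E(·) and σ taken as plain averages over the frames of the unit interval. Ranking by |z3(fk)| rather than the signed value is a choice of this sketch, so that strong negative correlation is not mistaken for independence.

```python
import numpy as np

def cross_correlation(x1, x2):
    """Equation (12) per frequency bin. x1, x2: real intensities, shape (M, K)."""
    d1 = x1 - x1.mean(axis=0)
    d2 = x2 - x2.mean(axis=0)
    return (d1 * d2).mean(axis=0) / (x1.std(axis=0) * x2.std(axis=0))

def select_by_cross_correlation(x1, x2, n_first):
    z3 = cross_correlation(x1, x2)
    order = np.argsort(np.abs(z3))        # weakest correlation (in magnitude) first
    return np.sort(order[:n_first]), np.sort(order[n_first:]), z3

rng = np.random.default_rng(3)
M, K = 5000, 6
a = rng.standard_normal((M, K))
b = rng.standard_normal((M, K))
x1, x2 = a, b.copy()
x2[:, 3:] = a[:, 3:] + 0.1 * b[:, 3:]     # bins 3-5: V2 nearly copies V1
fA, fB, z3 = select_by_cross_correlation(x1, x2, n_first=3)
```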

Alternatively, the mutual information z4(fk) defined by Equation (13) below may be used as the significant index value Z(fk). As with the cross-correlation z3(fk), the higher the independence (the lower the correlation) between the observation signals V1(t) and V2(t) at a frequency fk, the smaller the mutual information z4(fk). The frequency selection unit 44 therefore selects as first frequencies fA one or more of the K frequencies f1 to fK whose mutual information z4(fk) is low.

$$z_4(f_k) = -\frac{1}{2}\log\left(1 - z_3(f_k)^2\right) \tag{13}$$
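Equation (13) coincides with the mutual information of two jointly Gaussian variables whose correlation coefficient is z3(fk). The sketch below evaluates it with the natural logarithm (so z4 is in nats, an assumption since the text does not fix the base) and shows that z4 is zero for uncorrelated bins and grows with |z3|.

```python
import numpy as np

def mutual_information(z3):
    """Equation (13): z4 = -(1/2) log(1 - z3^2), natural log (nats)."""
    z3 = np.asarray(z3, dtype=float)
    return -0.5 * np.log1p(-z3 ** 2)      # log1p(-u) == log(1 - u), accurate for small u

z3 = np.array([0.0, 0.3, -0.3, 0.9])
z4 = mutual_information(z3)
print(np.round(z4, 4))
```

Because z4 depends on z3 only through z3², ranking bins by z4 gives the same order as ranking by |z3|; the two indices differ only in how a threshold is expressed or when they are combined with other indices.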

<D-3: Third mode (trace z5(fk))>
The trace (power) z5(fk) of the covariance matrix Rxx(fk) is defined as the sum of its diagonal elements. Because the diagonal elements of Rxx(fk) are the variance σ1² of the intensity x1(m,fk) of the observation signal V1(t) within the unit interval TU and the variance σ2² of the intensity x2(m,fk) of the observation signal V2(t) within the unit interval TU, the trace z5(fk) can equivalently be defined as the sum of these two variances (z5(fk) = σ1² + σ2²).
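Written out for the two-channel case, the structure of Rxx(fk) makes this equivalence explicit. Here c12(fk) and c21(fk) denote the off-diagonal covariances between x1(m,fk) and x2(m,fk), and λ1, λ2 denote the eigenvalues of Rxx(fk) (the diagonal elements of Σ in Equation (11B)); these symbols are introduced only for illustration, and the diagonal entries are taken to be the variances as the text states.

$$
R_{xx}(f_k) = \begin{pmatrix} \sigma_1^2 & c_{12}(f_k) \\ c_{21}(f_k) & \sigma_2^2 \end{pmatrix},
\qquad
z_5(f_k) = \operatorname{tr} R_{xx}(f_k) = \sigma_1^2 + \sigma_2^2 = \lambda_1 + \lambda_2
$$

The off-diagonal terms, which carry the correlation used in the second mode, do not enter the trace.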

FIG. 15 shows scatter diagrams of the observation vectors X(m,fk) within the unit interval TU. Part (A) of FIG. 15 is the scatter diagram for a large trace z5(fk), and part (B) is that for a small trace z5(fk). As in part (A) of FIG. 7, parts (A) and (B) of FIG. 15 schematically show a region A1 in which observation vectors X(m,fk) dominated by the sound SV1 from the sound source S1 are distributed, and a region A2 in which observation vectors X(m,fk) dominated by the sound SV2 from the sound source S2 are distributed.

As the definition of z5(fk) as the sum of the variance σ1² of the intensity x1(m,fk) and the variance σ2² of the intensity x2(m,fk) suggests, the larger the trace z5(fk) of the covariance matrix Rxx(fk), the more widely the observation vectors X(m,fk) are spread. Consequently, when z5(fk) is large, the regions over which the observation vectors X(m,fk) are distributed (regions A1 and A2) tend to be clearly distinguished for each sound source S, as in part (A) of FIG. 15, whereas when z5(fk) is small, the distinction between regions A1 and A2 tends to be ambiguous, as in part (B) of FIG. 15. In other words, the trace z5(fk) serves as an index of the shape (spread) of the region over which the observation vectors X(m,fk) are distributed. Since the learning process of the separation matrix W(fk) (independent component analysis) is equivalent to identifying as many independent bases as there are sound sources S, the learning of W(fk) using the learning data D(fk) can be said to be more significant at a frequency fk where the distribution regions (bases) of the observation vectors X(m,fk) are clearly distinguished for each sound source S (that is, a frequency fk with a large trace z5(fk)).

In view of this tendency, the third mode uses the trace z5(fk) of the covariance matrix Rxx(fk) as the significant index value Z(fk). Specifically, the significant index calculation unit 42 sums the diagonal elements of the covariance matrix Rxx(fk) of each of the K frequencies f1 to fK to obtain the traces z5(f1) to z5(fK). The frequency selection unit 44 selects as first frequencies fA one or more frequencies fk whose trace z5(fk) is large (for example, the frequencies fk ranked highest in descending order of z5(fk), or those above a threshold), and assigns the remaining frequencies fk to the second frequencies fB.
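A minimal sketch of the third mode, using the same assumed (2, M, K) array layout as the other sketches here. Only the diagonal of each Rxx(fk) is needed, so the trace is computed directly as the sum of per-channel variances without forming the full matrices; the function name and n_first are illustrative.

```python
import numpy as np

def select_by_trace(X, n_first):
    """X: observations, shape (2, M, K). Returns (fA, fB, z5), largest trace first."""
    D = X - X.mean(axis=1, keepdims=True)            # remove per-bin mean
    # Diagonal of Rxx(fk) = per-channel variance over frames; the trace sums them.
    z5 = (np.abs(D) ** 2).mean(axis=1).sum(axis=0)   # shape (K,)
    order = np.argsort(-z5)                          # descending order of z5
    return np.sort(order[:n_first]), np.sort(order[n_first:]), z5

rng = np.random.default_rng(4)
M = 4000
scales = np.array([0.1, 2.0, 0.5, 3.0])              # per-bin spread of the observations
X = rng.standard_normal((2, M, 4)) * scales          # broadcasts over the frequency axis
fA, fB, z5 = select_by_trace(X, n_first=2)
```

The result agrees with np.trace(np.cov(X[:, :, k], bias=True)) for each bin k, which is a convenient cross-check.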

<D-4: Fourth mode (kurtosis z6(fk))>
The kurtosis z6(fk) of the frequency distribution of the intensity x1(m,fk) of the observation signal V1(t) (the distribution function with the intensity x1(m,fk) as a random variable) is defined by Equation (14) below.

$$z_6(f_k) = \frac{\mu_4(f_k)}{\{\mu_2(f_k)\}^2} \tag{14}$$

The symbol μ4(fk) in Equation (14) denotes the fourth-order moment defined by Equation (15A) below, and the symbol μ2(fk) denotes the second-order moment defined by Equation (15B). The symbol m(fk) in Equations (15A) and (15B) denotes the mean of the intensity x1(m,fk) over the frames within the unit interval TU.

$$\mu_4(f_k) = E\left[\{x_1(m,f_k) - m(f_k)\}^4\right] \tag{15A}$$

$$\mu_2(f_k) = E\left[\{x_1(m,f_k) - m(f_k)\}^2\right] \tag{15B}$$

When only one of the component SV1(fk) of the sound SV1 and the component SV2(fk) of the sound SV2 is contained in (or dominates) the observation signal V1(t), the kurtosis z6(fk) takes a large value; when both components are contained in V1(t) at roughly equal intensity, z6(fk) takes a small value (central limit theorem). Since the learning process of the separation matrix W(fk) (independent component analysis) is equivalent to identifying as many independent bases as there are sound sources S, the learning of W(fk) using the learning data D(fk) can be said to be more significant at a frequency fk where more sound sources S contribute sounds SV at significant volume to the observation signal V1(t) (that is, a frequency fk with a smaller kurtosis z6(fk)).
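The central-limit effect described above can be illustrated with a short simulation. As an assumption of this sketch, each source component is modelled by a Laplace-distributed (super-Gaussian) signal, a common stand-in for speech-like amplitudes. The kurtosis of Equation (14) is about 6 for one Laplace source, about 4.5 for an equal mix of two, and 3 for a Gaussian.

```python
import numpy as np

def kurtosis(x):
    """Equation (14) with moments (15A)/(15B): mu4 / mu2**2 (not excess kurtosis)."""
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

rng = np.random.default_rng(0)
N = 200_000
sv1 = rng.laplace(size=N)            # stand-in for one source component
sv2 = rng.laplace(size=N)            # stand-in for a second, independent component

k_single = kurtosis(sv1)             # only one source present: close to 6
k_mixed = kurtosis(sv1 + sv2)        # two sources at equal level: close to 4.5
print(k_single, k_mixed)
```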

In view of this tendency, the fourth mode uses the kurtosis z6(fk) of the frequency distribution of the intensity x1(m,fk) of the observation signal V1(t) as the significant index value Z(fk). Specifically, the significant index calculation unit 42 evaluates Equation (14) for each of the K frequencies f1 to fK to obtain the kurtoses z6(f1) to z6(fK). The frequency selection unit 44 selects as first frequencies fA one or more of the K frequencies whose kurtosis z6(fk) is small (for example, the frequencies fk ranked highest in ascending order of z6(fk), or those below a threshold), and assigns the remaining frequencies fk to the second frequencies fB.
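A minimal per-bin sketch of the fourth mode, mapping each line to Equations (14), (15A) and (15B). It assumes real-valued intensities x1 of shape (M, K) and again uses Laplace-distributed test data as a hypothetical stand-in for source components; bins 1 and 3 receive two mixed components and should therefore be selected as first frequencies.

```python
import numpy as np

def kurtosis_per_bin(x1):
    """x1: real intensities of V1(t), shape (M, K). Returns z6(f1)..z6(fK)."""
    d = x1 - x1.mean(axis=0)               # x1(m,fk) - m(fk)
    mu4 = np.mean(d ** 4, axis=0)          # Equation (15A)
    mu2 = np.mean(d ** 2, axis=0)          # Equation (15B)
    return mu4 / mu2 ** 2                  # Equation (14)

def select_by_kurtosis(x1, n_first):
    z6 = kurtosis_per_bin(x1)
    order = np.argsort(z6)                 # ascending: smallest kurtosis first
    return np.sort(order[:n_first]), np.sort(order[n_first:]), z6

rng = np.random.default_rng(2)
M, K = 100_000, 4
x1 = rng.laplace(size=(M, K))              # one dominant component per bin
x1[:, [1, 3]] += rng.laplace(size=(M, 2))  # bins 1 and 3: two components mixed
fA, fB, z6 = select_by_kurtosis(x1, n_first=2)
```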

Incidentally, the kurtosis of human speech generally falls within a range of about 40 to 70. Taking into account that kurtosis decreases in noisy environments (central limit theorem) and that kurtosis measurements contain errors, the kurtosis of human speech falls roughly within a range of 20 to 80 (hereinafter the "speech range"). On the other hand, at a frequency fk where only stationary noise is present, such as the operating noise of air-conditioning equipment or the hubbub of a crowd, the kurtosis of the observation signal V1(t) is sufficiently low (for example, below 20), so such a frequency fk is likely to be selected as a first frequency fA by the frequency selection unit 44. However, if the target sounds of sound source separation (SV1, SV2) are human speech, learning the separation matrix W(fk) from the learning data D(fk) at such a stationary-noise frequency fk is of little significance.

It is therefore preferable to correct the kurtosis of Equation (14) so that stationary-noise frequencies fk are not selected as first frequencies fA. For example, the significant index calculation unit 42 calculates, as the corrected kurtosis z6(fk), the product of the value defined by Equation (14) (hereinafter the "pre-correction kurtosis") and a weight q. As illustrated in FIG. 16, the weight q is chosen nonlinearly with respect to the pre-correction kurtosis. Specifically, where the pre-correction kurtosis is below the lower limit of the speech range (for example, 20), the weight q is varied according to the pre-correction kurtosis so that the corrected kurtosis z6(fk) obtained by multiplying by q exceeds the upper limit of the speech range (for example, 80); within the speech range, the weight q is set to a predetermined value (for example, 1). Above the upper limit of the speech range, the pre-correction kurtosis is already sufficiently high (that is, the frequency fk is unlikely to be selected as a first frequency fA), so the weight q is set to the same value as within the speech range. With this configuration, a separation matrix W(fk) that separates the desired speech with high accuracy can be generated.
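The exact curve of FIG. 16 is not given in the text, so the following is only a hypothetical weight function that satisfies the stated constraints: below the speech range, q is chosen per value so that the corrected kurtosis lands just above the upper limit (80 plus an assumed margin of 1), and q = 1 within and above the speech range. Division is safe because μ4/μ2² is always at least 1.

```python
import numpy as np

SPEECH_LOW, SPEECH_HIGH = 20.0, 80.0      # speech range from the text

def weight_q(raw_kurtosis, margin=1.0):
    """Hypothetical stand-in for the FIG. 16 curve (margin is an assumption)."""
    k = np.asarray(raw_kurtosis, dtype=float)
    q = np.ones_like(k)                   # q = 1 inside and above the speech range
    low = k < SPEECH_LOW
    q[low] = (SPEECH_HIGH + margin) / k[low]   # push raw * q just above SPEECH_HIGH
    return q

def corrected_kurtosis(raw_kurtosis):
    """Corrected z6(fk) = q * pre-correction kurtosis."""
    return weight_q(raw_kurtosis) * np.asarray(raw_kurtosis, dtype=float)

raw = np.array([3.0, 15.0, 45.0, 120.0])  # noise-like, noise-like, speech-like, very peaky
out = corrected_kurtosis(raw)
print(out)
```

After correction, the noise-like bins rank behind the speech-range bin in the ascending selection of the fourth mode, which is the effect the correction is meant to achieve.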

<E: Modifications>
The embodiments above may be modified in various ways. Specific modifications are illustrated below. Two or more aspects selected arbitrarily from the following examples may be combined as appropriate.

(1) Modification 1
The method of sorting the frequencies f1 to fK into first frequencies fA and second frequencies fB may be changed as appropriate. For example, the significant index value Z(fk) may be calculated from several of the indices illustrated above. That is, the significant index calculation unit 42 calculates, as the significant index value Z(fk), a weighted sum of several indices selected from those illustrated above (z1(fk) to z6(fk)), for example a weighted sum of the determinant z1(fk) and the trace z5(fk).
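A minimal sketch of the weighted sum of the determinant z1(fk) and the trace z5(fk). Several choices here are assumptions rather than the patent's method: a larger determinant is taken to mark a more significant bin (as for the trace), each index is divided by its maximum over all bins so that the weights are comparable (the determinant scales with power squared, the trace with power), and the weights of 0.5 are arbitrary.

```python
import numpy as np

def covariances(X):
    """Rxx(fk) for every bin: X has shape (2, M, K); result has shape (K, 2, 2)."""
    D = X - X.mean(axis=1, keepdims=True)
    return np.einsum("imk,jmk->kij", D, D.conj()) / X.shape[1]

def combined_index(X, w_det=0.5, w_trace=0.5):
    """Hypothetical Z(fk) = w_det * z1(fk) + w_trace * z5(fk), each normalised by its max."""
    R = covariances(X)
    z1 = np.abs(np.linalg.det(R))                   # determinant of each Rxx(fk)
    z5 = np.real(np.trace(R, axis1=1, axis2=2))     # trace of each Rxx(fk)
    return w_det * z1 / z1.max() + w_trace * z5 / z5.max()

rng = np.random.default_rng(5)
M, K = 4000, 4
s = rng.standard_normal((2, M, K))
X = 1e-3 * rng.standard_normal((2, M, K))
A = 2.0 * np.array([[1.0, 0.5], [0.5, 1.0]])
X[:, :, :2] += np.einsum("ij,jmk->imk", A, s[:, :, :2])                   # two strong sources
X[:, :, 2:] += np.einsum("i,mk->imk", np.array([1.0, 0.5]), s[0, :, 2:])  # one weak source
Z = combined_index(X)
fA = np.sort(np.argsort(-Z)[:2])          # two most significant bins
```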

Note that the configuration that uses the significant index value Z(fk) for sorting into first frequencies fA and second frequencies fB (the significant index calculation unit 42) may be omitted. Specifically, the frequencies fk may be sorted independently of the observation vectors X(m,fk) (the learning data D(fk)). For example, the frequency selection unit 44 selects as first frequencies fA the frequencies fk taken at every predetermined number of positions in the sequence of frequencies f1 to fK, and assigns the remaining frequencies fk to the second frequencies fB. Alternatively, if the frequencies fk at which the learning process is highly significant are known in advance, for example experimentally or statistically, from circumstances such as the acoustic characteristics expected of the observation signals V1(t) and V2(t) or the content of the learning process, those frequencies fk may be selected as first frequencies fA and the remaining frequencies fk assigned to the second frequencies fB. Omitting the calculation of the significant index value Z(fk) as in these examples has the advantage of reducing the computational load on the arithmetic processing device 12.
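The data-independent sorting can be sketched in a few lines. Bin indices are 0-based here (index 0 corresponds to f1), and the function name, step and offset are illustrative.

```python
def select_every_nth(K, step, offset=0):
    """Pick every `step`-th bin index (starting at `offset`) as fA; the rest are fB."""
    fA = list(range(offset, K, step))
    chosen = set(fA)
    fB = [k for k in range(K) if k not in chosen]
    return fA, fB

fA, fB = select_every_nth(K=10, step=3)
print(fA, fB)   # [0, 3, 6, 9] [1, 2, 4, 5, 7, 8]
```

A list of bins known in advance to be significant can replace fA directly, with fB computed as its complement in the same way.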

(2) Modification 2
In the second embodiment, the determinant z1(fk) of the covariance matrix Rxx(fk) of the observation vectors X(m,fk) is used to judge whether the sound collection condition is good, but any method may be used for this judgment. For example, the condition number z2(fk) of the covariance matrix Rxx(fk) serves as a measure of how difficult numerical analysis is. From the viewpoint that the easier the numerical analysis of the learning data D(fk), the better the sound collection condition, the quality of the sound collection condition may be judged according to the condition number z2(fk) calculated by the significant index calculation unit 42. Since the sound collection condition can be evaluated as better the closer z2(fk) is to 1, the condition determination unit 72 judges the sound collection condition at a frequency fk to be good (good condition) when z2(fk) is below a threshold, and bad (bad condition) when z2(fk) exceeds the threshold. The secondary learning process is omitted for frequencies fk (second frequencies fB) at which the sound collection condition is bad.

Referring to FIGS. 9 and 10, the decrease in noise suppression rate and the increase in cepstral distortion become apparent when the amplitude ratio RA falls below 0.5, so it is reasonable to evaluate the condition as bad when RA is below 0.5. Since the condition number z2(fk) corresponds to the relative power ratio of the sound SV1 and the sound SV2 (z2(fk) being the reciprocal of that ratio), an amplitude ratio RA of 0.5 (a power ratio of 0.5² = 0.25) is expected to give a condition number z2(fk) of 1/0.25 = 4. Therefore, when the condition number z2(fk) is used to judge the sound collection condition, a configuration that sets the threshold to 4 (that is, judges the condition good when z2(fk) is below 4 and bad when it exceeds 4) is suitably adopted.
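The threshold derivation above reduces to one line of arithmetic. The sketch below derives the threshold from a minimum acceptable amplitude ratio and applies it to a few example condition numbers; the function names and the example z2 values are illustrative only.

```python
import numpy as np

def threshold_from_amplitude_ratio(ra):
    """Power ratio = RA**2; the expected condition number is its reciprocal."""
    return 1.0 / ra ** 2

def judge_conditions(z2, ra_min=0.5):
    """True = good sound collection condition; False = bad (secondary learning skipped)."""
    return np.asarray(z2) < threshold_from_amplitude_ratio(ra_min)

print(threshold_from_amplitude_ratio(0.5))   # 4.0
good = judge_conditions([1.2, 3.9, 4.5, 50.0])
print(good)
```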

As illustrated above, reusing the significant index Z(fk) applied to sorting the frequencies fk (z1(fk), z2(fk)) for judging the sound collection condition has the advantage of reducing the amount of computation compared with a configuration that applies separate indices to the two tasks. A configuration that applies separate indices to sorting the frequencies fk and judging the sound collection condition may nevertheless be adopted. For example, the determinant z1(fk) may be applied to judging the sound collection condition while a significant index Z(fk) other than the determinant z1(fk) (z2(fk) to z6(fk)) is applied to sorting the frequencies fk.

(3) Modification 3
The initial matrix setting unit 52 may generate the initial separation matrix WA^[0](fk) by any method. For example, the initial matrix setting unit 52 may generate an initial separation matrix WA^[0](fk) whose elements are random numbers. Although the case in which the angle θ1 of the sound source S1 and the angle θ2 of the sound source S2 are unknown (no prior information is used) has been illustrated above, a configuration that generates the initial separation matrix WA^[0](fk) using prior information (the angles θ1 and θ2) is also suitable. For generating WA^[0](fk) from prior information, subspace methods such as principal component analysis and second-order-statistics ICA, disclosed in Tachibana et al., "Efficient Blind Source Separation Combining Closed-Form Second Order ICA and Nonclosed-Form Higher-Order ICA," International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 1, pp. 45-48, Apr. 2007, or the adaptive beamforming disclosed in Japanese Patent No. 3949074, may suitably be adopted. Other possible methods include generating WA^[0](fk) by various kinds of beamforming (for example, adaptive beamforming) from the direction of each sound source S estimated by the MUSIC (MUltiple SIgnal Classification) method or the minimum variance method, and generating WA^[0](fk) from factor vectors identified by factor analysis or canonical vectors identified by canonical correlation analysis.
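As a sketch of how known angles θ1 and θ2 could be turned into an initial separation matrix, the following uses a null beamformer (the NBF idea mentioned in the background), not the subspace or adaptive methods cited above. It assumes a far-field two-microphone line array with spacing d, a speed of sound of 340 m/s, and angles measured from broadside; all names are hypothetical. Inverting the steering matrix A = [a(θ1), a(θ2)] gives rows that each pass one source with unit gain and place a null toward the other. A random initial matrix, the other option in the text, is included for comparison.

```python
import numpy as np

SPEED_OF_SOUND = 340.0  # m/s (assumed)

def steering_vector(theta, f, d):
    """Far-field response of a 2-mic line array with spacing d (m) at frequency f (Hz)."""
    delay = d * np.sin(theta) / SPEED_OF_SOUND
    return np.array([1.0, np.exp(-2j * np.pi * f * delay)])

def null_beamformer_init(theta1, theta2, f, d):
    """W = A^{-1}: row 0 passes S1 with unit gain and nulls S2; row 1 the reverse."""
    A = np.column_stack([steering_vector(theta1, f, d), steering_vector(theta2, f, d)])
    return np.linalg.inv(A)               # singular if theta1 == theta2 (or spatial aliasing)

def random_init(rng, n=2):
    """Alternative from the text: an initial matrix with random (complex) elements."""
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

theta1, theta2 = np.deg2rad(-30.0), np.deg2rad(40.0)
W0 = null_beamformer_init(theta1, theta2, f=1000.0, d=0.05)
print(np.round(np.abs(W0 @ steering_vector(theta2, 1000.0, 0.05)), 6))
W_rand = random_init(np.random.default_rng(0))
```

In practice such a matrix would be computed for every frequency bin fk and then refined by the learning process.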

DESCRIPTION OF SYMBOLS
100 ... signal processing device, 12 ... arithmetic processing device, 14 ... storage device, 22 ... frequency analysis unit, 24 ... signal separation unit, 26 ... signal synthesis unit, 28, 28A, 28B ... separation matrix generation unit, 42 ... significant index calculation unit, 422 ... covariance matrix calculation unit, 424 ... determinant calculation unit, 44 ... frequency selection unit, 50 ... first processing unit, 52 ... initial matrix setting unit, 54 ... first learning unit, 56 ... correction processing unit, 60 ... second processing unit, 62 ... direction estimation unit, 64 ... initial matrix setting unit, 66 ... second learning unit, 72 ... condition determination unit.

Claims

A signal processing device comprising:
signal separating means for generating a plurality of separated signals, one for each sound source, by applying a separation matrix for each of a plurality of frequencies to a plurality of observation signals obtained by collecting, with a plurality of sound collecting devices, a mixture of a plurality of sounds generated by different sound sources;
frequency sorting means for sorting the plurality of frequencies into a first frequency and a second frequency;
first learning means for generating the separation matrix of the first frequency by a primary learning process that applies learning data corresponding to the first-frequency components of the plurality of observation signals;
direction estimating means for estimating the direction of each sound source from the separation matrix generated by the first learning means;
initial matrix setting means for generating an initial separation matrix such that a dead angle or a beam of sound collection is formed in the direction estimated by the direction estimating means; and
second learning means for generating the separation matrix of the second frequency by executing, with the initial separation matrix generated by the initial matrix setting means as an initial value, a secondary learning process that applies learning data corresponding to the second-frequency components of the plurality of observation signals, with fewer iterations than the primary learning process.

The signal processing device according to claim 1, further comprising condition determination means for determining, for each frequency, whether the sound collection condition is good,
wherein, among the frequencies sorted as the second frequency, the second learning means generates the separation matrix by the secondary learning process with the initial separation matrix as the initial value for each frequency at which the condition determination means determines that the sound collection condition is good, and uses the initial separation matrix as the separation matrix for each frequency at which the condition determination means determines that the sound collection condition is bad.

The signal processing device according to claim 2, further comprising significant index calculation means for calculating from the plurality of observation signals, for each frequency, a significant index value indicating the significance of a learning process that uses the learning data of that frequency,
wherein the frequency sorting means sorts the plurality of frequencies into the first frequency and the second frequency according to the significant index value of each frequency.

The signal processing device according to claim 3, wherein the condition determination means determines whether the sound collection condition at each frequency is good according to the significant index value of that frequency calculated by the significant index calculation means.

The signal processing device according to claim 3 or 4, wherein the significant index calculation means calculates, as the significant index value, the determinant of the covariance matrix of an observation vector whose elements are the intensities at each frequency of the respective observation signals.