JP5387442B2 - Signal processing device - Google Patents

Signal processing device

Info

Publication number
JP5387442B2
Authority
JP
Japan
Prior art keywords
frequency
matrix
sound
separation matrix
separation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2010038295A
Other languages
Japanese (ja)
Other versions
JP2011176535A (en)
Inventor
多伸 近藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yamaha Corp
Original Assignee
Yamaha Corp
Application filed by Yamaha Corp
Priority to JP2010038295A
Publication of JP2011176535A
Application granted
Publication of JP5387442B2
Legal status: Expired - Fee Related

Description

The present invention relates to a technique for emphasizing (separating or extracting) the sound from a specific sound source within a mixture of sounds generated by different sound sources.

Techniques have been proposed that separate the sound from each sound source by applying sound source separation to a plurality of observation signals obtained by picking up a mixture of sounds (speech and noise) with a plurality of sound collecting devices (sound source separation techniques). The separation matrix (demixing matrix) applied in sound source separation is calculated for each frequency by a learning process (iterative updating) that uses, for example, frequency-domain independent component analysis (FDICA).

Non-Patent Document 1 discloses a technique in which a separation matrix is generated by a learning process for each frequency selected at a fixed interval from a plurality of frequencies, and the learned separation matrices are then used to fill in the separation matrices of the non-selected frequencies. Null beamforming (NBF: null beamformer) is used to generate the separation matrices of the non-selected frequencies. That is, the separation matrix of each non-selected frequency is set so that a null (blind spot) of sound collection is formed in the sound source direction estimated from the learned separation matrices. According to the technique of Non-Patent Document 1, the amount of computation can be reduced compared with running the learning process by independent component analysis on every frequency from the outset.
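As a rough illustration of the null beamforming step described above, the following sketch builds a separation matrix for two microphones and two sources whose rows each place a null on one source. It is not taken from Non-Patent Document 1 or from this patent: the far-field steering model, the function names, the 0.04 m microphone spacing, and the 340 m/s speed of sound are all assumptions made for the example.

```python
import numpy as np

def steering_vector(theta, freq_hz, mic_spacing_m, c=340.0):
    """Far-field response of a two-microphone array to a plane wave arriving
    from angle theta (radians, measured from the array normal).
    Microphone 1 is the phase reference (assumed model)."""
    delay = mic_spacing_m * np.sin(theta) / c
    return np.array([1.0, np.exp(-2j * np.pi * freq_hz * delay)])

def null_beamformer_matrix(theta1, theta2, freq_hz, mic_spacing_m, c=340.0):
    """Separation matrix whose row 1 passes source 1 and nulls source 2,
    and whose row 2 passes source 2 and nulls source 1."""
    A = np.column_stack([
        steering_vector(theta1, freq_hz, mic_spacing_m, c),
        steering_vector(theta2, freq_hz, mic_spacing_m, c),
    ])
    # Inverting the assumed mixing matrix places the nulls exactly.
    return np.linalg.inv(A)

# Hypothetical directions of 20 and -40 degrees at 1 kHz.
W = null_beamformer_matrix(np.deg2rad(20.0), np.deg2rad(-40.0), 1000.0, 0.04)
```

Because such a matrix depends only on the estimated directions and the array geometry, it ignores reverberation and other effects present in the real mixture.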

Non-Patent Document 1: Osako and three others, "Fast blind source separation using frequency band interpolation by a null beamformer," Proceedings of the Acoustical Society of Japan, Acoustical Society of Japan, March 2007, pp. 549-550.

However, in a configuration that, as in Non-Patent Document 1, uses separation matrices generated by null beamforming as the separation matrices of the non-selected frequencies, the accuracy of sound source separation may be insufficient at those frequencies. In view of these circumstances, an object of the present invention is both to reduce the amount of computation required to generate separation matrices and to improve the accuracy of sound source separation.

To solve the above problem, a signal processing device according to the present invention comprises: signal separation means that generates a plurality of separated signals, one per sound source, by applying a separation matrix for each of a plurality of frequencies to a plurality of observation signals obtained by picking up, with a plurality of sound collecting devices, a mixture of sounds generated by different sound sources; frequency selection means that sorts the plurality of frequencies into first frequencies and second frequencies; first learning means that generates the separation matrix of each first frequency by a primary learning process using learning data corresponding to the component of that first frequency in the observation signals; direction estimation means that estimates the direction of each sound source from the separation matrices generated by the first learning means; initial matrix setting means that generates an initial separation matrix such that a null or a beam of sound collection is formed in the direction estimated by the direction estimation means; and second learning means that generates the separation matrix of each second frequency by executing a secondary learning process, which uses learning data corresponding to the component of that second frequency in the observation signals, with the initial separation matrix generated by the initial matrix setting means as the initial value and with fewer iterations than the primary learning process.
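One common way to estimate source directions from a learned separation matrix is to invert it and read the inter-microphone phase of each column of the resulting mixing-matrix estimate. The sketch below assumes that approach; the text above does not specify the estimation method. The two-microphone far-field model, the function names, and the numeric values are hypothetical.

```python
import numpy as np

def mixing_matrix(thetas, freq_hz, mic_spacing_m, c=340.0):
    """Assumed far-field mixing matrix: rows are microphones, columns are sources."""
    delays = mic_spacing_m * np.sin(np.asarray(thetas)) / c
    return np.vstack([np.ones(len(delays)),
                      np.exp(-2j * np.pi * freq_hz * delays)])

def estimate_directions(W, freq_hz, mic_spacing_m, c=340.0):
    """Estimate one arrival angle (radians) per source from a 2x2 separation matrix."""
    A = np.linalg.inv(W)          # estimated mixing matrix, one column per source
    ratio = A[1, :] / A[0, :]     # phase ratio between microphones; scale cancels
    sin_theta = -np.angle(ratio) * c / (2.0 * np.pi * freq_hz * mic_spacing_m)
    return np.arcsin(np.clip(sin_theta, -1.0, 1.0))

# Round trip on synthetic data: an arbitrary row scaling of the separation
# matrix (the usual ICA scaling ambiguity) does not change the estimate.
true_thetas = np.deg2rad([20.0, -40.0])
A_true = mixing_matrix(true_thetas, 1000.0, 0.04)
W = np.diag([2.0, 0.5j]) @ np.linalg.inv(A_true)
estimated = estimate_directions(W, 1000.0, 0.04)
```

The phase-based estimate is only unambiguous while the microphone spacing is below half a wavelength, which holds for the values chosen here.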

In the above configuration, a separation matrix is generated by the primary learning process for each frequency sorted as a first frequency, while for each frequency sorted as a second frequency a separation matrix is generated by executing the secondary learning process, with fewer iterations than the primary learning process, starting from an initial separation matrix derived from the separation matrices produced by the primary learning process. This reduces the amount of computation compared with executing the primary learning process for every frequency. It also yields separation matrices capable of more accurate sound source separation than a configuration that applies to the second frequencies, without any secondary learning, separation matrices set so that a null or a beam of sound collection is formed in the sound source directions estimated from the separation matrices produced by the primary learning process.
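The two-stage scheme can be sketched as follows. This is only an illustration under stated assumptions: the update rule is a generic natural-gradient FDICA step, not necessarily the learning rule used in the patent, and `make_initial` is a hypothetical callback standing in for direction estimation plus initial matrix setting. The iteration counts and step size are placeholders.

```python
import numpy as np

def ica_learn(X, W0, n_iter, step=0.1):
    """Iteratively update a separation matrix for one frequency.
    X: (n_ch, n_frames) complex learning data; W0: (n_ch, n_ch) initial matrix.
    Uses a natural-gradient update with phi(y) = y / |y| (assumed rule)."""
    W = np.array(W0, dtype=complex)
    n_ch, n_frames = X.shape
    eye = np.eye(n_ch)
    for _ in range(n_iter):
        Y = W @ X
        phi = Y / np.maximum(np.abs(Y), 1e-12)
        W = W + step * (eye - (phi @ Y.conj().T) / n_frames) @ W
    return W

def learn_separation_matrices(X, is_first, make_initial,
                              n_primary=200, n_secondary=20):
    """X: (K, n_ch, n_frames); is_first: (K,) boolean mask of first frequencies.
    make_initial(k, W, is_first) returns the initial matrix for second frequency k."""
    is_first = np.asarray(is_first, dtype=bool)
    K, n_ch, _ = X.shape
    W = np.zeros((K, n_ch, n_ch), dtype=complex)
    for k in np.flatnonzero(is_first):       # primary learning: many iterations
        W[k] = ica_learn(X[k], np.eye(n_ch), n_primary)
    for k in np.flatnonzero(~is_first):      # secondary learning: few iterations
        W[k] = ica_learn(X[k], make_initial(k, W, is_first), n_secondary)
    return W
```

With K frequencies of which N are first frequencies, the total number of update steps drops from K times `n_primary` to N times `n_primary` plus (K - N) times `n_secondary`.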

In an environment with poor sound collection conditions, however, a more accurate separation matrix can sometimes be obtained by not executing the secondary learning process for a second frequency. In view of this tendency, a signal processing device according to a preferred aspect of the present invention comprises condition determination means that determines, for each frequency, whether the sound collection condition is good or bad. For a frequency whose sound collection condition is determined to be good, the second learning means generates the separation matrix by the secondary learning process with the initial separation matrix as the initial value; for a frequency whose sound collection condition is determined to be bad, it adopts the initial separation matrix as the separation matrix. Because the secondary learning process is not executed for frequencies with bad sound collection conditions, this aspect can generate more accurate separation matrices than a configuration that executes the secondary learning process for all second frequencies regardless of the sound collection condition. A specific example of this aspect is described later as, for example, the second embodiment.

When the component of an observation signal at a frequency sorted as a second frequency contains the sound of only one sound source, it is preferable to set the separation matrix so that the component at that frequency does not change excessively across sound source separation. Accordingly, a configuration may be adopted that comprises condition determination means that determines, for each frequency, whether the sound collection condition is good or bad (whether there is a single sound source or a plurality of sound sources), and matrix setting means (for example, the matrix setting unit 76 in FIG. 14) that, for each second frequency whose sound collection condition is determined to be bad, sets the separation matrix so that sound arriving from the sound source direction estimated from the observation signals is emphasized. In this aspect, from the standpoint of reducing the computation required to generate separation matrices, it is particularly preferable to stop both the generation of the initial separation matrix by the initial matrix setting means and the secondary learning process by the second learning means for second frequencies whose sound collection condition is determined to be bad. A specific example of this aspect is described later as, for example, the third embodiment.

A signal processing device according to a preferred aspect of the present invention comprises significance index calculation means that calculates, for each frequency and from the observation signals, a significance index value indicating the significance of a learning process that uses the learning data of that frequency; the frequency selection means sorts the frequencies into first frequencies and second frequencies according to the significance index value of each frequency. Because the frequencies are sorted according to a significance index value that indicates the significance of the learning process, this aspect can generate more accurate separation matrices than a configuration that sorts them without regard to the significance of learning (for example, a configuration that selects frequencies at a fixed interval from the sequence of frequencies as first frequencies and treats the remaining frequencies as second frequencies). Specifically, the condition determination means determines that the sound collection condition is bad when the intensities of the sounds generated by the different sound sources differ greatly, and that it is good when the difference in intensity is small.

In an aspect that includes the significance index calculation means, it is particularly preferable for the condition determination means to determine the sound collection condition of each frequency according to the significance index value calculated for that frequency by the significance index calculation means. Compared with a configuration that calculates an index of the sound collection condition separately from the significance index value, this aspect has the advantage of reducing the computation required to generate separation matrices. Specifically, the determinant of the covariance matrix of observation vectors, whose elements are the intensities of the respective observation signals at a given frequency, serves as an index of the significance of the learning process and also varies with the difference in intensity between the sounds generated by the different sound sources (that is, with the sound collection condition). A configuration that uses the determinant of the covariance matrix of the observation vectors both for sorting the frequencies and for determining the sound collection condition is therefore preferable.
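The determinant-based index can be sketched as follows. The sketch assumes that the observation vector at frame m and frequency fk holds the complex component values of the observation signals, that a larger determinant indicates more significant learning, and that a fixed number of frequencies with the largest index become first frequencies. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def covariance_per_frequency(X):
    """X: (K, n_ch, n_frames) complex component values.
    Returns the (K, n_ch, n_ch) covariance matrix of the observation vectors."""
    Xc = X - X.mean(axis=2, keepdims=True)
    return np.einsum('kim,kjm->kij', Xc, Xc.conj()) / X.shape[2]

def determinant_index(X):
    """Significance index per frequency: determinant of the covariance matrix."""
    return np.linalg.det(covariance_per_frequency(X)).real

def sort_frequencies(index, n_first):
    """Mark the n_first frequencies with the largest index as first frequencies."""
    is_first = np.zeros(index.shape[0], dtype=bool)
    is_first[np.argsort(index)[::-1][:n_first]] = True
    return is_first

# Synthetic data for three frequencies: two independent sources, a single
# source seen by both microphones, and two weak independent sources.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 400)) + 1j * rng.standard_normal((2, 400))
X = np.stack([s, np.vstack([s[0], 0.5 * s[0]]), 0.1 * s])
index = determinant_index(X)
is_first = sort_frequencies(index, n_first=1)
```

The single-source frequency yields a rank-deficient covariance matrix, so its determinant is close to zero and it is left among the second frequencies.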

Because the learning process by independent component analysis is equivalent to identifying as many independent bases as there are sound sources, the total number of bases of the observation vectors, whose elements are the intensities of the respective observation signals at a given frequency, is well suited as an index of the significance of learning with the learning data. Accordingly, the significance index calculation means in a preferred aspect of the present invention calculates an index value of the total number of bases in the distribution of the observation vectors, and the frequency selection means sorts frequencies for which the index value indicates a large total number of bases into the first frequencies.

Examples of an index value of the total number of bases include the determinant and the condition number of the covariance matrix of the observation vectors. Accordingly, the significance index calculation means in a preferred aspect of the present invention includes covariance matrix calculation means that calculates, for each of the frequencies, the covariance matrix of observation vectors whose elements are the intensities of the component of that frequency in the observation signals, and index calculation means (for example, the determinant calculation unit 424 in FIG. 6) that calculates the significance index value from the covariance matrix of each frequency. The index calculation means calculates the significance index value according to, for example, the determinant or the condition number of the covariance matrix. In addition, considering the tendency that the larger the trace (power) of the covariance matrix of the observation vectors, the more clearly the distribution region (basis) of the observation vectors is identified for each sound source, a configuration in which the significance index calculation means calculates the significance index value from the trace of the covariance matrix of the observation signals is also preferable.
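The condition number and the trace can be computed from the same covariance matrices. In the sketch below, the inverse condition number (smallest eigenvalue divided by largest) is used so that a value near 1 corresponds to a well-conditioned matrix and division by zero is avoided for rank-deficient matrices; treating a larger value as "more bases" is an assumption of this example, and all names are hypothetical. Every frequency is assumed to carry nonzero power.

```python
import numpy as np

def covariance_per_frequency(X):
    """X: (K, n_ch, n_frames) complex. Returns (K, n_ch, n_ch) covariance matrices."""
    Xc = X - X.mean(axis=2, keepdims=True)
    return np.einsum('kim,kjm->kij', Xc, Xc.conj()) / X.shape[2]

def inverse_condition_number(R):
    """Smallest over largest eigenvalue of each Hermitian matrix, in [0, 1]."""
    lam = np.linalg.eigvalsh(R)                    # ascending, real-valued
    return np.clip(lam[..., 0], 0.0, None) / lam[..., -1]

def trace_power(R):
    """Trace of each covariance matrix, i.e. total power over the channels."""
    return np.trace(R, axis1=-2, axis2=-1).real

# Synthetic data: frequency 0 has two independent sources, frequency 1 one source.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 400)) + 1j * rng.standard_normal((2, 400))
X = np.stack([s, np.vstack([s[0], 0.5 * s[0]])])
R = covariance_per_frequency(X)
inv_cond = inverse_condition_number(R)
power = trace_power(R)
```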

The definition of the significance index value and the method of calculating it are arbitrary. For example, considering the tendency that an observation signal contains sound from more sound sources the lower the kurtosis of the occurrence distribution (histogram) of its intensity, a configuration is preferable in which the significance index calculation means calculates a significance index value according to the kurtosis of the intensity distribution of the observation signal and the frequency selection means sorts frequencies with low kurtosis into the first frequencies. Likewise, considering the tendency that learning with the learning data is more significant the higher the mutual independence (the lower the correlation) of the observation signals, a configuration is preferable in which the significance index calculation means calculates a significance index value according to the mutual independence of the observation signals and the frequency selection means sorts frequencies for which the index value indicates high independence into the first frequencies. Examples of an index of the mutual independence of the observation signals include cross-correlation and mutual information.
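Both alternative indices are straightforward to compute per frequency. The sketch below assumes that "intensity" means the magnitude of the complex component values, averages the kurtosis over channels, and measures independence by the magnitude of the normalized cross-correlation between two channels; these choices and the function names are assumptions for illustration.

```python
import numpy as np

def kurtosis(a, axis=-1):
    """Fourth central moment divided by the squared variance (no offset of 3)."""
    d = a - a.mean(axis=axis, keepdims=True)
    var = (d ** 2).mean(axis=axis)
    return (d ** 4).mean(axis=axis) / var ** 2

def intensity_kurtosis(X):
    """X: (K, n_ch, n_frames) complex. Kurtosis of |x| per frequency,
    averaged over channels; lower values suggest more sources."""
    return kurtosis(np.abs(X), axis=2).mean(axis=1)

def channel_correlation(X):
    """Magnitude of the normalized cross-correlation between channels 1 and 2
    per frequency; lower values suggest higher mutual independence."""
    x1, x2 = X[:, 0, :], X[:, 1, :]
    num = np.abs((x1 * x2.conj()).mean(axis=1))
    den = np.sqrt((np.abs(x1) ** 2).mean(axis=1) * (np.abs(x2) ** 2).mean(axis=1))
    return num / den

# Synthetic data: frequency 0 has independent channels, frequency 1 has
# channel 2 equal to a scaled copy of channel 1.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 400)) + 1j * rng.standard_normal((2, 400))
X = np.stack([s, np.vstack([s[0], 0.5 * s[0]])])
kurt = intensity_kurtosis(X)
corr = channel_correlation(X)
```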

The signal processing device according to each of the above aspects is realized by hardware (electronic circuitry) such as a DSP (digital signal processor) dedicated to audio processing, or by cooperation between a general-purpose arithmetic processing device such as a CPU (central processing unit) and a program. A program according to the present invention causes a computer to execute: a signal separation process that generates a plurality of separated signals, one per sound source, by applying a separation matrix for each of a plurality of frequencies to a plurality of observation signals obtained by picking up, with a plurality of sound collecting devices, a mixture of sounds generated by different sound sources; a frequency selection process that sorts the plurality of frequencies into first frequencies and second frequencies; a first process that generates the separation matrix of each first frequency by a primary learning process using learning data corresponding to the component of that first frequency in the observation signals; a direction estimation process that estimates the direction of each sound source from the separation matrices generated in the first process; an initial matrix setting process that generates an initial separation matrix such that a null or a beam of sound collection is formed in the direction estimated in the direction estimation process; and a second process that generates the separation matrix of each second frequency by executing a secondary learning process, which uses learning data corresponding to the component of that second frequency in the observation signals, with the initial separation matrix generated in the initial matrix setting process as the initial value and with fewer iterations than the primary learning process. This program provides the same operation and effects as the signal processing device according to the present invention. The program of the present invention is provided to users stored on a computer-readable recording medium and installed on a computer, or is provided from a server device by distribution over a communication network and installed on a computer.

FIG. 1 is a block diagram of a signal processing device according to a first embodiment.
FIG. 2 is an explanatory diagram of observation vectors and learning data.
FIG. 3 is a block diagram of a signal separation unit.
FIG. 4 is a block diagram of a separation matrix generation unit.
FIG. 5 is an explanatory diagram of the operation of the separation matrix generation unit.
FIG. 6 is a block diagram of a significance index calculation unit.
FIG. 7 is a conceptual diagram showing the relationship between the determinant of the covariance matrix of the observation vectors and the number of bases.
FIG. 8 is a graph showing the relationship between the number of first frequencies and the number of iterations of the learning process.
FIG. 9 is a graph showing the relationship between the number of first frequencies and the noise suppression rate.
FIG. 10 is a graph showing the relationship between the number of first frequencies and the cepstral distortion.
FIG. 11 is a graph showing the relationship between the number of first frequencies and the noise suppression rate.
FIG. 12 is a graph showing the relationship between the number of first frequencies and the cepstral distortion.
FIG. 13 is a block diagram of a separation matrix generation unit in a second embodiment.
FIG. 14 is a block diagram of a separation matrix generation unit in a third embodiment.
FIG. 15 is a conceptual diagram showing the relationship between the trace of the covariance matrix and the distribution range of the observation vectors.
FIG. 16 is a graph showing the relationship between the kurtosis before correction and the weight value.

<A: First Embodiment>
FIG. 1 is a block diagram of a signal processing device 100 according to the first embodiment. A sound collecting device M1 and a sound collecting device M2, arranged in a plane PL at a distance from each other, are connected to the signal processing device 100. A sound source S1 and a sound source S2 are located at different positions around the sound collecting devices M1 and M2. The sound source S1 lies in the direction of angle θ1 with respect to the normal Ln of the plane PL, and the sound source S2 lies in the direction of angle θ2 (θ2 ≠ θ1) with respect to the normal Ln. The angles θ1 and θ2 are unknown. The number of sound collecting devices M (M1, M2) and the number of sound sources S (S1, S2) may be changed arbitrarily.

A mixture of the sound SV1 generated by the sound source S1 and the sound SV2 generated by the sound source S2 reaches the sound collecting devices M1 and M2. The sound collecting device M1 generates an observation signal V1(t), and the sound collecting device M2 generates an observation signal V2(t). Each of the observation signals V1(t) and V2(t) is an acoustic signal representing the time waveform of the mixture of the sound SV1 and the sound SV2.

The signal processing device 100 generates a separated signal Y1(t) and a separated signal Y2(t) by sound source separation (filtering) of the observation signals V1(t) and V2(t). The separated signal Y1(t) is an acoustic signal in which the sound SV1 from the sound source S1 is emphasized (the sound SV2 from the sound source S2 is suppressed), and the separated signal Y2(t) is an acoustic signal in which the sound SV2 is emphasized (the sound SV1 is suppressed). That is, the sound SV1 and the sound SV2 are separated (sound source separation).

The separated signals Y1(t) and Y2(t) are reproduced as sound by being supplied to a sound emitting device (not shown) such as a loudspeaker or headphones. A configuration that reproduces only one of the separated signals Y1(t) and Y2(t) (for example, one that discards the separated signal Y2(t) as noise) may also be adopted. For convenience, the A/D converter that converts the observation signals V1(t) and V2(t) from analog to digital and the D/A converter that converts the separated signals Y1(t) and Y2(t) from digital to analog are not shown.

As shown in FIG. 1, the signal processing device 100 is realized by a computer system that includes an arithmetic processing device 12 and a storage device 14. The storage device 14 stores a program and various data for generating the separated signals Y1(t) and Y2(t) from the observation signals V1(t) and V2(t). Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media may be used as the storage device 14.

By executing the program stored in the storage device 14, the arithmetic processing device 12 functions as a plurality of elements (a frequency analysis unit 22, a signal separation unit 24, a signal synthesis unit 26, and a separation matrix generation unit 28). A configuration in which an electronic circuit (DSP) dedicated to sound source separation realizes the elements of FIG. 1, or in which the elements of FIG. 1 are distributed over a plurality of integrated circuits, may also be adopted.

The frequency analysis unit 22 generates a frequency spectrum (complex spectrum) Q1 of the observation signal V1(t) and a frequency spectrum (complex spectrum) Q2 of the observation signal V2(t) for each of a plurality of frames on the time axis. As shown in FIG. 2, the frequency spectrum Q1 is a series of component values x1(m,f1) to x1(m,fK), one for each of K frequencies (in practice, frequency bands) f1 to fK set on the frequency axis. Similarly, the frequency spectrum Q2 is a series of component values x2(m,f1) to x2(m,fK), one for each of the K frequencies f1 to fK. The symbol m denotes the frame number (each of the time points set discretely on the time axis). Any known technique (for example, the short-time Fourier transform) may be used to calculate the frequency spectra Q1 and Q2.
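A minimal short-time Fourier transform, of the kind the frequency analysis unit 22 could use, is sketched below. The frame length of 512 samples, the hop of 256 samples, and the periodic Hann window are assumptions for illustration; the text above does not specify them.

```python
import numpy as np

def stft(v, frame_len=512, hop=256):
    """Return the complex spectrum of each frame of the real signal v.
    Output shape is (n_frames, K) with K = frame_len // 2 + 1, so that
    element [m, k] plays the role of x(m, fk)."""
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)   # periodic Hann
    n_frames = 1 + (len(v) - frame_len) // hop
    frames = np.stack([v[m * hop: m * hop + frame_len] * window
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Example: a sinusoid that falls exactly on frequency bin 8.
t = np.arange(2048)
v1 = np.sin(2.0 * np.pi * 8 * t / 512)
Q1 = stft(v1)
```

Applying the same function to V2(t) gives Q2; stacking the two results frequency by frequency yields the observation vectors used for learning.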

The frequency spectra Q1 and Q2 generated by the frequency analysis unit 22 are supplied to the signal separation unit 24 of FIG. 1. The signal separation unit 24 generates a frequency spectrum R1 of the separated signal Y1(t) and a frequency spectrum R2 of the separated signal Y2(t) by performing sound source separation, individually for each of the K frequencies f1 to fK, on the component of frequency fk (k = 1 to K) in the observation signal V1(t) (component value x1(m,fk)) and the component of frequency fk in the observation signal V2(t) (component value x2(m,fk)). The frequency spectrum R1 is a series of component values y1(m,f1) to y1(m,fK), and the frequency spectrum R2 is a series of component values y2(m,f1) to y2(m,fK).

FIG. 3 is a block diagram of the signal separation unit 24. As shown in FIG. 3, the signal separation unit 24 consists of K processing units P1 to PK corresponding to the different frequencies fk (f1 to fK). The processing unit Pk for the frequency fk includes a filter 32 that generates the component value y1(m,fk) of the separated signal Y1(t) from the component values x1(m,fk) and x2(m,fk), and a filter 34 that generates the component value y2(m,fk) of the separated signal Y2(t) from the component values x1(m,fk) and x2(m,fk).

The filter 32 and the filter 34 execute delay-and-sum (DS) beamforming. That is, as defined by the following formula (1A), the filter 32 of the processing unit Pk includes a delay element 321 that applies a delay corresponding to the coefficient w11(fk) to the component value x1(m,fk), a delay element 323 that applies a delay corresponding to the coefficient w12(fk) to the component value x2(m,fk), and an adder 325 that generates the component value y1(m,fk) by adding the outputs of the delay element 321 and the delay element 323. Similarly, as defined by the following formula (1B), the filter 34 includes a delay element 341 that applies a delay corresponding to the coefficient w21(fk) to the component value x1(m,fk), a delay element 343 that applies a delay corresponding to the coefficient w22(fk) to the component value x2(m,fk), and an adder 345 that generates the component value y2(m,fk) by adding the outputs of the delay element 341 and the delay element 343. Null-steering (null) beamforming may also be applied to the processing unit Pk.
y1(m,fk) = w11(fk)·x1(m,fk) + w12(fk)·x2(m,fk) ... (1A)
y2(m,fk) = w21(fk)·x1(m,fk) + w22(fk)·x2(m,fk) ... (1B)
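Formulas (1A) and (1B) amount to multiplying, at every frequency, a 2-by-2 matrix of coefficients by the vector of the two observed component values. A minimal Python sketch under that reading follows; the function name `separate` and the array layouts are assumptions for illustration only.

```python
import numpy as np

def separate(W, X):
    """Apply formulas (1A) and (1B) at every frame and frequency.

    W: shape (K, 2, 2), W[k] = [[w11, w12], [w21, w22]] for frequency fk.
    X: shape (M, K, 2), X[m, k] = (x1(m, fk), x2(m, fk)).
    Returns Y of shape (M, K, 2), Y[m, k] = (y1(m, fk), y2(m, fk)).
    """
    # For each frequency k: y_i = sum_j w_ij * x_j
    return np.einsum('kij,mkj->mki', W, X)
```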

The signal synthesis unit 26 of FIG. 1 generates a time-domain acoustic signal by the inverse Fourier transform of the frequency spectrum R1 (y1(m,f1) to y1(m,fK)) that the signal separation unit 24 generates for each frame, and generates the separated signal Y1(t) by connecting successive frames to one another. Similarly, the signal synthesis unit 26 generates the time-domain separated signal Y2(t) from the frequency spectrum R2 (y2(m,f1) to y2(m,fK)) generated by the signal separation unit 24.

As shown in FIG. 1, the frequency spectrum Q1 and the frequency spectrum Q2 generated by the frequency analysis unit 22 are supplied to the signal separation unit 24 and are also stored in the storage device 14 as observation vectors X(m,f1) to X(m,fK). As shown in FIG. 2, the observation vector X(m,fk) is a vector whose elements are the component value x1(m,fk) and the component value x2(m,fk) (X(m,fk) = (x1(m,fk), x2(m,fk))^T). The symbol T denotes matrix transposition. As shown in FIG. 2, the observation vectors X(m,f1) to X(m,fK) stored in the storage device 14 are divided into learning data D(f1) to D(fK) for each unit interval TU consisting of a predetermined number (for example, 50) of frames. That is, the learning data D(fk) is the time series of the observation vectors X(m,fk) of the frequency fk calculated for the frames within the unit interval TU.

The separation matrix generation unit 28 of FIGS. 1 and 3 generates the separation matrices W(f1) to W(fK) that the signal separation unit 24 applies to sound source separation. As shown in FIG. 3, the separation matrix W(fk) of the frequency fk is a 2-row, 2-column matrix whose elements are the coefficients w11(fk) and w12(fk) applied to the filter 32 of the processing unit Pk and the coefficients w21(fk) and w22(fk) applied to the filter 34. The separation matrix W(fk) is generated sequentially for each unit interval TU by a learning process (iterative update) using the learning data D(fk) in the storage device 14. FIG. 4 is a block diagram of the separation matrix generation unit 28, and FIG. 5 is an explanatory diagram of its operation. As shown in FIG. 4, the separation matrix generation unit 28 includes a significance index calculation unit 42, a frequency selection unit 44, a first processing unit 50, and a second processing unit 60.

The significance index calculation unit 42 calculates, for each of the K frequencies f1 to fK, a significance index value Z(fk) (Z(f1) to Z(fK)) that serves as a measure of the significance of the learning process using the learning data D(fk) of the frequency fk. The significance index value Z(fk) corresponds to a numerical value indicating the degree to which the accuracy of sound source separation by the separation matrix W(fk) improves as a result of the learning process using the learning data D(fk). As shown in FIG. 5, the frequency selection unit 44 sorts (classifies) each of the K frequencies f1 to fK into a first frequency fA or a second frequency fB according to the significance index value Z(fk). A first frequency fA is a frequency at which the significance of the learning process using the learning data D(fk) is higher than at a second frequency fB.

The significance index calculation unit 42 of the first embodiment is an element that calculates, for each frequency fk, the determinant z1(fk) of the covariance matrix Rxx(fk) of the learning data D(fk) (observation vectors X(m,fk)) as the significance index value Z(fk). As shown in FIG. 6, it includes a covariance matrix calculation unit 422 and a determinant calculation unit 424.

The covariance matrix calculation unit 422 calculates the covariance matrix Rxx(fk) (Rxx(f1) to Rxx(fK)) of the learning data D(fk) for each of the K frequencies f1 to fK. The covariance matrix Rxx(fk) of the frequency fk is a matrix whose elements are the covariances of the plurality of observation vectors X(m,fk) in the learning data D(fk). That is, the covariance matrix Rxx(fk) is calculated, for example, by the following formula (2).
Rxx(fk) = E[X(m,fk) X(m,fk)^H] ... (2)
The symbol H denotes the conjugate transpose (Hermitian transpose) of a matrix, and the symbol E[ ] denotes the average or the sum over the plurality of frames in the unit interval TU (the whole of the learning data D(fk)). That is, the covariance matrix Rxx(fk) is a 2-row, 2-column square matrix generated for each unit interval TU (for each learning data D(fk)).
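Formula (2) can be sketched in Python as follows, taking E[ ] to be the average over the frames of one unit interval (the description equally allows the sum). The function name `covariance` and the array layout are assumptions for illustration.

```python
import numpy as np

def covariance(D):
    """Covariance matrix Rxx(fk) of formula (2) for one frequency fk.

    D: complex array of shape (M, J); row m is the observation vector
       X(m, fk) of frame m within the unit interval (J = 2 here).
    """
    M = D.shape[0]
    # Entry (i, j) is the average over m of x_i(m, fk) * conj(x_j(m, fk)).
    return D.T @ D.conj() / M
```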

The determinant calculation unit 424 of FIG. 6 calculates the determinant z1(fk) (z1(f1) to z1(fK)) of each of the K covariance matrices Rxx(f1) to Rxx(fK) calculated by the covariance matrix calculation unit 422. Any known method may be used to calculate the determinant z1(fk); for example, the following method using the singular value decomposition of the covariance matrix Rxx(fk) is suitable. In the following, for convenience, the covariance matrix Rxx(fk) is generalized to J rows and J columns (J = 2 in this embodiment).

The covariance matrix Rxx(fk) is decomposed into singular values as in the following formula (3). The matrix F in formula (3) is a J-row, J-column orthogonal matrix (2 rows and 2 columns in this embodiment), and the matrix D is a J-row, J-column singular value matrix (diagonal matrix) in which every element other than the diagonal components (d1, ..., dJ) is zero.
Rxx(fk) = F D F^H ... (3)

Taking the singular value decomposition of formula (3) into account, the determinant z1(fk) of the covariance matrix Rxx(fk) is expressed by the following formula (4). The derivation of formula (4) uses the relation that the product of the matrix F and its conjugate transpose F^H is the J-th order identity matrix (F^H F = I), and the relation that the determinant det(AB) of a matrix product AB is equal to the determinant det(BA) of the product BA.
z1(fk) = det(Rxx(fk))
       = det(F D F^H)
       = det(D)
       = d1·d2·...·dJ ... (4)

As understood from formula (4), the determinant z1(fk) of the covariance matrix Rxx(fk) corresponds to the product of the J diagonal components (d1, ..., dJ) of the singular value matrix D specified by the singular value decomposition of the covariance matrix Rxx(fk). The determinant calculation unit 424 of FIG. 6 calculates the determinants z1(f1) to z1(fK) by executing the calculation of formula (4) for each of the K frequencies f1 to fK.
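The calculation of formula (4) can be sketched in Python as the product of the singular values returned by `numpy.linalg.svd`. The function name `det_via_svd` is an assumption for illustration. The sketch relies on the covariance matrix being Hermitian and positive semidefinite, in which case its singular values coincide with its eigenvalues.

```python
import numpy as np

def det_via_svd(Rxx):
    """Determinant z1(fk) of a covariance matrix as in formula (4).

    Rxx: Hermitian positive semidefinite matrix of shape (J, J).
    Returns d1 * d2 * ... * dJ, the product of its singular values.
    """
    d = np.linalg.svd(Rxx, compute_uv=False)   # singular values d1..dJ
    return float(np.prod(d))
```

For a general matrix the product of singular values only gives the absolute value of the determinant, so this shortcut is tied to the covariance-matrix case.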

FIG. 7 is a scatter diagram of the observation vectors X(m,fk) in a unit interval TU. The horizontal axis represents the component value x1(m,fk), and the vertical axis represents the component value x2(m,fk). Part (A) of FIG. 7 is a scatter diagram for a case where the determinant z1(fk) is large, and part (B) of FIG. 7 is a scatter diagram for a case where the determinant z1(fk) is small. As in part (A) of FIG. 7, when the determinant z1(fk) is large, the axes (bases) of the regions in which the observation vectors X(m,fk) are distributed are clearly distinguished for each sound source S. Specifically, a region A1, in which observation vectors X(m,fk) dominated by the sound SV1 from the sound source S1 are distributed along an axis α1, is clearly distinguished from a region A2, in which observation vectors X(m,fk) dominated by the sound SV2 from the sound source S2 are distributed along an axis α2. On the other hand, when the determinant z1(fk) is small, the number of regions (number of axes) of the distribution of the observation vectors X(m,fk) that can be clearly distinguished in the scatter diagram falls below the actual total number of sound sources S. For example, as in part (B) of FIG. 7, there is no clear region A2 (axis α2) corresponding to the sound SV2 from the sound source S2.

As understood from the above tendency, the determinant z1(fk) of the covariance matrix Rxx(fk) functions as an index of the total number of bases (axes of the regions in which the observation vectors X(m,fk) are distributed) in the distribution of the observation vectors X(m,fk) constituting the learning data D(fk). That is, a frequency fk with a larger determinant z1(fk) tends to have more bases. A frequency fk at which the determinant z1(fk) is zero contains only one independent basis. Because the independent component analysis applied to the learning process of the separation matrix W(fk) is equivalent to a process of identifying as many independent bases as there are sound sources S, the learning data D(fk) of a frequency fk, among the K frequencies f1 to fK, at which the determinant z1(fk) of the covariance matrix Rxx(fk) is small can be said to have low significance for learning (the degree to which the accuracy of sound source separation improves between before and after the learning process).

Using the above property of the determinant z1(fk), the frequency selection unit 44 of FIG. 4 sorts one or more frequencies fk having a large determinant z1(fk), among the K frequencies f1 to fK, into the first frequencies fA, and sorts the remaining frequencies fk having a small determinant z1(fk) into the second frequencies fB. Specifically, the frequency selection unit 44 sorts into the first frequencies fA either a predetermined number of frequencies fk ranked highest when the determinants z1(f1) to z1(fK) are arranged in descending order, or the one or more frequencies fk whose determinant z1(fk) exceeds a predetermined threshold, and sorts the frequencies fk other than the first frequencies fA into the second frequencies fB. The sorting into first frequencies fA and second frequencies fB by the frequency selection unit 44 is executed sequentially, for example, for each unit interval TU.
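The two selection rules described above (a fixed number of top-ranked frequencies, or a threshold on the determinant) can be sketched in Python as follows. The function name `select_first_frequencies` and its parameters `n_first` and `threshold` are hypothetical names introduced for illustration.

```python
import numpy as np

def select_first_frequencies(z, n_first=None, threshold=None):
    """Return a boolean mask: True marks a first frequency fA, False an fB.

    z: determinants z1(f1) to z1(fK), shape (K,).
    n_first: if given, pick this many frequencies with the largest z.
    threshold: otherwise, pick the frequencies whose z exceeds this value.
    """
    z = np.asarray(z, dtype=float)
    if n_first is not None:
        order = np.argsort(z)[::-1]            # indices in descending order of z
        mask = np.zeros(z.shape, dtype=bool)
        mask[order[:n_first]] = True
        return mask
    if threshold is not None:
        return z > threshold
    raise ValueError("give either n_first or threshold")
```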

The first processing unit 50 and the second processing unit 60 of FIG. 4 generate, for each frequency fk, the separation matrix W(fk) (W(f1) to W(fK)) used by the signal separation unit 24. The first processing unit 50 generates the separation matrix W(fk) (hereinafter sometimes written specifically as "separation matrix WA(fk)") for each frequency fk that the frequency selection unit 44 has sorted into the first frequencies fA, and the second processing unit 60 generates the separation matrix W(fk) (hereinafter sometimes written specifically as "separation matrix WB(fk)") for each frequency fk that the frequency selection unit 44 has sorted into the second frequencies fB.

As shown in FIG. 4, the first processing unit 50 includes an initial matrix setting unit 52, a first learning unit 54, and a correction processing unit 56. The initial matrix setting unit 52 sets the initial value WA[0](fk) (hereinafter referred to as the "initial separation matrix") of the learning process that generates the separation matrix WA(fk). The initial separation matrix WA[0](fk) may be set by any method; for example, the identity matrix may be set as the initial separation matrix WA[0](fk). A configuration that sets the initial separation matrix WA[0](fk) independently of the observation vectors X(m,fk) in this way has the advantage that no prior information about the sound source S1 or the sound source S2 is required.

As shown in FIG. 5, the first learning unit 54 of FIG. 4 generates the separation matrix WA(fk) of each frequency fk sorted into the first frequencies fA by sequential updating (hereinafter referred to as the "primary learning process") that starts from the initial separation matrix WA[0](fk) set by the initial matrix setting unit 52. Any known technique may be adopted for the primary learning process by the first learning unit 54; for example, the calculation of formula (5) is suitable, which computes the separation matrix W[n+1](fk) after the (n+1)-th update from the immediately preceding separation matrix W[n](fk) (the initial separation matrix WA[0](fk) when the separation matrix W[1](fk) is calculated).
W[n+1](fk) = W[n](fk) − η{off-diag(E[φ(m,fk) Y[n](m,fk)^H])} W[n](fk) ... (5)

The symbol η in formula (5) is a predetermined constant (step size), and the symbol off-diag( ) is an operator that replaces the diagonal components with zero. The symbol φ( ) denotes a nonlinear function. For example, the hyperbolic tangent function (tanh) may be applied as the nonlinear function φ( ). The symbol Y[n](m,fk) in formula (5) is a vector whose elements are the component value y1(m,fk) and the component value y2(m,fk) calculated by formula (1A) and formula (1B) with the immediately preceding separation matrix W[n-1](fk) applied (Y[n](m,fk) = (y1(m,fk), y2(m,fk))^T). The first learning unit 54 fixes, as the separation matrix WA(fk), the separation matrix W[NA](fk) obtained when the calculation of formula (5) has been repeated NA times. However, when the separation matrix W[n+1](fk) calculated by formula (5) is determined to have converged, the first learning unit 54 ends the primary learning process before the repetition reaches NA times and fixes the separation matrix W[n+1](fk) at that point as the separation matrix WA(fk).
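The iterative update of formula (5), with the iteration limit and the early stop on convergence, can be sketched in Python as below. This is an illustrative sketch under several assumptions that the description leaves open: φ is taken to be tanh applied separately to the real and imaginary parts of the separated vector, E[ ] is taken as the average over frames, convergence is judged by the norm of the change in the matrix, and the names `learn_separation_matrix`, `eta`, `max_iter` and `tol` are hypothetical.

```python
import numpy as np

def off_diag(A):
    """Replace the diagonal components of A with zero."""
    return A - np.diag(np.diag(A))

def phi(Y):
    # Assumed nonlinearity: tanh on the real and imaginary parts separately.
    return np.tanh(Y.real) + 1j * np.tanh(Y.imag)

def learn_separation_matrix(D, W0, eta=0.1, max_iter=500, tol=1e-6):
    """Iterate formula (5) for one frequency fk.

    D: learning data, shape (M, 2); row m is the observation vector X(m, fk).
    W0: initial separation matrix, shape (2, 2).
    Returns (W, n): the final matrix and the number of updates performed.
    """
    W = np.asarray(W0, dtype=complex)
    for n in range(1, max_iter + 1):
        Y = D @ W.T                          # row m: Y(m, fk) = W X(m, fk)
        C = phi(Y).T @ Y.conj() / len(D)     # E[phi(Y) Y^H] over the frames
        W_next = W - eta * off_diag(C) @ W
        if np.linalg.norm(W_next - W) < tol:  # assumed convergence criterion
            return W_next, n
        W = W_next
    return W, max_iter
```

Passing the identity matrix as `W0` with `max_iter` set to NA would correspond to the primary learning process as described.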

Incidentally, the separation matrix WA(fk) calculated by independent component analysis (the primary learning process) is subject to the problem that the amplitude of each signal after sound source separation is indeterminate (the scaling problem) and the problem that the pairing of each separated signal with each sound source can change from one frequency fk to another (the permutation problem). The correction processing unit 56 of FIG. 4 corrects each separation matrix WA(fk) generated by the first learning unit 54 for each frequency fk sorted into the first frequencies fA so that the scaling problem and the permutation problem are resolved.

Any known technique may be adopted to resolve the scaling problem and the permutation problem. For example, the scaling problem is resolved by multiplying the separation matrix WA(fk) by a diagonal matrix composed of the diagonal components of the inverse of the separation matrix WA(fk), and the permutation problem is resolved by interchanging the rows of the separation matrix WA(fk) so that the sound source directions estimated from the separation matrix WA(fk) are consistent. Each separation matrix WA(fk) corrected by the correction processing unit 56 is applied in the processing unit Pk of each frequency fk sorted into the first frequencies fA, among the K processing units P1 to PK of the signal separation unit 24. The resolution of the scaling problem and the permutation problem is also described in detail in Saruwatari and five others, "Blind Source Separation Combining Independent Component Analysis and Beamforming", EURASIP Journal on Applied Signal Processing, Vol. 2003, No. 11, pp. 1135-1146, 2003 (hereinafter referred to as "Non-Patent Document 2").
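The scaling correction mentioned above (multiplying the separation matrix by the diagonal part of its inverse) can be sketched in Python as follows. The function name `fix_scaling` is an assumption for illustration, and the permutation correction is not covered by this sketch.

```python
import numpy as np

def fix_scaling(W):
    """Rescale the rows of a separation matrix to remove the amplitude ambiguity.

    Multiplies W from the left by the diagonal matrix formed from the
    diagonal components of the inverse of W.
    """
    W_inv = np.linalg.inv(W)
    return np.diag(np.diag(W_inv)) @ W
```

A useful property of this correction is that the result does not change when the rows of the input matrix are multiplied by arbitrary nonzero factors, which is exactly the ambiguity left by the learning process.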

The second processing unit 60 of FIG. 4 generates the separation matrix WB(fk) of each frequency fk that the frequency selection unit 44 has sorted into the second frequencies fB, using the separation matrices WA(fk) generated by the first processing unit 50. As shown in FIG. 4, the second processing unit 60 includes a direction estimation unit 62, an initial matrix setting unit 64, and a second learning unit 66.

The direction estimation unit 62 estimates the direction θ1 of the sound source S1 and the direction θ2 of the sound source S2 from the separation matrices WA(fk) generated by the first processing unit 50. Any known technique (for example, the method disclosed in Non-Patent Document 2) may be adopted to estimate the direction θ1 and the direction θ2; for example, the following method is suitable. First, as shown in FIG. 5, the direction estimation unit 62 estimates a direction θ1(fk) and a direction θ2(fk) from the separation matrix WA(fk) for each frequency fk sorted into the first frequencies fA. For example, the direction θ1(fk) is specified from the coefficients w11(fk) and w12(fk) of the separation matrix WA(fk), and the direction θ2(fk) is specified from the coefficients w21(fk) and w22(fk). Second, as shown in FIG. 5, the direction estimation unit 62 calculates the direction θ1 of the sound source S1 and the direction θ2 of the sound source S2 from the directions θ1(fk) and θ2(fk) of the individual frequencies fk (first frequencies fA). For example, a representative value (mean or median) of the directions θ1(fk) is specified as the direction θ1, and a representative value of the directions θ2(fk) is specified as the direction θ2.

As shown in FIG. 5, the initial matrix setting unit 64 of FIG. 4 sets the initial separation matrix WB[0](fk) of the separation matrix WB(fk) according to the direction θ1 and the direction θ2 estimated by the direction estimation unit 62. To generate the initial separation matrix WB[0](fk), for example, the null-steering beamforming disclosed in Non-Patent Document 2 is applied. Specifically, the initial matrix setting unit 64 generates an initial separation matrix WB[0](fk) whose elements are the coefficients w11(fk) and w12(fk), calculated so that a null of sound collection (a region of low sound collection sensitivity) is formed in the direction θ2 estimated by the direction estimation unit 62, and the coefficients w21(fk) and w22(fk), calculated so that a null of sound collection is formed in the direction θ1 estimated by the direction estimation unit 62. The initial separation matrix WB[0](fk) is generated individually for each frequency fk that the frequency selection unit 44 has sorted into the second frequencies fB.
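One common way to obtain a matrix with the null pattern described above is to invert the matrix of steering vectors for the two estimated directions. The Python sketch below follows that approach and is not necessarily the formulation of Non-Patent Document 2. It assumes two microphones in a far-field, plane-wave model; the microphone spacing `d`, the speed of sound `c`, the phase convention of `steering_vector` and all function names are assumptions for illustration.

```python
import numpy as np

def steering_vector(theta, f, d=0.04, c=340.0):
    """Assumed far-field response of a two-microphone array to direction theta.

    theta in radians, f in Hz, d = microphone spacing in metres (assumed).
    """
    tau = d * np.sin(theta) / c                  # inter-microphone delay
    return np.array([1.0, np.exp(-2j * np.pi * f * tau)])

def null_beamformer_init(theta1, theta2, f, d=0.04, c=340.0):
    """Initial separation matrix with nulls toward theta2 (row 1) and theta1 (row 2)."""
    A = np.column_stack([steering_vector(theta1, f, d, c),
                         steering_vector(theta2, f, d, c)])
    # W A = I: row 1 cancels the theta2 column, row 2 cancels the theta1 column.
    return np.linalg.inv(A)
```

The inversion fails when the two steering vectors coincide (for example at f = 0 or when θ1 equals θ2), so a practical implementation would need to guard against that case.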

In the above example, the initial separation matrix WB[0](fk) is generated by null-steering beamforming, but beamforming that produces a region of high sound collection sensitivity (a beam), such as delay-and-sum beamforming, may also be adopted to generate the initial separation matrix WB[0](fk). That is, the coefficients w11(fk) and w12(fk) of the initial separation matrix WB[0](fk) are set so that the sound collection beam is directed in the direction θ1, and the coefficients w21(fk) and w22(fk) of the initial separation matrix WB[0](fk) are set so that the sound collection beam is directed in the direction θ2.

As shown in FIG. 5, the second learning unit 66 of FIG. 4 generates the separation matrix WB(fk) of each frequency fk sorted into the second frequencies fB by sequential updating (hereinafter referred to as the "secondary learning process") that starts from the initial separation matrix WB[0](fk) set by the initial matrix setting unit 64. Any known technique may be adopted for the secondary learning process, but, as in the primary learning process, the calculation of formula (5) is suitably adopted. That is, the second learning unit 66 takes the initial separation matrix WB[0](fk) set by the initial matrix setting unit 64 as the initial value and repeats the calculation of formula (5) using the vector Y[n](m,fk) calculated by formula (1A) and formula (1B) from the learning data D(fk) of each frequency fk sorted into the second frequencies fB.

The number of iterations NB of formula (5) by the second learning unit 66 is set lower than the number of iterations NA of the first learning unit 54 (NB < NA). The second learning unit 66 calculates, as the separation matrix WB(fk) of the second frequency fB, the separation matrix W[NB](fk) obtained when the calculation of formula (5) has been repeated NB times. Like the first learning unit 54, when the separation matrix W[n+1](fk) has converged, the second learning unit 66 ends the secondary learning process before the repetition reaches NB times and fixes the separation matrix W[n+1](fk) at that point as the separation matrix WB(fk). The separation matrix WB(fk) generated by the above secondary learning process is applied in the processing unit Pk of each frequency fk sorted into the second frequencies fB, among the K processing units P1 to PK of the signal separation unit 24. A configuration in which the content of the calculation differs between the primary learning process and the secondary learning process may also be adopted.

In the separation matrix WB(fk) calculated from the initial separation matrix WB[0](fk) that reflects the direction θ1 and the direction θ2, the scaling problem and the permutation problem described above are less likely to arise than in the separation matrix WA(fk) generated without applying prior information. Therefore, no correction for resolving the scaling problem or the permutation problem is executed on the separation matrix WB(fk) generated by the second learning unit 66. Nevertheless, a configuration in which the correction processing unit 56 corrects the separation matrix WB(fk) generated by the second learning unit 66 may also be adopted.

As described above, in this embodiment, for each frequency fk sorted into the first frequencies fA, the separation matrix WA(fk) is generated by the primary learning process with NA iterations, and for each frequency fk sorted into the second frequencies fB, the separation matrix WB(fk) is generated by the secondary learning process with NB iterations (NB < NA), starting from the initial separation matrix WB[0](fk) generated according to the separation matrices WA(fk). This has the advantage that the amount of computation of the arithmetic processing unit 12 (separation matrix generation unit 28) is reduced compared with a configuration that repeats the calculation of formula (5) NA times for all of the K frequencies f1 to fK. In addition, because the separation matrix WB(fk) of the second frequency fB is generated by the secondary learning process starting from the initial separation matrix WB[0](fk) generated from the separation matrices WA(fk), there is the advantage that a separation matrix WB(fk) capable of more accurate sound source separation can be generated than with the configuration of Patent Document 1, in which the initial separation matrix WB[0](fk) itself is used as the separation matrix WB(fk) for sound source separation (that is, a configuration in which the secondary learning process is omitted).

FIG. 8 is a graph showing the relationship between the number of frequencies fk selected as first frequencies fA out of the K frequencies (K = 513; horizontal axis) and the total number of evaluations of Equation (5) (hereinafter the "total learning count"). In the first embodiment, the total learning count is the sum of the number of primary learning iterations and the number of secondary learning iterations. FIG. 8 assumes that the iteration count NA of the primary learning process is set to 500, that the iteration count NB of the secondary learning process is set to 100, and that the learning process is stopped when convergence of the separation matrix W(fk) is detected.
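As a rough illustration of the quantity plotted in FIG. 8, the following sketch computes an upper bound on the total learning count, assuming that no learning process stops early on convergence. The function name and the choice of 256 first frequencies are illustrative assumptions, not taken from the patent.

```python
def max_total_learning_count(num_first, k_total=513, n_a=500, n_b=100):
    """Upper bound on evaluations of Equation (5) when no early stop occurs.

    num_first frequencies receive n_a primary iterations each; the remaining
    k_total - num_first frequencies receive n_b secondary iterations each.
    """
    if not 0 <= num_first <= k_total:
        raise ValueError("num_first must lie in [0, k_total]")
    num_second = k_total - num_first
    return num_first * n_a + num_second * n_b


# Primary learning on all 513 frequencies (the setting of REF1 and REF2).
baseline = max_total_learning_count(513)
# Primary learning on 256 frequencies, secondary learning on the other 257.
selective = max_total_learning_count(256)
```

Because convergence detection can end either learning process early, the counts actually measured in FIG. 8 can be lower than these bounds.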

FIG. 8 also shows, as Comparative Example 1 (REF1) and Comparative Example 2 (REF2), cases in which the primary learning process with NA = 500 iterations is executed for all 513 frequencies fk (learning ends when the separation matrix W(fk) converges). Comparative Example 1 uses the identity matrix as the initial separation matrix W[0](fk) of the primary learning process. Comparative Example 2 uses, as that initial separation matrix, a separation matrix generated by null-steering beamforming from the known directions θ1 and θ2. Neither comparative example executes the secondary learning process. FIG. 8 plots the total learning count for each of several amplitude ratios RA (RA = 1, 0.87, 0.71, 0.5, 0.32) between the sound SV1 emitted by the sound source S1 and the sound SV2 emitted by the sound source S2. The horizontal-axis conditions and the amplitude-ratio conditions are the same in FIGS. 9 to 12 described below.

As FIG. 8 shows, the first embodiment, which executes the primary learning process selectively, requires far fewer total learning iterations to generate the separation matrices W(fk) than Comparative Examples 1 and 2, which execute the primary learning process for all frequencies fk. The fewer first frequencies fA are subjected to the primary learning process, the larger the difference in total learning count between the first embodiment and Comparative Examples 1 and 2. That is, the first embodiment reduces the amount of computation required to generate the separation matrices W(f1) to W(fK) relative to Comparative Examples 1 and 2. The same tendency is observed regardless of the amplitude ratio RA between the sound SV1 and the sound SV2.

FIG. 9 is a graph of the noise reduction rate (NRR) for the present embodiment and each comparative example, and FIG. 10 is a graph of the cepstrum distortion for the present embodiment and each comparative example. The noise reduction rate is the difference (SNR_OUT − SNR_IN) between the intensity ratio SNR_OUT of the sound SV1 to the sound SV2 in the separated signal Y1(t) and the intensity ratio SNR_IN of the sound SV1 to the sound SV2 in the observed signal V1(t). A higher noise reduction rate (toward the top of FIG. 9) therefore indicates more accurate sound source separation. The cepstrum distortion is an index of the difference between the cepstra of the sound SV1 and the separated signal Y1(t). A smaller cepstrum distortion (toward the top of FIG. 10) means that sound source separation changes the waveform (spectral envelope) less, that is, the sound SV1 is separated more faithfully.
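The definition of the noise reduction rate can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state: the intensity ratios are expressed in decibels, and (as in a simulation) the SV1 and SV2 contributions to V1(t) and Y1(t) are available as separate sample sequences. All function and variable names are hypothetical.

```python
import math


def power(signal):
    """Mean power of a real-valued sample sequence."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(target, interference):
    """Intensity ratio of target to interference, in decibels (assumed unit)."""
    return 10.0 * math.log10(power(target) / power(interference))


def noise_reduction_rate(sv1_in, sv2_in, sv1_out, sv2_out):
    """NRR = SNR_OUT - SNR_IN for the SV1/SV2 parts of V1(t) and Y1(t)."""
    return snr_db(sv1_out, sv2_out) - snr_db(sv1_in, sv2_in)


# Toy data: equal-power sounds at the input; SV2 attenuated 10x in amplitude
# at the output while SV1 passes through unchanged.
sv1_in = [1.0, -1.0, 1.0, -1.0]
sv2_in = [1.0, 1.0, -1.0, -1.0]
sv1_out = list(sv1_in)
sv2_out = [0.1 * x for x in sv2_in]
nrr = noise_reduction_rate(sv1_in, sv2_in, sv1_out, sv2_out)
```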

As FIGS. 9 and 10 show, as long as the amplitudes of the sound SV1 and the sound SV2 do not differ excessively (RA = 1, 0.87, 0.71), reducing the number of first frequencies fA subjected to the primary learning process, and thereby the amount of computation, causes almost no decrease in the noise reduction rate and almost no increase in the cepstrum distortion relative to Comparative Examples 1 and 2. When the number of first frequencies fA is 256 or 384, the noise reduction rate and the cepstrum distortion even improve relative to Comparative Examples 1 and 2. The first embodiment thus makes it possible to achieve accurate sound source separation while reducing the amount of computation required to generate the separation matrices W(fk).

FIGS. 11 and 12 are graphs of the noise reduction rate (FIG. 11) and the cepstrum distortion (FIG. 12) when the secondary learning process is omitted from the first embodiment (hereinafter "Comparative Example 3"). That is, in Comparative Example 3 (REF3), as in the technique of Non-Patent Document 1, the initial separation matrix WB[0](fk) set by the initial matrix setting unit 64 is applied to sound source separation as the separation matrix WB(fk) of each second frequency fB. Apart from the omission of the secondary learning process, the conditions are the same as those of the first embodiment shown in FIGS. 8 to 10.

As shown in FIG. 11, under Comparative Example 3 the noise reduction rate appears to improve as the number of first frequencies fA decreases. FIG. 12 shows, however, that the cepstrum distortion increases as the number of first frequencies fA decreases. In other words, the apparent improvement in the noise reduction rate in FIG. 11 when few first frequencies fA are used arises because the waveform of the separated signal Y1(t) deviates from the waveform of the original sound SV1; it does not indicate that the accuracy of sound source separation is being maintained at a high level. In contrast, as FIGS. 9 and 10 show, the first embodiment, which executes the secondary learning process for the second frequencies fB, keeps the noise reduction rate high while sufficiently suppressing the cepstrum distortion. From the standpoint of both maintaining the noise reduction rate and reducing the cepstrum distortion (that is, achieving accurate sound source separation), the first embodiment is therefore more advantageous than Comparative Example 3. Moreover, comparing the cepstrum distortion values in FIG. 10 and FIG. 12 over the range in which the amplitude ratio RA between the sound SV1 and the sound SV2 is high (RA = 1, 0.87, 0.71) confirms that the first embodiment suppresses cepstrum distortion relative to Comparative Example 3. The first embodiment is therefore also advantageous from the standpoint of faithfully separating the sounds SV1 and SV2.

<B: Second Embodiment>
A second embodiment of the present invention is described below. In each of the following examples, elements whose operation and function are equivalent to those of the first embodiment are denoted by the same reference signs as above, and their detailed description is omitted as appropriate.

In the first embodiment, the secondary learning process is executed for every frequency fk selected as a second frequency fB. Depending on the sound collection conditions at the sound collection devices M1 and M2, however, more accurate sound source separation may be obtained by not executing the secondary learning process.

For example, as FIGS. 9 and 10 show, in the first embodiment the accuracy of sound source separation falls below that of Comparative Examples 1 and 2 when the intensities (amplitude or power) of the sound SV1 and the sound SV2 differ greatly (RA = 0.5, 0.32). On the other hand, as a comparison of FIG. 9 with FIG. 11 and of FIG. 10 with FIG. 12 shows, the configuration of Comparative Example 3, which does not execute the secondary learning process, achieves separation accuracy comparable to that of Comparative Examples 1 and 2 even when the amplitude ratio RA is low. It follows that when the difference in intensity between the sounds SV1 and SV2 to be separated (hereinafter the "sound source intensity difference") is large (that is, when the sound collection condition is bad), more accurate sound source separation is obtained by not executing the secondary learning process. In view of this tendency, the second embodiment variably controls whether the secondary learning process is executed or stopped according to whether the sound collection condition is good or bad.

The sound collection conditions for the sound SV1 emitted by the sound source S1 and the sound SV2 emitted by the sound source S2 are examined below. Let S(m,fk) = (s1(m,fk), s2(m,fk))^T be the vector whose elements are the component value s1(m,fk) of the sound SV1 and the component value s2(m,fk) of the sound SV2. The observation vector X(m,fk) of the observed signals V1(t) and V2(t) is then expressed by Equation (6) below. The matrix A(fk) in Equation (6) is the mixing matrix, which represents the acoustic characteristics imparted along the paths from each of the sound sources S1 and S2 to each of the sound collection devices M1 and M2.

$$X(m,f_k) = A(f_k)\,S(m,f_k) \tag{6}$$

From Equations (2) and (6), the covariance matrix Rxx(fk) of the observation vector X(m,fk) is expressed by Equation (7) below in terms of the mixing matrix A(fk) and the covariance matrix Rss(fk) of the vector S(m,fk), where Rss(fk) = E[S(m,fk)S(m,fk)^H].

$$
\begin{aligned}
R_{xx}(f_k) &= E\left[X(m,f_k)\,X(m,f_k)^H\right] \\
&= E\left[A(f_k)\,S(m,f_k)\,\{A(f_k)\,S(m,f_k)\}^H\right] \\
&= E\left[A(f_k)\,S(m,f_k)\,S(m,f_k)^H A(f_k)^H\right] \\
&= A(f_k)\,E\left[S(m,f_k)\,S(m,f_k)^H\right] A(f_k)^H \\
&= A(f_k)\,R_{ss}(f_k)\,A(f_k)^H
\end{aligned}
\tag{7}
$$

The covariance matrix Rss(fk), in turn, has the eigenvalue decomposition of Equation (8) below.

$$R_{ss}(f_k) = Q(f_k)\,\Lambda(f_k)\,Q(f_k)^H \tag{8}$$

Because the matrix Q(fk) in Equation (8) is orthonormal, the determinant det(Q(fk)Q(fk)^H) of the matrix Q(fk)Q(fk)^H is 1. The determinant det(Rss(fk)) of the covariance matrix Rss(fk) is therefore equal to the determinant det(Λ(fk)) of the diagonal matrix Λ(fk), that is, det(Rss(fk)) = det(Λ(fk)). Taking this into account, the determinant det(Rxx(fk)) of the covariance matrix Rxx(fk) is expressed by Equation (9) below, which is obtained by transforming Equation (7). The symbol Π in Equation (9) denotes the product operator (Πλi(fk) = λ1(fk)·λ2(fk)).

$$
\begin{aligned}
\det\left(R_{xx}(f_k)\right) &= \det\left(A(f_k)\,R_{ss}(f_k)\,A(f_k)^H\right) \\
&= \det\left(A(f_k)\right)\det\left(R_{ss}(f_k)\right)\det\left(A(f_k)^H\right) \\
&= \left|\det\left(A(f_k)\right)\right|^2 \det\left(\Lambda(f_k)\right) \\
&= \left|\det\left(A(f_k)\right)\right|^2 \prod_i \lambda_i(f_k)
\end{aligned}
\tag{9}
$$
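The determinant identity of Equation (9) can be checked numerically for the two-source, two-microphone case. The sketch below uses only the Python standard library; the mixing matrix A and the source powers 2.0 and 0.5 are made-up values chosen for illustration, and the helper names are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def herm2(a):
    """Conjugate (Hermitian) transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


# Made-up mixing matrix A(fk) and diagonal source covariance Rss(fk)
# (uncorrelated sources, so its eigenvalues are the diagonal entries).
A = [[1.0 + 0.5j, 0.3 - 0.2j], [0.4 + 0.1j, 0.9 - 0.6j]]
lam1, lam2 = 2.0, 0.5
Rss = [[lam1 + 0j, 0j], [0j, lam2 + 0j]]

# Equation (7): Rxx = A Rss A^H
Rxx = matmul2(matmul2(A, Rss), herm2(A))

# Equation (9): det(Rxx) = |det(A)|^2 * lam1 * lam2
lhs = det2(Rxx)
rhs = abs(det2(A)) ** 2 * lam1 * lam2
```

Making A closer to singular (for example, by making its two rows nearly proportional) or lowering one of the source powers drives both sides toward zero, which matches the use of the determinant as a sound collection indicator.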

The determinant det(A(fk)) in Equation (9) corresponds to the scaling factor of the linear map defined by the mixing matrix A(fk). Consequently, the greater the degree to which propagation of the sound SV1 and the sound SV2 to the sound collection devices M1 and M2 is impeded (hereinafter the "degree of propagation inhibition"), that is, the worse the sound collection condition, the smaller det(A(fk)) becomes. The symbols λi(fk) in Equation (9) are the diagonal entries of the matrix Λ(fk) (the eigenvalues of the covariance matrix Rss(fk)): the eigenvalue λ1(fk) corresponds to the intensity (power) of the frequency-fk component of the sound SV1, and the eigenvalue λ2(fk) corresponds to the intensity (power) of the frequency-fk component of the sound SV2. Consequently, the larger the sound source intensity difference (the worse the sound collection condition), the smaller the product Πλi(fk) in Equation (9).

As the above description shows, the worse the sound collection condition (the greater the degree of propagation inhibition or the sound source intensity difference), the smaller the determinant z1(fk) = det(Rxx(fk)) of the covariance matrix Rxx(fk) tends to be. In view of this tendency, the second embodiment uses the determinant z1(fk) to judge whether the sound collection condition is good or bad.

FIG. 13 is a block diagram of the separation matrix generation unit 28A in the second embodiment. The separation matrix generation unit 28A of the second embodiment is obtained by adding a condition determination unit 72 to the elements of the separation matrix generation unit 28 of the first embodiment. The condition determination unit 72 judges whether the sound collection condition is good or bad for each frequency fk selected as a second frequency fB. For this judgment, the condition determination unit 72 reuses the determinant z1(fk) that the significance index calculation unit 42 calculated for the selection of frequencies fk by the frequency selection unit 44. That is, the condition determination unit 72 judges that the sound collection condition at the frequency fk is good (the degree of propagation inhibition and the sound source intensity difference are small) when the determinant z1(fk) exceeds a predetermined threshold, and judges that the sound collection condition at the frequency fk is bad (the degree of propagation inhibition or the sound source intensity difference is large) when the determinant z1(fk) falls below the threshold.
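The judgment made by the condition determination unit 72 can be sketched for the two-channel case as follows. The covariance is estimated as the sample average of X X^H over the observation vectors of one frequency; the function names, the toy data, and the threshold value 0.1 are illustrative assumptions.

```python
def covariance_det(observations):
    """z1 = det(Rxx) for a list of 2-channel observation vectors (x1, x2)."""
    n = len(observations)
    r11 = sum(abs(x1) ** 2 for x1, _ in observations) / n
    r22 = sum(abs(x2) ** 2 for _, x2 in observations) / n
    r12 = sum(x1 * x2.conjugate() for x1, x2 in observations) / n
    # Rxx is Hermitian, so det(Rxx) = r11 * r22 - |r12|^2.
    return r11 * r22 - abs(r12) ** 2


def is_good_condition(observations, threshold):
    """True when z1 exceeds the threshold (good sound collection condition)."""
    return covariance_det(observations) > threshold


# Toy data: two uncorrelated components versus a single component that
# reaches both channels with a fixed gain (rank-one observations).
two_sources = [(1 + 0j, 0j), (0j, 1 + 0j), (1 + 0j, 0j), (0j, 1 + 0j)]
one_source = [(1 + 0j, 0.5 + 0j), (2 + 0j, 1 + 0j)]
```

With these toy inputs the rank-one data yield a determinant of zero, which is the situation that the third embodiment treats as a single sound source at the frequency fk.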

The second learning unit 66 in FIG. 13 decides, for each frequency fk, whether to execute or stop the secondary learning process according to the result of the judgment by the condition determination unit 72. Specifically, for each frequency fk selected as a second frequency fB for which the determinant z1(fk) is judged to be large (good sound collection condition), the second learning unit 66 generates the separation matrix WB(fk) by the secondary learning process with the initial separation matrix WB[0](fk) as the initial value, as in the first embodiment. For each frequency fk selected as a second frequency fB for which the determinant z1(fk) is judged to be small (bad sound collection condition), the second learning unit 66 stops the secondary learning process and adopts the initial separation matrix WB[0](fk) set by the initial matrix setting unit 64 as the separation matrix WB(fk). Learning data D(fk) with a small determinant z1(fk) (bad condition) are therefore not used to generate the separation matrix WB(fk).
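The per-frequency control of the secondary learning process can be sketched as a simple dispatch. The secondary learning itself (Equation (5)) is represented by a caller-supplied function, because its update rule is not reproduced in this passage; every name below, including the stand-in learner and the threshold, is a hypothetical placeholder.

```python
def finalize_second_frequencies(z1, w_init, threshold, secondary_learning):
    """Return WB(fk) for every second frequency.

    z1:                 dict mapping fk -> determinant z1(fk)
    w_init:             dict mapping fk -> initial separation matrix WB[0](fk)
    secondary_learning: callable (fk, w0) -> refined separation matrix
    """
    result = {}
    for fk, w0 in w_init.items():
        if z1[fk] > threshold:
            # Good condition: refine WB[0](fk) by secondary learning.
            result[fk] = secondary_learning(fk, w0)
        else:
            # Bad condition: skip learning and keep WB[0](fk) as WB(fk).
            result[fk] = w0
    return result


# Stand-in learner that only records which frequencies it was asked to refine.
calls = []


def fake_secondary_learning(fk, w0):
    calls.append(fk)
    return ("refined", w0)


wb = finalize_second_frequencies(
    z1={10: 0.5, 11: 0.01},
    w_init={10: "W0_10", 11: "W0_11"},
    threshold=0.1,
    secondary_learning=fake_secondary_learning,
)
```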

The second embodiment achieves the same effects as the first embodiment. In addition, because execution of the secondary learning process is controlled according to whether the sound collection condition is good or bad, the second embodiment reduces the amount of computation for generating the separation matrices W(fk) while maintaining the accuracy of sound source separation, compared with a configuration that executes the secondary learning process for every frequency fk selected as a second frequency fB regardless of the sound collection condition.

<C: Third Embodiment>
In the second embodiment, for each frequency fk selected as a second frequency fB and judged to have a bad sound collection condition, the initial separation matrix WB[0](fk) that the initial matrix setting unit 64 generated by null-steering beamforming is used as the separation matrix WB(fk). The method of setting the separation matrix WB(fk) for frequencies fk with a bad sound collection condition may, however, be changed as appropriate, as illustrated by the third embodiment below.

When the frequency-fk component of the observed signals V1(t) and V2(t) contains only the sound SV of a single sound source S, the condition determination unit 72 judges that the sound collection condition at the frequency fk is bad. If only the sound SV of a single sound source S is present at the frequency fk, however, there is little need to change the frequency-fk component before and after sound source separation. Accordingly, for a frequency fk that the condition determination unit 72 judges to have a bad sound collection condition (containing only a single sound source S), a configuration may be adopted in which not only the secondary learning process by the second learning unit 66 but also the generation of the initial separation matrix WB[0](fk) by the initial matrix setting unit 64 is stopped, and the separation matrix WB(fk) is set so that the frequency-fk component does not change excessively before and after sound source separation. A specific configuration is described in detail below.

FIG. 14 is a block diagram of the separation matrix generation unit 28B in the third embodiment. The separation matrix generation unit 28B of the third embodiment is obtained by adding a direction estimation unit 74 and a matrix setting unit 76 to the separation matrix generation unit 28A of the second embodiment.

For each frequency fk that is selected as a second frequency fB and judged by the condition determination unit 72 to have a bad sound collection condition, the direction estimation unit 74 uses the learning data D(fk) to estimate the sound source direction θe(fk), that is, the direction of the single sound source that emits the sound containing the frequency-fk component. Any known technique may be used to estimate the sound source direction θe(fk). The matrix setting unit 76 generates a separation matrix WB(fk) that separates the sound arriving from the sound source direction θe(fk) estimated by the direction estimation unit 74 from the observed signals V1(t) and V2(t). For example, the matrix setting unit 76 generates the separation matrix WB(fk) by executing the following processing for each frequency fk judged to have a bad sound collection condition.

First, the matrix setting unit 76 generates an extraction matrix C(fk) defined by the following formula. The symbol d denotes the distance between the sound collection devices M1 and M2, and the symbol c denotes the speed of sound. The symbol τ therefore corresponds to the difference between the times at which the sound arriving from the sound source direction θe(fk) reaches the sound collection device M1 and the sound collection device M2. When applied as a delay-and-sum beamformer, the first row of the extraction matrix C(fk) emphasizes the sound arriving from the sound source direction θe(fk).

[Equation image in the original publication (Figure 0005387442): definition of the extraction matrix C(fk) and the time difference τ]

Second, the matrix setting unit 76 generates the separation matrix WB(fk) by permuting the rows of the extraction matrix C(fk) according to the relationship between the directions θ1 and θ2, which the direction estimation unit 62 estimated from the separation matrices WA(fk), and the sound source direction θe(fk) estimated by the direction estimation unit 74. Specifically, the matrix setting unit 76 arranges the rows of the extraction matrix C(fk) so that the frequency-fk component is emphasized in the separated signal Y1(t) when the sound source direction θe(fk) is close to the direction θ1, and in the separated signal Y2(t) when the sound source direction θe(fk) is close to the direction θ2.

For example, suppose that the first row (w11(fk), w12(fk)) of the separation matrix WA(fk) acts to emphasize the sound SV1 from the direction θ1 (forming a null in the direction θ2), and that the second row (w21(fk), w22(fk)) of the separation matrix WA(fk) acts to emphasize the sound SV2 from the direction θ2 (forming a null in the direction θ1). When the sound source direction θe(fk) is close to the direction θ1, the matrix setting unit 76 adopts the extraction matrix C(fk) described above as the separation matrix WB(fk). For a frequency fk judged to have a bad sound collection condition, the component value y1(m,fk) of the separated signal Y1(t) is then set to a value that emphasizes the sound arriving from the sound source direction θe(fk), and the component value y2(m,fk) of the separated signal Y2(t) at that frequency is set to zero. When the sound source direction θe(fk) is close to the direction θ2, the matrix setting unit 76 instead adopts, as the separation matrix WB(fk), the matrix obtained by exchanging the first and second rows of the extraction matrix C(fk). In that case, the component value y1(m,fk) of the separated signal Y1(t) at the frequency fk judged to have a bad sound collection condition is set to zero, and the component value y2(m,fk) of the separated signal Y2(t) at that frequency is set to a value that emphasizes the sound arriving from the sound source direction θe(fk).
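The row arrangement performed by the matrix setting unit 76 can be sketched as follows. The patent gives the extraction matrix C(fk) only as an equation image, so the matrix used here is a made-up placeholder whose second row is zero (consistent with the statement that the other separated component becomes zero). Measuring closeness by the absolute difference of angles, and resolving ties in favor of θ1, are further assumptions; the function name is hypothetical.

```python
def place_extraction_rows(c, theta_e, theta1, theta2):
    """Return WB(fk) built from the rows of the extraction matrix C(fk).

    If theta_e is nearer to theta1, C(fk) is used as is, so the emphasized
    component appears in Y1. Otherwise the two rows are exchanged, so the
    emphasized component appears in Y2.
    """
    if abs(theta_e - theta1) <= abs(theta_e - theta2):
        return [list(c[0]), list(c[1])]
    return [list(c[1]), list(c[0])]


# Placeholder extraction matrix: first row stands in for a delay-and-sum
# beamformer, second row is zero. These values are illustrative only.
C = [[0.5 + 0j, 0.5j], [0j, 0j]]

# Directions in radians (made-up values).
wb_near_theta1 = place_extraction_rows(C, theta_e=0.1, theta1=0.0, theta2=1.0)
wb_near_theta2 = place_extraction_rows(C, theta_e=0.9, theta1=0.0, theta2=1.0)
```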

The third embodiment achieves the same effects as the first and second embodiments. In addition, because the third embodiment stops not only the secondary learning process by the second learning unit 66 but also the generation of the initial separation matrix WB[0](fk) by the initial matrix setting unit 64 for frequencies fk with a bad sound collection condition, it has the further advantage that the amount of computation required to generate the separation matrix WB(fk) is reduced relative to the second embodiment.

The content of the extraction matrix C(fk) is not limited to the above example. For example, if the extraction matrix C(fk) illustrated below is used, the frequency-fk component contained in the observed signal V1(t) or the observed signal V2(t) is output from the signal separation unit 24 unchanged as the frequency-fk component of the separated signal Y1(t) or the separated signal Y2(t).

[Equation image in the original publication (Figure 0005387442): alternative extraction matrix C(fk)]

<D: Fourth Embodiment (Examples of the Significance Index Value)>
In each of the above embodiments, the significance index value Z(fk) that serves as the criterion for selection by the frequency selection unit 44 is not limited to the determinant z1(fk) of the covariance matrix Rxx(fk). Specifically, the numerical values (statistics) exemplified in the following aspects may be adopted as the significance index value Z(fk).

<D-1: First Aspect (Condition Number z2(fk))>
The condition number z2(fk) of the covariance matrix Rxx(fk) of the observation vectors X(m,fk) that make up the learning data D(fk) is defined by Equation (10) below. The operator ‖A‖ in Equation (10) denotes the norm of a matrix A (a measure of the size of the matrix). The condition number z2(fk) is small when the covariance matrix Rxx(fk) has an inverse (that is, when it is nonsingular) and large when the covariance matrix Rxx(fk) has no inverse.

$$z_2(f_k) = \left\|R_{xx}(f_k)\right\| \cdot \left\|R_{xx}(f_k)^{-1}\right\| \tag{10}$$

The covariance matrix Rxx(fk) has the eigenvalue decomposition of Equation (11A) below. The matrix U in Equation (11A) is the eigenvector matrix (the matrix whose elements are the eigenvectors), and the matrix Σ is the diagonal matrix whose elements are the eigenvalues. The inverse of the covariance matrix Rxx(fk) is expressed by Equation (11B) below, which is obtained by transforming Equation (11A).

$$R_{xx}(f_k) = U\,\Sigma\,U^H \tag{11A}$$

$$R_{xx}(f_k)^{-1} = U\,\Sigma^{-1}\,U^H \tag{11B}$$

When the diagonal of the matrix Σ contains a zero, the matrix Σ^-1 in Equation (11B) diverges to infinity, so the covariance matrix Rxx(fk) has no inverse (that is, the condition number z2(fk) of Equation (10) takes a large value). Conversely, the fact that the elements of the matrix Σ (the eigenvalues of the covariance matrix Rxx(fk)) include a value close to zero means that the distribution of the observation vectors X(m,fk) has few bases. Hence the tendency that the fewer the bases of the observation vectors X(m,fk), the larger the condition number z2(fk) of the covariance matrix Rxx(fk) (and the more bases, the smaller the condition number z2(fk)). In other words, the condition number z2(fk) of the covariance matrix Rxx(fk) serves, like the determinant z1(fk), as an index of the total number of bases of the observation vectors X(m,fk).
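For a 2x2 Hermitian covariance matrix, the condition number can be written in closed form from the eigenvalues. The sketch below assumes the spectral norm (2-norm) in Equation (10), which the text does not specify; under that assumption the condition number of a positive semidefinite Hermitian matrix is the ratio of its largest to its smallest eigenvalue. The function name is hypothetical.

```python
import math


def condition_number_2x2(r11, r12, r22):
    """Condition number (spectral norm assumed) of [[r11, r12], [conj(r12), r22]].

    r11 and r22 are the real diagonal entries; r12 may be complex.
    Returns math.inf when the smallest eigenvalue is not positive.
    """
    half_trace = (r11 + r22) / 2.0
    radius = math.sqrt(((r11 - r22) / 2.0) ** 2 + abs(r12) ** 2)
    lam_max = half_trace + radius
    lam_min = half_trace - radius
    if lam_min <= 0.0:
        return math.inf
    return lam_max / lam_min


# Two uncorrelated components with powers 2.0 and 0.5.
z2_full_rank = condition_number_2x2(2.0, 0.0, 0.5)
# Rank-one covariance (a single component seen on both channels).
z2_rank_one = condition_number_2x2(2.5, 1.25, 0.625)
```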

In view of this tendency, the first aspect uses the condition number z2(fk) of the covariance matrix Rxx(fk) as the significance index value Z(fk). Specifically, the significance index calculation unit 42 calculates the condition numbers z2(fk) (z2(f1) to z2(fK)) by evaluating Equation (10) for the covariance matrix Rxx(fk) of each of the K frequencies f1 to fK. The frequency selection unit 44 selects one or more frequencies fk with a small condition number z2(fk) (for example, a predetermined number of frequencies fk ranked highest in ascending order, or the frequencies fk below a threshold) as the first frequencies fA, and selects the remaining frequencies fk as the second frequencies fB.
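The first aspect can be sketched as follows for the two-channel case. This is a minimal illustration, not the patented implementation: it assumes real-valued intensities, and it assumes the spectral norm in Equation (10), for which the condition number of a positive-definite matrix equals the ratio of its largest to its smallest eigenvalue. The function names are hypothetical.

```python
import math


def covariance_2x2(frames):
    """Covariance entries (s11, s12, s22) of real 2-channel observation vectors."""
    n = len(frames)
    m1 = sum(x1 for x1, _ in frames) / n
    m2 = sum(x2 for _, x2 in frames) / n
    s11 = sum((x1 - m1) ** 2 for x1, _ in frames) / n
    s22 = sum((x2 - m2) ** 2 for _, x2 in frames) / n
    s12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in frames) / n
    return s11, s12, s22


def condition_number(s11, s12, s22):
    """z2 = lambda_max / lambda_min of a symmetric 2x2 matrix (spectral norm assumed)."""
    mean = (s11 + s22) / 2.0
    radius = math.hypot((s11 - s22) / 2.0, s12)
    lam_max, lam_min = mean + radius, mean - radius
    # A zero eigenvalue means the inverse does not exist: report infinity.
    return math.inf if lam_min <= 0.0 else lam_max / lam_min


def select_first_frequencies(z2, count):
    """Indices k of the `count` frequencies with the smallest condition number."""
    order = sorted(range(len(z2)), key=lambda k: z2[k])
    return sorted(order[:count])
```

Observation vectors that lie on a single line (one basis) give an infinite condition number, whereas vectors spread equally along two orthogonal directions give a condition number of 1.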

<D-2: Second aspect (cross-correlation z3(fk), mutual information z4(fk))>
The learning process of independent component analysis updates the separation matrix W(fk) so that the signals after sound source separation become statistically independent of one another. Accordingly, the lower the statistical correlation between the observation signal V1(t) and the observation signal V2(t) at a frequency fk, the higher the significance of learning the separation matrix W(fk) with the learning data D(fk). The second aspect therefore uses, as the significance index value Z(fk), an index value that reflects the independence between the observation signal V1(t) and the observation signal V2(t) (for example, the cross-correlation z3(fk)).

The cross-correlation z3(fk) between the component of frequency fk of the observation signal V1(t) and the component of frequency fk of the observation signal V2(t) is expressed by Equation (12). The symbol σ1 in Equation (12) denotes the standard deviation of the intensity x1(m,fk) within the unit interval TU, and the symbol σ2 denotes the standard deviation of the intensity x2(m,fk) within the unit interval TU.
$$ z_3(f_k) = \frac{E[\{x_1(m,f_k) - E(x_1(m,f_k))\}\{x_2(m,f_k) - E(x_2(m,f_k))\}]}{\sigma_1 \sigma_2} \tag{12} $$

As Equation (12) shows, the higher the independence (the lower the correlation) between the observation signal V1(t) and the observation signal V2(t) at a frequency fk, the smaller the cross-correlation z3(fk). In view of this tendency, in the second aspect the significance index calculation unit 42 calculates the cross-correlations z3(fk) (z3(f1) to z3(fK)) by evaluating Equation (12) for each of the K frequencies f1 to fK. The frequency selection unit 44 selects, from the K frequencies f1 to fK, one or more frequencies fk with a low cross-correlation z3(fk) (for example, the frequencies fk ranked highest in ascending order, or the frequencies fk below a threshold) as the first frequencies fA, and selects the remaining frequencies fk as the second frequencies fB.
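Equation (12) is the correlation coefficient of the two intensity sequences over the frames of one unit interval. The sketch below evaluates it for one frequency, assuming real-valued intensities and population (divide-by-n) statistics. The function name is hypothetical, and the code follows the signed form of Equation (12). Whether to rank by the signed value or by its magnitude is not specified in the source.

```python
import math


def cross_correlation(x1, x2):
    """z3 of Equation (12) for two equal-length intensity sequences.

    Raises ZeroDivisionError if either sequence is constant (zero deviation).
    """
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    sd1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    sd2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    return cov / (sd1 * sd2)
```

Two channels whose intensities rise and fall together give a value near 1, whereas channels whose fluctuations are unrelated give a value near 0.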

The mutual information z4(fk) defined by Equation (13) can also be used as the significance index value Z(fk). As with the cross-correlation z3(fk), the higher the independence (the lower the correlation) between the observation signal V1(t) and the observation signal V2(t) at a frequency fk, the smaller the mutual information z4(fk). The frequency selection unit 44 therefore selects, from the K frequencies f1 to fK, one or more frequencies fk with a low mutual information z4(fk) as the first frequencies fA.
$$ z_4(f_k) = -\frac{1}{2} \log\left(1 - z_3(f_k)^{2}\right) \tag{13} $$
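Equation (13) maps the cross-correlation to the mutual information. It has the form that mutual information takes for two jointly Gaussian variables with correlation coefficient z3(fk). A minimal sketch follows, assuming the natural logarithm (the source does not state the base). The function name is hypothetical.

```python
import math


def mutual_information(z3):
    """z4 of Equation (13), computed from the cross-correlation z3."""
    r2 = z3 * z3
    if r2 >= 1.0:
        # Perfectly correlated channels: the logarithm diverges.
        return math.inf
    return -0.5 * math.log(1.0 - r2)
```

Because z3(fk) enters only through its square, the result depends on the magnitude of the correlation and grows monotonically with it, so ranking frequencies by z4(fk) gives the same order as ranking by the magnitude of z3(fk).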

<D-3: Third aspect (trace z5(fk))>
The trace (power) z5(fk) of the covariance matrix Rxx(fk) is defined as the sum of the diagonal components of the covariance matrix Rxx(fk). The diagonal components of the covariance matrix Rxx(fk) correspond to the variance σ1^2 of the intensity x1(m,fk) of the observation signal V1(t) in the unit interval TU and the variance σ2^2 of the intensity x2(m,fk) of the observation signal V2(t) in the unit interval TU. The trace z5(fk) of the covariance matrix Rxx(fk) can therefore also be defined as the sum of the variance σ1^2 of the intensity x1(m,fk) and the variance σ2^2 of the intensity x2(m,fk) (z5(fk) = σ1^2 + σ2^2).

FIG. 15 shows scatter diagrams of the observation vectors X(m,fk) in the unit interval TU. Part (A) of FIG. 15 is a scatter diagram for a large trace z5(fk), and part (B) of FIG. 15 is a scatter diagram for a small trace z5(fk). As in part (A) of FIG. 7, parts (A) and (B) of FIG. 15 schematically show a region A1, in which the observation vectors X(m,fk) dominated by the sound SV1 from the sound source S1 are distributed, and a region A2, in which the observation vectors X(m,fk) dominated by the sound SV2 from the sound source S2 are distributed.

As the definition as the sum of the variance σ1^2 of the intensity x1(m,fk) and the variance σ2^2 of the intensity x2(m,fk) indicates, the larger the trace z5(fk) of the covariance matrix Rxx(fk), the more widely the observation vectors X(m,fk) are distributed. Consequently, when the trace z5(fk) is large, the regions in which the observation vectors X(m,fk) are distributed (region A1 and region A2) tend to be clearly distinguished for each sound source S, as in part (A) of FIG. 15. When the trace z5(fk) is small, the distinction between region A1 and region A2 tends to be ambiguous, as in part (B) of FIG. 15. In other words, the trace z5(fk) functions as an index value of the shape (spread) of the region in which the observation vectors X(m,fk) are distributed. Since the learning process (independent component analysis) of the separation matrix W(fk) is equivalent to identifying as many independent bases as there are sound sources S, the significance of learning the separation matrix W(fk) with the learning data D(fk) is higher at frequencies fk where the regions (bases) in which the observation vectors X(m,fk) are distributed are clearly distinguished for each sound source S (that is, at frequencies fk with a large trace z5(fk)).

In view of this tendency, the third aspect uses the trace z5(fk) of the covariance matrix Rxx(fk) as the significance index value Z(fk). Specifically, the significance index calculation unit 42 calculates the traces z5(fk) (z5(f1) to z5(fK)) by summing the diagonal components of the covariance matrix Rxx(fk) of each of the K frequencies f1 to fK. The frequency selection unit 44 selects one or more frequencies fk with a large trace z5(fk) (for example, the frequencies fk ranked highest in descending order, or the frequencies fk above a threshold) as the first frequencies fA, and selects the remaining frequencies fk as the second frequencies fB.
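A minimal sketch of the third aspect, assuming real-valued intensities and population variances. The function names are hypothetical. Unlike the condition number, the trace is ranked in descending order, because a larger trace indicates a more significant frequency.

```python
from statistics import pvariance


def trace_index(x1, x2):
    """z5 = variance of channel 1 + variance of channel 2 (trace of Rxx)."""
    return pvariance(x1) + pvariance(x2)


def select_largest(z5, count):
    """Indices k of the `count` frequencies with the largest trace."""
    order = sorted(range(len(z5)), key=lambda k: z5[k], reverse=True)
    return sorted(order[:count])
```

For example, `select_largest([0.2, 3.0, 1.5, 0.1], 2)` picks the frequencies at indices 1 and 2 as first frequencies, and the remaining indices would be treated as second frequencies.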

<D-4: Fourth aspect (kurtosis z6(fk))>
The kurtosis z6(fk) of the frequency distribution of the intensity x1(m,fk) of the observation signal V1(t) (the distribution function with the intensity x1(m,fk) as the random variable) is defined by Equation (14).
$$ z_6(f_k) = \frac{\mu_4(f_k)}{\{\mu_2(f_k)\}^{2}} \tag{14} $$

The symbol μ4(fk) in Equation (14) denotes the fourth-order moment defined by Equation (15A), and the symbol μ2(fk) in Equation (14) denotes the second-order moment defined by Equation (15B). The symbol m(fk) in Equations (15A) and (15B) denotes the mean of the intensity x1(m,fk) over the frames in the unit interval TU.
$$ \mu_4(f_k) = E[\{x_1(m,f_k) - m(f_k)\}^{4}] \tag{15A} $$
$$ \mu_2(f_k) = E[\{x_1(m,f_k) - m(f_k)\}^{2}] \tag{15B} $$

When only one of the component SV1(fk) of the sound SV1 and the component SV2(fk) of the sound SV2 is contained (or is dominant) in the observation signal V1(t), the kurtosis z6(fk) takes a large value. When both the component SV1(fk) and the component SV2(fk) are contained in the observation signal V1(t) at roughly equal intensity, the kurtosis z6(fk) takes a small value (central limit theorem). Since the learning process (independent component analysis) of the separation matrix W(fk) is equivalent to identifying as many independent bases as there are sound sources S, the significance of learning the separation matrix W(fk) with the learning data D(fk) is higher at frequencies fk where more sound sources S contribute sounds SV at a significant volume to the observation signal V1(t) (that is, at frequencies fk with a small kurtosis z6(fk)).

In view of this tendency, the fourth aspect uses the kurtosis z6(fk) of the frequency distribution of the intensity x1(m,fk) of the observation signal V1(t) as the significance index value Z(fk). Specifically, the significance index calculation unit 42 calculates the kurtosis values z6(f1) to z6(fK) by evaluating Equation (14) for each of the K frequencies f1 to fK. The frequency selection unit 44 selects, from the K frequencies f1 to fK, one or more frequencies fk with a small kurtosis z6(fk) (for example, the frequencies fk ranked highest in ascending order, or the frequencies fk below a threshold) as the first frequencies fA, and selects the remaining frequencies fk as the second frequencies fB.
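Equations (14), (15A), and (15B) can be evaluated directly from the intensities of one frequency over the frames of a unit interval. The sketch below assumes that the expectation E is approximated by the sample mean over those frames. The function name is hypothetical.

```python
def kurtosis(x):
    """z6 = mu4 / mu2**2 for a sequence of intensities (Equation (14)).

    Raises ZeroDivisionError if all intensities are identical (mu2 == 0).
    """
    n = len(x)
    mean = sum(x) / n                              # m(fk)
    mu2 = sum((v - mean) ** 2 for v in x) / n      # Equation (15B)
    mu4 = sum((v - mean) ** 4 for v in x) / n      # Equation (15A)
    return mu4 / (mu2 * mu2)
```

A sequence that alternates between two levels gives the minimum value of 1, whereas a sequence that is mostly constant with a few large excursions gives a larger value.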

Incidentally, the kurtosis of human speech generally falls within a range of about 40 to 70. Allowing for the decrease in kurtosis in noisy environments (central limit theorem) and for errors in measuring the kurtosis, the kurtosis of human speech falls within a range of about 20 to 80 (hereinafter the "voice range"). At a frequency fk where only stationary noise exists, such as the operating sound of air-conditioning equipment or the noise of a crowd, the kurtosis of the observation signal V1(t) takes a sufficiently low value (for example, below 20), so that frequency is likely to be selected as a first frequency fA by the frequency selection unit 44. However, if the target sounds of the sound source separation (SV1, SV2) are human voices, the significance of learning the separation matrix W(fk) with the learning data D(fk) of a stationary-noise frequency fk is low.

A configuration that corrects the kurtosis of Equation (14) is therefore preferably adopted, so that frequencies fk of stationary noise are not selected as first frequencies fA. For example, the significance index calculation unit 42 calculates, as the corrected kurtosis z6(fk), the product of the value defined by Equation (14) (hereinafter the "pre-correction kurtosis") and a weight q. The weight q is chosen nonlinearly with respect to the pre-correction kurtosis, for example as illustrated in FIG. 16. Specifically, where the pre-correction kurtosis is below the lower limit of the voice range (for example, 20), the weight q is set variably according to the pre-correction kurtosis so that the kurtosis z6(fk) after multiplication by the weight q exceeds the upper limit of the voice range (for example, 80). For a kurtosis within the voice range, the weight q is set to a predetermined value (for example, 1). Where the pre-correction kurtosis exceeds the upper limit of the voice range, it is already sufficiently high (that is, the frequency fk is unlikely to be selected as a first frequency fA), so the weight q is set to the same value as within the voice range. With this configuration, a separation matrix W(fk) that separates the intended speech with high accuracy can be generated.
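The correction can be sketched as below. The exact curve of FIG. 16 is not reproduced in the text, so the weight function here is only one hypothetical choice that satisfies the stated constraints: below the voice range, the corrected value exceeds the upper limit, and elsewhere the weight is 1. The names and the `margin` parameter are assumptions.

```python
VOICE_LOW = 20.0    # example lower limit of the voice range
VOICE_HIGH = 80.0   # example upper limit of the voice range


def weight_q(raw, margin=1.0):
    """Hypothetical weight: lifts sub-voice-range kurtosis above VOICE_HIGH."""
    if 0.0 < raw < VOICE_LOW:
        return (VOICE_HIGH + margin) / raw
    return 1.0


def corrected_kurtosis(raw, margin=1.0):
    """Corrected z6 = q * (pre-correction kurtosis)."""
    return weight_q(raw, margin) * raw
```

With this choice, a stationary-noise frequency with a pre-correction kurtosis of 10 is moved to about 81, so it no longer ranks among the smallest values when frequencies are sorted in ascending order.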

<E: Modifications>
Various modifications can be made to the embodiments described above. Specific modifications are illustrated below. Two or more aspects selected arbitrarily from the following examples may be combined as appropriate.

(1) Modification 1
The method of sorting the frequencies f1 to fK into the first frequencies fA and the second frequencies fB may be changed as appropriate. For example, a configuration that calculates the significance index value Z(fk) from several of the indices illustrated above may be adopted. Specifically, the significance index calculation unit 42 calculates, as the significance index value Z(fk), a weighted sum of several indices selected from the indices illustrated above (z1(fk) to z6(fk)), for example a weighted sum of the determinant z1(fk) and the trace z5(fk).
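A weighted sum of several indices can be sketched as follows. The function name and the weights are hypothetical, and the source does not say how the weights are chosen. Indices for which a smaller value means higher significance (such as the condition number) would need a negative weight or a prior transformation before being combined with indices such as the determinant or the trace.

```python
def combined_index(indices, weights):
    """Z(fk) as a weighted sum of several per-frequency index lists.

    indices: list of lists, one list of K values per index type
    weights: one weight per index type
    """
    num_frequencies = len(indices[0])
    return [
        sum(w * z[k] for w, z in zip(weights, indices))
        for k in range(num_frequencies)
    ]
```

For example, `combined_index([z1, z5], [0.5, 0.5])` averages the determinant and the trace for each frequency.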

The configuration that uses the significance index value Z(fk) to sort the first frequencies fA and the second frequencies fB (the significance index calculation unit 42) may be omitted. Specifically, a configuration that sorts the frequencies fk independently of the observation vectors X(m,fk) (the learning data D(fk)) may be adopted. For example, the frequency selection unit 44 selects every predetermined number of frequencies fk from the sequence of frequencies f1 to fK as the first frequencies fA and selects the remaining frequencies fk as the second frequencies fB. Alternatively, if the frequencies fk at which the learning process is highly significant are known in advance, for example experimentally or statistically, from the acoustic characteristics expected of the observation signals V1(t) and V2(t) or from the content of the learning process, a configuration may be adopted that selects those frequencies fk as the first frequencies fA and the remaining frequencies fk as the second frequencies fB. Omitting the calculation of the significance index value Z(fk) in this way has the advantage of reducing the amount of computation performed by the arithmetic processing device 12.
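Selection at a fixed interval, which needs no observation data, can be sketched as follows. The function name and the zero-based indexing are assumptions.

```python
def select_every_nth(num_frequencies, step):
    """Split indices 0..K-1 into first (every `step`-th) and second (the rest)."""
    first = list(range(0, num_frequencies, step))
    chosen = set(first)
    second = [k for k in range(num_frequencies) if k not in chosen]
    return first, second
```

With K = 8 and a step of 3, indices 0, 3, and 6 become first frequencies and the other five become second frequencies.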

(2) Modification 2
In the second embodiment, the determinant z1(fk) of the covariance matrix Rxx(fk) of the observation vectors X(m,fk) is used to determine whether the sound collection condition is good or bad, but any method may be used for this determination. For example, the condition number z2(fk) of the covariance matrix Rxx(fk) of the observation vectors X(m,fk) functions as a measure of the difficulty of numerical analysis. On the view that the sound collection condition is better the easier the numerical analysis of the learning data D(fk) is, a configuration may be adopted that determines whether the sound collection condition is good or bad according to the condition number z2(fk) calculated by the significance index calculation unit 42. Since the sound collection condition can be rated better the closer the condition number z2(fk) is to 1, the condition determination unit 72 determines that the sound collection condition at the frequency fk is good (good condition) when the condition number z2(fk) is below a threshold, and that it is bad (bad condition) when the condition number z2(fk) is above the threshold. The secondary learning process is omitted for frequencies fk (second frequencies fB) at which the sound collection condition is bad.

Referring to FIGS. 9 and 10, the decrease in the noise suppression rate and the increase in cepstrum distortion become evident when the amplitude ratio RA falls below 0.5, so it is reasonable to rate the condition as bad when the amplitude ratio RA is below 0.5. The condition number z2(fk) corresponds to the relative ratio of the powers of the sound SV1 and the sound SV2, so when the amplitude ratio RA is 0.5 (a relative power ratio of 0.25), the condition number z2(fk) is expected to be 4. Accordingly, when the condition number z2(fk) is used to determine whether the sound collection condition is good or bad, a configuration that sets the threshold to 4 may preferably be adopted (that is, the condition is determined to be good when the condition number z2(fk) is below 4 and bad when it is above 4).
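The threshold of 4 follows from a short calculation. The sketch below makes explicit a simplifying assumption that the source states only informally: the two eigenvalues λ1 ≥ λ2 of Rxx(fk) are taken to be proportional to the powers P1 and P2 of the two sounds, so that the condition number equals the power ratio.

```latex
\frac{P_2}{P_1} = R_A^{2} = 0.5^{2} = 0.25,
\qquad
z_2(f_k) \approx \frac{\lambda_1}{\lambda_2} = \frac{P_1}{P_2} = \frac{1}{0.25} = 4
```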

As illustrated above, a configuration that reuses the significance index Z(fk) applied to the sorting of the frequencies fk (z1(fk), z2(fk)) for determining whether the sound collection condition is good or bad has the advantage of reducing the amount of computation compared with a configuration that applies separate indices to the sorting of the frequencies fk and to the determination of the sound collection condition. A configuration that applies separate indices to the two may nevertheless be adopted. For example, the determinant z1(fk) is applied to the determination of the sound collection condition, and a significance index Z(fk) other than the determinant z1(fk) (z2(fk) to z6(fk)) is applied to the sorting of the frequencies fk.

(3) Modification 3
The initial matrix setting unit 52 may generate the initial separation matrix WA[0](fk) by any method. For example, a configuration in which the initial matrix setting unit 52 generates an initial separation matrix WA[0](fk) whose elements are random numbers may be adopted. The description above illustrates the case in which the angle θ1 of the sound source S1 and the angle θ2 of the sound source S2 are unknown (prior information is not used), but a configuration that generates the initial separation matrix WA[0](fk) using prior information (the angle θ1 and the angle θ2) is also suitable. To generate the initial separation matrix WA[0](fk) using prior information, a subspace method such as principal component analysis or second-order-statistics ICA, as disclosed in Tachibana et al., "Efficient Blind Source Separation Combining Closed-Form Second Order ICA and Nonclosed-Form Higher-Order ICA", International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 1, pp. 45-48, Apr. 2007, or the adaptive beamforming disclosed in Japanese Patent No. 3949074 may suitably be adopted. A method that generates the initial separation matrix WA[0](fk) by various kinds of beamforming (for example, adaptive beamforming) from the direction of each sound source S estimated by the MUSIC (MUltiple SIgnal Classification) method or the minimum variance method may also be adopted, as may a method that generates the initial separation matrix WA[0](fk) from factor vectors identified by factor analysis or from canonical vectors identified by canonical correlation analysis.
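As one illustration of building an initial separation matrix from known source directions, the sketch below uses a fixed null beamformer for two microphones. This is a simpler technique than the adaptive beamforming or subspace methods cited above and is not taken from the source. The microphone spacing, speed of sound, far-field plane-wave model, and all function names are assumptions.

```python
import cmath
import math


def steering_phase(freq_hz, angle_rad, mic_spacing_m=0.05, sound_speed=340.0):
    """Phase factor at microphone 2, relative to microphone 1, for a plane wave."""
    delay = mic_spacing_m * math.sin(angle_rad) / sound_speed
    return cmath.exp(-2j * math.pi * freq_hz * delay)


def null_beamformer_matrix(freq_hz, theta1, theta2):
    """2x2 matrix W with W @ a(theta1) = [1, 0] and W @ a(theta2) = [0, 1].

    W is the inverse of the mixing matrix [[1, 1], [e1, e2]] whose columns
    are the steering vectors a(theta) = [1, e(theta)].
    """
    e1 = steering_phase(freq_hz, theta1)
    e2 = steering_phase(freq_hz, theta2)
    det = e2 - e1
    if abs(det) < 1e-12:
        raise ValueError("directions are indistinguishable at this frequency")
    return [[e2 / det, -1.0 / det], [-e1 / det, 1.0 / det]]
```

The first row places a null toward θ2 and the second row places a null toward θ1, so each output initially passes one source and suppresses the other.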

DESCRIPTION OF SYMBOLS
100: signal processing device; 12: arithmetic processing device; 14: storage device; 22: frequency analysis unit; 24: signal separation unit; 26: signal synthesis unit; 28, 28A, 28B: separation matrix generation unit; 42: significance index calculation unit; 422: covariance matrix calculation unit; 424: determinant calculation unit; 44: frequency selection unit; 50: first processing unit; 52: initial matrix setting unit; 54: first learning unit; 56: correction processing unit; 60: second processing unit; 62: direction estimation unit; 64: initial matrix setting unit; 66: second learning unit; 72: condition determination unit.

Claims (5)

A signal processing device comprising:
signal separation means for generating a plurality of separated signals, one for each sound source, by applying a separation matrix for each of a plurality of frequencies to a plurality of observation signals obtained by collecting, with a plurality of sound collection devices, a mixture of a plurality of sounds generated by different sound sources;
frequency selection means for sorting the plurality of frequencies into first frequencies and second frequencies;
first learning means for generating the separation matrix of each first frequency by a primary learning process that uses learning data corresponding to the component of the first frequency in the plurality of observation signals;
direction estimation means for estimating the direction of each sound source from the separation matrix generated by the first learning means;
initial matrix setting means for generating an initial separation matrix such that a null or a beam of sound collection is formed in the direction estimated by the direction estimation means; and
second learning means for generating the separation matrix of each second frequency by executing a secondary learning process that uses learning data corresponding to the component of the second frequency in the plurality of observation signals, with the initial separation matrix generated by the initial matrix setting means as the initial value, for a smaller number of iterations than the primary learning process.
The signal processing device according to claim 1, further comprising condition determination means for determining, for each frequency, whether the sound collection condition is good or bad,
wherein, among the frequencies selected as the second frequencies, the second learning means generates the separation matrix by the secondary learning process with the initial separation matrix as the initial value for each frequency at which the condition determination means has determined the sound collection condition to be good, and uses the initial separation matrix as the separation matrix for each frequency at which the condition determination means has determined the sound collection condition to be bad.
The signal processing device according to claim 2, further comprising significance index calculation means for calculating, for each frequency and from the plurality of observation signals, a significance index value that indicates the significance of the learning process using the learning data of that frequency,
wherein the frequency selection means sorts the plurality of frequencies into the first frequencies and the second frequencies according to the significance index value of each frequency.
The signal processing device according to claim 3, wherein the condition determination means determines, for each frequency, whether the sound collection condition is good or bad according to the significance index value of each frequency calculated by the significance index calculation means.
The signal processing device according to claim 3 or claim 4, wherein the significance index calculation means calculates, as the significance index, the determinant of the covariance matrix of observation vectors whose elements are the intensities at each frequency in each of the plurality of observation signals.
JP2010038295A 2010-02-24 2010-02-24 Signal processing device Expired - Fee Related JP5387442B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2010038295A JP5387442B2 (en) 2010-02-24 2010-02-24 Signal processing device

Publications (2)

Publication Number Publication Date
JP2011176535A JP2011176535A (en) 2011-09-08
JP5387442B2 true JP5387442B2 (en) 2014-01-15

Family

ID=44689000

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2010038295A Expired - Fee Related JP5387442B2 (en) 2010-02-24 2010-02-24 Signal processing device

Country Status (1)

Country Link
JP (1) JP5387442B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6258442B1 (en) * 2016-10-28 2018-01-10 三菱電機インフォメーションシステムズ株式会社 Action specifying device, action specifying method, and action specifying program
US11886996B2 (en) * 2019-06-20 2024-01-30 Nippon Telegraph And Telephone Corporation Training data extension apparatus, training data extension method, and program
KR102329353B1 (en) * 2020-03-17 2021-11-22 성균관대학교산학협력단 A method for inferring of generating direction of sound using deep network and an apparatus for the same
CN116935883B (en) * 2023-09-14 2023-12-29 北京探境科技有限公司 Sound source positioning method and device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4897519B2 (en) * 2007-03-05 2012-03-14 株式会社神戸製鋼所 Sound source separation device, sound source separation program, and sound source separation method

Also Published As

Publication number Publication date
JP2011176535A (en) 2011-09-08

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20121219

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20130821

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20130910

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20130923

R150 Certificate of patent or registration of utility model

Ref document number: 5387442

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

LAPS Cancellation because of no payment of annual fees