JP2012178679A - Sound processing device - Google Patents

Sound processing device

Publication number
JP2012178679A
Authority
JP
Japan
Prior art keywords
frequency
sound
processing
signal
unit
Prior art date
Legal status
Granted
Application number
JP2011040014A
Other languages
Japanese (ja)
Other versions
JP5826502B2 (en)
Inventor
Kazunobu Kondo
Kazuya Takeda
Current Assignee
Nagoya University NUC
Yamaha Corp
Original Assignee
Nagoya University NUC
Yamaha Corp
Priority date
Filing date
Publication date
Application filed by Nagoya University NUC, Yamaha Corp filed Critical Nagoya University NUC
Priority to JP2011040014A
Publication of JP2012178679A
Application granted
Publication of JP5826502B2
Expired - Fee Related

Abstract

PROBLEM TO BE SOLVED: To maintain the low-frequency intensity of signals after sound source separation.

SOLUTION: Observation signals x1(t) and x2(t), obtained by picking up a mixture of a sound S1 from a sound source PS1 and a sound S2 from a sound source PS2 with pickup devices PM1 and PM2, are supplied to the sound processing device. A direction identifying unit 62 identifies the arrival direction θe1 of the sound S1 and the arrival direction θe2 of the sound S2. A directional processing unit 64 generates a directional component Zi[n,u] by executing, on the observation signals x1(t) and x2(t), null beamforming that forms a pickup null in the arrival direction θei (i = 1, 2). A coefficient value generator 66 generates, as a processing coefficient value αi[n,u], the ratio of the amplitude of the directional component Zi[n,u] to the sum of the amplitudes of the directional components Z1[n,u] and Z2[n,u].

Description

The present invention relates to a technique for emphasizing (separating or extracting) the sound from a specific sound source within a mixture of sounds generated by different sound sources.

Sound source separation techniques have been proposed that separate the sound from each source by applying source separation to multiple observation signals obtained by picking up a mixture of sounds, such as speech and noise, with multiple sound pickup devices. The separation matrix (demixing matrix) applied in source separation is computed for each frequency by a learning process (iterative updating) using, for example, frequency-domain independent component analysis (FDICA).

Patent Literature 1 and Non-Patent Literature 1 disclose a technique in which separation matrices are generated, by a learning process using the observation signals, only for frequencies selected from the full set under a predetermined condition, and the learned separation matrices are then used to fill in separation matrices for the non-selected frequencies. Null beamforming (NBF: Null Beam Former), for example, is used to generate the separation matrices of the non-selected frequencies: a separation matrix for a non-selected frequency is generated so that a pickup null is formed in the sound arrival direction estimated from the separation matrices obtained by the learning process.

Patent Literature 1: JP 2010-117653 A

Non-Patent Literature 1: Osako et al., "A fast blind source separation method using frequency-band interpolation by a null beamformer," Proceedings of the Meeting of the Acoustical Society of Japan, Acoustical Society of Japan, March 2007, pp. 549-550.

With the above technique, however, the intensity of the separated signals at low frequencies is reduced as a consequence of the null beamforming. In view of this, an object of the present invention is to maintain the low-frequency intensity of signals after sound source separation.

The means adopted by the present invention to solve the above problems are described below. To facilitate understanding of the invention, the following description notes in parentheses the correspondence between elements of the invention and elements of the embodiments described later; this is not intended to limit the scope of the invention to the illustrated embodiments.

The sound processing device of the present invention processes a plurality of observation signals (for example, observation signal x1(t) and observation signal x2(t)) obtained by picking up, with a plurality of sound pickup devices (for example, pickup devices PM1 and PM2), a mixture of sounds (for example, sounds S1 and S2) arriving from a plurality of sound sources (for example, sound sources PS1 and PS2). The device comprises: direction identifying means (for example, direction identifying unit 62) that identifies the arrival direction of the sound from each source (for example, arrival directions θe1 and θe2); directional processing means (for example, directional processing unit 64) that, for each identified arrival direction, generates a directional signal (for example, directional components Z1[n,u] and Z2[n,u]) by executing, on the observation signals, null beamforming that forms a pickup null in that arrival direction; coefficient value generating means (for example, coefficient value generator 66) that generates, for each frequency, a processing coefficient value (for example, processing coefficient value αi[n,u]) corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the directional signals; and first signal processing means (for example, signal processing unit 68) that applies the processing coefficient value of each frequency to the component of the observation signal at that frequency.

In this configuration, a processing coefficient value corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the directional signals is applied to the observation signal itself. Compared with a configuration that adopts the directional signals generated by the directional processing means directly as the separated signals, this makes it possible to maintain the low-frequency intensity of the signals after source separation.
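The coefficient generation and its application described above can be sketched with toy numbers as follows (a minimal sketch: the variable names mirror the symbols in the text, but the assignment of a particular αi to a particular separated output is an assumption of this sketch, not fixed by the passage above):

```python
import numpy as np

def ratio_masks(Z1, Z2, eps=1e-12):
    # Processing coefficient values: alpha_i = |Z_i| / (|Z_1| + |Z_2|)
    a1, a2 = np.abs(Z1), np.abs(Z2)
    total = np.maximum(a1 + a2, eps)       # guard against an all-zero bin
    return a1 / total, a2 / total

# Toy directional components (one unit period, three frequencies).
Z1 = np.array([3.0, 1.0, 2.0]) * np.exp(1j * 0.3)
Z2 = np.array([1.0, 3.0, 2.0]) * np.exp(-1j * 0.7)
alpha1, alpha2 = ratio_masks(Z1, Z2)

# The coefficient acts multiplicatively on the observation component, so
# the observation's own phase and low-frequency energy are preserved.
X1 = np.full(3, 2.0 + 0.0j)                # one observation channel
YB = alpha1 * X1
```

Because α1 + α2 = 1 at every frequency, the two masked outputs always partition the observation's amplitude rather than attenuating it outright, which is the property that keeps the low band intact.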

In a preferred aspect of the invention, the device further comprises: frequency selecting means (for example, frequency selecting unit 24) that sorts a plurality of frequencies (for example, K frequencies F[1] to F[K]) into first frequencies (for example, M first frequencies FA[1] to FA[M]) and second frequencies (for example, N second frequencies FB[1] to FB[N]); separation matrix generating means (for example, separation matrix generator 54) that generates a separation matrix (for example, separation matrix W[m]) for each first frequency from the first-frequency components of the observation signals; and second signal processing means (for example, signal processing unit 52) that generates first separated signals (for example, separated component YA1[m,u] or YA2[m,u]) by applying the separation matrix of each first frequency to the first-frequency components of the observation signals. The directional processing means generates a directional signal for each arrival direction from the second-frequency components of the observation signals, the coefficient value generating means generates a processing coefficient value for each second frequency, and the first signal processing means generates second separated components (for example, separated component YB1[n,u] or YB2[n,u]) by applying the processing coefficient value of each second frequency to the second-frequency components of the observation signals. In this aspect, the first separated signals are generated by processing the observation signals with separation matrices at the first frequencies, and the second separated signals are generated by processing the observation signals with processing coefficient values at the second frequencies. The low-frequency intensity of the second separated signals is thereby maintained, while the amount of computation and the storage capacity required of the device are reduced compared with a configuration that generates separation matrices for all frequencies.

In a further preferred aspect, the direction identifying means estimates the arrival direction of the sound from each source using the separation matrices that the separation matrix generating means generated for the first frequencies. Because the arrival directions are estimated from those separation matrices, the amount of computation and the storage capacity required of the device are reduced compared with a configuration in which the direction identifying unit estimates the arrival directions independently of the generation of the separation matrices.
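One common way to extract directions from a learned 2x2 separation matrix is via the estimated mixing matrix A = W^-1 under a far-field, two-microphone model. The sketch below assumes that model together with a speed of sound and microphone spacing of its own choosing; the patent's actual estimation formula is not reproduced in this excerpt:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s (assumed)
MIC_SPACING = 0.04       # m (assumed)

def doa_from_demixing(W, freq_hz, d=MIC_SPACING, c=SPEED_OF_SOUND):
    """Estimate the two arrival directions from a 2x2 separation matrix W
    at one frequency. A = W^-1 has one column per source; the phase
    between the two microphone rows of a column encodes that source's
    direction (ICA's per-source scaling cancels in the element ratio)."""
    A = np.linalg.inv(W)
    thetas = []
    for i in range(2):
        phase = np.angle(A[1, i] / A[0, i])          # mic-2 phase lag vs mic-1
        s = np.clip(-phase * c / (2 * np.pi * freq_hz * d), -1.0, 1.0)
        thetas.append(np.arcsin(s))
    return np.array(thetas)
```

A quick self-check is to build the mixing matrix from two known angles, invert it to get W, and confirm the angles are recovered.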

The sound processing device according to a preferred aspect of the invention comprises index calculating means (for example, index calculating unit 26) that calculates, for each frequency, a significance index value indicating the significance of the learning process that generates a separation matrix from the components of the observation signals at that frequency; the frequency selecting means sorts the plurality of frequencies into first frequencies and second frequencies according to the significance index value of each frequency. Because the frequencies are sorted according to an index of the significance of the learning process (for example, independent component analysis), separation matrices capable of highly accurate source separation can be generated, compared with a configuration that sorts the frequencies irrespective of the significance of the learning process.

The sound processing device of each aspect above may be realized by hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to sound processing, or by the cooperation of a general-purpose arithmetic processing unit such as a CPU (Central Processing Unit) with a program. The program according to the present invention is a program for processing a plurality of observation signals (for example, observation signal x1(t) and observation signal x2(t)) obtained by picking up, with a plurality of sound pickup devices (for example, pickup devices PM1 and PM2), a mixture of sounds (for example, sounds S1 and S2) arriving from a plurality of sound sources (for example, sound sources PS1 and PS2). The program causes a computer to execute: a direction identifying process (for example, direction identifying unit 62) that identifies the arrival direction of the sound from each source (for example, arrival directions θe1 and θe2); a directional process (for example, directional processing unit 64) that, for each identified arrival direction, generates a directional signal (for example, directional components Z1[n,u] and Z2[n,u]) by executing, on the observation signals, null beamforming that forms a pickup null in that arrival direction; a coefficient value generating process (for example, coefficient value generator 66) that generates, for each frequency, a processing coefficient value (for example, processing coefficient value αi[n,u]) corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the directional signals; and a first signal process (for example, signal processing unit 68) that applies the processing coefficient value of each frequency to the component of the observation signal at that frequency. This program provides the same operation and effects as the sound processing device according to the present invention. The program of the present invention is provided to a user in a form stored on a computer-readable recording medium and installed in a computer, or is provided from a server device in a form distributed over a communication network and installed in a computer.

FIG. 1 is a block diagram of the sound processing device of the first embodiment.
FIG. 2 is a block diagram of the first sound source separation unit and the second sound source separation unit.
FIG. 3 is a block diagram of the directional processing unit.
FIG. 4 shows amplitude spectra before and after sound source separation.
FIG. 5 is a graph showing the relationship between the number of first frequencies subject to the learning process and the accuracy of source separation (segmental SNR).
FIG. 6 is a graph showing the relationship between the number of first frequencies subject to the learning process and the accuracy of source separation (change in SIR).
FIG. 7 is a block diagram of the sound processing device of the second embodiment.

<A: First Embodiment>
FIG. 1 is a block diagram of a sound processing device 100A according to the first embodiment. Sound pickup devices PM1 and PM2, arranged at a distance from each other, are connected to the sound processing device 100A. The pickup devices PM1 and PM2 are, for example, omnidirectional or directional microphones. Sound sources PS1 and PS2 are located at different positions around the pickup devices PM1 and PM2: the source PS1 lies in direction θ1 with respect to an observation point (for example, the midpoint between the pickup devices PM1 and PM2), and the source PS2 lies in direction θ2 with respect to the observation point.

The mixture of the sound S1 generated by the source PS1 and the sound S2 generated by the source PS2 reaches the pickup devices PM1 and PM2. The pickup device PM1 generates an observation signal x1(t), and the pickup device PM2 generates an observation signal x2(t). Each of x1(t) and x2(t) is an acoustic signal representing the time waveform of the mixture of the sounds S1 and S2 (t: time).
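The observation model can be sketched with a toy anechoic simulation (an assumption of this sketch: far-field sources, integer-sample delays, and arbitrary tone frequencies and sampling rate not taken from the text):

```python
import numpy as np

def delay(s, n):
    """Delay signal s by n samples, zero-padding the start."""
    return np.concatenate([np.zeros(n), s[:len(s) - n]])

fs = 16000                                   # sampling rate (assumed)
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 200 * t)             # sound S1 from source PS1
s2 = np.sin(2 * np.pi * 350 * t)             # sound S2 from source PS2

# Each pickup device observes the mixture; because PS1 and PS2 lie in
# different directions, each sound reaches PM2 with a different delay.
x1 = s1 + s2
x2 = delay(s1, 1) + delay(s2, 3)
```

It is exactly this direction-dependent inter-microphone delay that the later null-beamforming stage exploits.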

The sound processing device 100A is a signal processing device that generates separated signals y1(t) and y2(t) by applying source separation to the observation signals x1(t) and x2(t). The separated signal y1(t) is an acoustic signal in which the sound S1 is emphasized (the sound S2 suppressed), and y2(t) is an acoustic signal in which the sound S2 is emphasized (the sound S1 suppressed); that is, the sounds S1 and S2 are separated (source separation).

The separated signals y1(t) and y2(t) are supplied to a sound emitting device (not shown) such as a speaker or headphones and reproduced as sound. A configuration that generates only one of the two separated signals (for example, one that discards y2(t) as noise) may also be adopted. A/D converters that convert x1(t) and x2(t) from analog to digital, and D/A converters that convert y1(t) and y2(t) from digital to analog, are omitted from the figures for convenience.

As shown in FIG. 1, the sound processing device 100A is realized by a computer system comprising an arithmetic processing unit 12 and a storage device 14. The storage device 14 stores the program executed by the arithmetic processing unit 12 and the various data used by it; a known recording medium such as a magnetic or semiconductor recording medium, or a combination of several kinds of recording media, may be adopted as the storage device 14. A configuration in which the observation signals x1(t) and x2(t) are recorded in advance and stored in the storage device 14 (so that the pickup devices PM1 and PM2 are omitted) is also suitable.

By executing the program stored in the storage device 14, the arithmetic processing unit 12 functions as a plurality of elements (frequency analysis unit 22, frequency selection unit 24, index calculation unit 26, first sound source separation unit 31, second sound source separation unit 32, frequency integration unit 42, and waveform synthesis unit 44). A configuration in which the functions of the arithmetic processing unit 12 are distributed over a plurality of integrated circuits, or in which a dedicated electronic circuit (DSP) realizes the functions, may also be adopted.

The frequency analysis unit 22 sequentially generates, for each unit period (frame), K frequency components X1[k,u] (X1[1,u] to X1[K,u]) obtained by dividing the observation signal x1(t) by frequency F[k] (k = 1 to K) on the frequency axis (that is, by band), and K frequency components X2[k,u] (X2[1,u] to X2[K,u]) obtained by dividing the observation signal x2(t) likewise. The symbol k is a variable indicating a frequency on the frequency axis, and the symbol u is a variable indicating a point on the time axis (for example, a unit-period number). Known frequency analysis such as the short-time Fourier transform may be employed to generate the components X1[k,u] and X2[k,u]; K band-pass filters with differing pass bands (a filter bank) may also be used as the frequency analysis unit 22. An observation vector Xv[k,u] (Xv[k,u] = [X1[k,u], X2[k,u]]^T) whose elements are the components X1[k,u] and X2[k,u] is sequentially stored in the storage device 14 (the symbol T denotes matrix transposition).

The frequency selection unit 24 sorts the K frequencies F[1] to F[K] into M first frequencies FA[1] to FA[M] and N second frequencies FB[1] to FB[N] for each unit period (M and N are natural numbers; K = M + N). Of the K frequency components X1[1,u] to X1[K,u] generated by the frequency analysis unit 22, the M components XA1[1,u] to XA1[M,u] at the first frequencies FA[m] (m = 1 to M) are supplied to the first sound source separation unit 31, and the N components XB1[1,u] to XB1[N,u] at the second frequencies FB[n] (n = 1 to N) are supplied to the second sound source separation unit 32. Similarly, of the K components X2[1,u] to X2[K,u], the M components XA2[1,u] to XA2[M,u] at the first frequencies FA[m] are supplied to the first sound source separation unit 31, and the N components XB2[1,u] to XB2[N,u] at the second frequencies FB[n] are supplied to the second sound source separation unit 32.

The index calculation unit 26 calculates, for each of the K frequencies F[1] to F[K], a significance index value σ[k] (σ[1] to σ[K]) that serves as the criterion by which the frequency selection unit 24 sorts the frequencies F[k]. The calculation of σ[k] is executed at a predetermined cycle (for example, every predetermined number of unit periods). The significance index value σ[k] is a numerical measure of the significance of the learning process (independent component analysis) that generates a separation matrix from the observation vectors Xv[k,u] at frequency F[k]. The index calculation unit 26 of the first embodiment calculates, as σ[k], the determinant of the covariance matrix Rxx[k] (Rxx[k] = E[Xv[k,u] Xv[k,u]^H]) of the time series of observation vectors Xv[k,u] over the predetermined number of unit periods. The symbol H denotes the conjugate transpose of a matrix, and E[·] denotes the mean (expected value) or sum over the predetermined number of unit periods. The calculation of the determinant of Rxx[k] is described in detail in Patent Literature 1.

The frequency selection unit 24 sorts the K frequencies F[1] to F[K] into the M first frequencies FA[1] to FA[M] and the N second frequencies FB[1] to FB[N] according to the significance index values σ[k] calculated by the index calculation unit 26. As disclosed in Patent Literature 1, the smaller the determinant of the covariance matrix Rxx[k], the lower the significance of matrix processing applied to the observation vectors Xv[k,u] (the degree to which the accuracy of source separation improves through the learning process) tends to be. The frequency selection unit 24 therefore selects, as the first frequencies FA[1] to FA[M], the M frequencies F[k] whose significance index values σ[k] are large (for example, the top M in descending order of σ[k], or the M frequencies whose σ[k] exceeds a predetermined threshold), and selects the N frequencies whose σ[k] are small as the second frequencies FB[1] to FB[N].
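The index calculation and the top-M selection described in the two paragraphs above can be sketched directly (toy data; the 2x2 case with four unit periods is a choice of this sketch):

```python
import numpy as np

def significance(Xv):
    """sigma[k] = |det Rxx[k]| with Rxx[k] = E[Xv Xv^H]. Xv has shape
    (2, U): one observation vector per unit period at frequency F[k]."""
    Rxx = (Xv @ Xv.conj().T) / Xv.shape[1]
    return float(abs(np.linalg.det(Rxx)))

def select_frequencies(sigmas, M):
    """Indices of the M first frequencies FA (largest sigma) and of the
    N = K - M second frequencies FB, each in ascending index order."""
    order = np.argsort(np.asarray(sigmas))[::-1]
    return np.sort(order[:M]), np.sort(order[M:])

# Three toy frequencies: identical channels give det ~ 0 (low
# significance); the other two have linearly independent channels.
Xv_a = np.array([[1, 2, 3, 4], [1, 2, 3, 4]], dtype=complex)
Xv_b = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], dtype=complex)
Xv_c = np.array([[2, 0, 2, 0], [0, 2, 0, -2]], dtype=complex)
sigmas = [significance(Xv) for Xv in (Xv_a, Xv_b, Xv_c)]
FA, FB = select_frequencies(sigmas, M=2)
```

The near-singular frequency (identical channels, nothing for the learning process to exploit) is routed to the second-frequency path, matching the tendency cited from Patent Literature 1.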

The first sound source separation unit 31 of FIG. 1 performs source separation applying independent component analysis to the frequency components XA1[m,u] and XA2[m,u], thereby generating, for each unit period, M separated components YA1[1,u] to YA1[M,u] and M separated components YA2[1,u] to YA2[M,u] corresponding to the first frequencies FA[m]. The separated component YA1[m,u] is a frequency component in which the component of the sound S1 at the first frequency FA[m] is emphasized (the sound S2 suppressed), and the separated component YA2[m,u] is a frequency component in which the component of the sound S2 at the first frequency FA[m] is emphasized (the sound S1 suppressed).

The second sound source separation unit 32 executes signal processing different from that of the first sound source separation unit 31 on the frequency components XB1[n,u] and XB2[n,u], thereby generating, for each unit period, N separated components YB1[1,u] to YB1[N,u] and N separated components YB2[1,u] to YB2[N,u] corresponding to the second frequencies FB[n]. The separated component YB1[n,u] is a frequency component in which the component of the sound S1 at the second frequency FB[n] is emphasized (the sound S2 suppressed), and the separated component YB2[n,u] is a frequency component in which the component of the sound S2 at the second frequency FB[n] is emphasized (the sound S1 suppressed).

The frequency integration unit 42 generates K separated components Y1[1,u] to Y1[K,u] for each unit period by arranging (integrating), in frequency order, the M separated components YA1[1,u] to YA1[M,u] generated by the first sound source separation unit 31 and the N separated components YB1[1,u] to YB1[N,u] generated by the second sound source separation unit 32. Similarly, the frequency integration unit 42 generates K separated components Y2[1,u] to Y2[K,u] for each unit period by arranging the M components YA2[1,u] to YA2[M,u] and the N components YB2[1,u] to YB2[N,u].
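The integration step is a scatter of the two component sets back onto the shared frequency axis; a minimal sketch for one unit period (index lists and values are toy data):

```python
import numpy as np

def integrate(YA, fa_idx, YB, fb_idx, K):
    """Place the first-frequency components YA[m] and the second-frequency
    components YB[n] back at their positions on the K-bin frequency axis."""
    Y = np.empty(K, dtype=complex)
    Y[np.asarray(fa_idx)] = YA
    Y[np.asarray(fb_idx)] = YB
    return Y

# K = 4 frequencies; bins 0 and 2 were first frequencies, 1 and 3 second.
Y1 = integrate(np.array([10.0 + 0j, 30.0 + 0j]), [0, 2],
               np.array([20.0 + 0j, 40.0 + 0j]), [1, 3], K=4)
```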

The waveform synthesis unit 44 generates a time-domain separated signal y1(t) from the K separated components Y1[1,u] to Y1[K,u] that the frequency integration unit 42 generates for each unit period. Specifically, the waveform synthesis unit 44 converts the sequence (frequency spectrum) of the K separated components Y1[1,u] to Y1[K,u] into the time domain by an inverse Fourier transform and concatenates successive unit periods to generate the separated signal y1(t). Similarly, the waveform synthesis unit 44 generates a separated signal y2(t) from the K separated components Y2[1,u] to Y2[K,u] generated for each unit period.
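A sketch of this synthesis: each unit period's K-bin spectrum is returned to the time domain by an inverse FFT, and consecutive unit periods are connected by overlap-add. The 50% hop and the (periodic) Hann synthesis window are assumptions for illustration, since the text does not fix the frame parameters:

```python
import numpy as np

def synthesize(frames, hop, n):
    """Overlap-add synthesis of a time-domain signal from per-period spectra."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann
    out = np.zeros(hop * (len(frames) - 1) + n)
    for u, spec in enumerate(frames):
        seg = np.fft.irfft(spec, n)               # spectrum -> time domain
        out[u * hop:u * hop + n] += win * seg     # connect unit periods
    return out

# Round trip: rectangular analysis frames, Hann-weighted overlap-add.
n, hop = 512, 256
x = np.sin(2 * np.pi * 0.01 * np.arange(4 * n))
frames = [np.fft.rfft(x[u * hop:u * hop + n])
          for u in range((len(x) - n) // hop + 1)]
y = synthesize(frames, hop, n)
# Away from the edges the shifted periodic Hann windows sum to 1, so y matches x.
```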

FIG. 2 is a block diagram of the first sound source separation unit 31 and the second sound source separation unit 32. As shown in FIG. 2, the first sound source separation unit 31 includes a signal processing unit 52 and a separation matrix generation unit 54. The signal processing unit 52 generates the separated components YA1[m,u] and YA2[m,u] by applying the separation matrix W[m] of each first frequency FA[m] to the frequency components XA1[m,u] and XA2[m,u] of that first frequency FA[m]. Specifically, the signal processing unit 52 performs the operation (sound source separation) of the following equation (1) for each of the M first frequencies FA[1] to FA[M].

[YA1[m,u], YA2[m,u]]^T = W[m] · [XA1[m,u], XA2[m,u]]^T   … (1)

The separation matrix generation unit 54 in FIG. 2 generates, for each unit period, the separation matrices W[m] (W[1] to W[M]) that the signal processing unit 52 applies to the sound source separation of equation (1), one for each of the M first frequencies FA[1] to FA[M]. A learning process to which independent component analysis is applied (cumulative updating of the separation matrix W[m]) is employed to generate the separation matrix W[m]. Although a known technique may be arbitrarily employed for the learning process of the separation matrix W[m], the operation of the following equation (2), which calculates the separation matrix W_{p+1}[m] immediately following the p-th updated separation matrix W_p[m], is suitable. A predetermined initial matrix W_0[m] (for example, a unit matrix) is applied to the calculation of the first separation matrix W_1[m].

W_{p+1}[m] = W_p[m] − η · off-diag( ⟨ φ(Yv_p[m,u]) · Yv_p[m,u]^H ⟩ ) · W_p[m]   … (2)

(⟨·⟩ denotes the time average over the unit periods.)

The symbol η in equation (2) denotes a predetermined constant (step size), and the symbol off-diag(·) denotes an operator that replaces the diagonal components of a matrix with zero. The symbol φ(·) denotes a predetermined nonlinear function (for example, a hyperbolic tangent function). The symbol Yv_p[m,u] in equation (2) denotes the vector calculated by the operation of equation (1) in which the separation matrix W_p[m] acts on the frequency components XA1[m,u] and XA2[m,u] (Yv_p[m,u] = [YA1[m,u], YA2[m,u]]^T). The separation matrix generation unit 54 finalizes, as the separation matrix W[m], the separation matrix W_{p+1}[m] obtained when the operation of equation (2) has been repeated a predetermined number of times. The above is the configuration and operation of the first sound source separation unit 31.
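Equation (2) is rendered as an image in the original; the sketch below therefore assumes the standard natural-gradient update with the off-diag(·) (holonomic) constraint, step size η, and φ(·) = tanh(·) described in the text — an illustration of the learning process rather than the patent's exact formula:

```python
import numpy as np

def off_diag(m):
    """Replace the diagonal components with zero (the off-diag operator)."""
    return m - np.diag(np.diag(m))

def update_w(w, xv, eta=0.05):
    """One assumed learning update of W[m] for a single frequency bin.

    xv: array (2, U) of observed components over U unit periods.
    Assumed form: W_{p+1} = W_p - eta * off-diag(<phi(Yv) Yv^H>) W_p.
    """
    yv = w @ xv                                     # eq. (1): Yv = W Xv
    phi = np.tanh(yv.real) + 1j * np.tanh(yv.imag)  # split-complex tanh
    corr = phi @ yv.conj().T / xv.shape[1]          # average over unit periods
    return w - eta * off_diag(corr) @ w

def constraint_cost(w, xv):
    """Norm of the off-diagonal correlation that the learning drives to zero."""
    yv = w @ xv
    phi = np.tanh(yv.real) + 1j * np.tanh(yv.imag)
    return np.linalg.norm(off_diag(phi @ yv.conj().T / xv.shape[1]))

# Toy run: two independent (super-Gaussian) complex sources, fixed mixing.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000)) + 1j * rng.laplace(size=(2, 5000))
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s
w = np.eye(2, dtype=complex)
for _ in range(300):
    w = update_w(w, x)
```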

As shown in FIG. 2, the second sound source separation unit 32 includes an arithmetic processing unit 60 and a signal processing unit 68. The arithmetic processing unit 60 sets a processing coefficient value α1[n,u] (α1[1,u] to α1[N,u]) and a processing coefficient value α2[n,u] (α2[1,u] to α2[N,u]) for each of the N second frequencies FB[1] to FB[N]. The calculation of the processing coefficient values α1[n,u] and α2[n,u] is executed at predetermined intervals (for example, every unit period).

The processing coefficient values α1[n,u] and α2[n,u] of each second frequency FB[n] are variably set within a range from 0 to 1 according to the relation (relative magnitude) between the amplitude |S1[n,u]| of the frequency component S1[n,u] of the sound S1 at that second frequency FB[n] and the amplitude |S2[n,u]| of the frequency component S2[n,u] of the sound S2 at that second frequency FB[n]. Specifically, the processing coefficient value α1[n,u] is set to a larger value as the amplitude |S1[n,u]| of the sound S1 becomes larger relative to the amplitude |S2[n,u]|, and the processing coefficient value α2[n,u] is set to a larger value as the amplitude |S2[n,u]| of the sound S2 becomes larger relative to the amplitude |S1[n,u]|.

The signal processing unit 68 in FIG. 2 applies the processing coefficient values α1[n,u] and α2[n,u] of each second frequency FB[n] to the frequency components XB1[n,u] and XB2[n,u] of that second frequency FB[n], thereby generating the separated components YB1[n,u] and YB2[n,u] for each unit period. Specifically, the signal processing unit 68 performs the operations of the following equations (3A) and (3B) for each of the N second frequencies FB[1] to FB[N].

YB1[n,u] = α1[n,u] · XB1[n,u]   … (3A)
YB2[n,u] = α2[n,u] · XB2[n,u]   … (3B)

That is, the separated component YB1[n,u], in which the frequency component S1[n,u] of the sound S1 is emphasized, is generated by multiplying the frequency component XB1[n,u] by the processing coefficient value α1[n,u], and the separated component YB2[n,u], in which the frequency component S2[n,u] of the sound S2 is emphasized, is generated by multiplying the frequency component XB2[n,u] by the processing coefficient value α2[n,u]. The processing coefficient value α1[n,u] therefore corresponds to a gain (spectral gain) applied to the frequency component XB1[n,u], and the processing coefficient value α2[n,u] corresponds to a gain applied to the frequency component XB2[n,u].

As shown in FIG. 2, the arithmetic processing unit 60 includes a direction identifying unit 62, an orientation processor 64, and a coefficient value generator 66. The direction identifying unit 62 identifies the arriving direction θe1 of the sound S1 (the direction of the sound source PS1) and the arriving direction θe2 of the sound S2 (the direction of the sound source PS2). In the following description, the subscript e denotes an estimated value.

The direction identifying unit 62 of the first embodiment estimates the arriving directions θe1 and θe2 using the separation matrices W[m] (W[1] to W[M]) that the separation matrix generation unit 54 generates for each first frequency FA[m]. A known technique (for example, the method disclosed in Non-Patent Document 1) may be arbitrarily employed to estimate the arriving directions θe1 and θe2. For example, the direction identifying unit 62 estimates the arriving direction θe1[m] of the sound S1 and the arriving direction θe2[m] of the sound S2 from the separation matrix W[m] of each first frequency FA[m], finalizes a representative value (for example, a weighted sum, an average, or a median) of the M arriving directions θe1[1] to θe1[M] as the arriving direction θe1, and finalizes a representative value of the M arriving directions θe2[1] to θe2[M] as the arriving direction θe2.
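One common realization of this kind of direction estimation from a separation matrix (in the spirit of the cited non-patent literature) treats each column of W[m]^{-1} as an estimated steering vector and converts its inter-microphone phase into an angle. The free-field model, microphone spacing, and sound speed below are assumptions for illustration:

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)
D = 0.04   # assumed microphone spacing (m)

def steering(theta_deg, freq):
    """Two-microphone free-field steering vector for arrival angle theta."""
    tau = D * np.sin(np.radians(theta_deg)) / C    # inter-microphone delay
    return np.array([1.0, np.exp(-2j * np.pi * freq * tau)])

def doa_from_w(w, freq):
    """Estimate the two arriving directions from a separation matrix W[m]."""
    a_est = np.linalg.inv(w)                       # estimated mixing matrix
    omega = 2 * np.pi * freq
    phase = np.angle(a_est[1, :] / a_est[0, :])    # phase lag at microphone 2
    return np.degrees(np.arcsin(-phase * C / (omega * D)))

# Check: build a mixing matrix from two known directions and recover them.
freq = 1000.0
a = np.stack([steering(-30.0, freq), steering(45.0, freq)], axis=1)
theta = doa_from_w(np.linalg.inv(a), freq)
```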

The orientation processor 64 in FIG. 2 executes, on the frequency components XB1[n,u] and XB2[n,u], processing that forms a dead angle of sound pickup (a region of low pickup sensitivity) in a predetermined direction (hereinafter "dead angle control type beam formation"), thereby generating an orientation component Z1[n,u] (Z1[1,u] to Z1[N,u]) and an orientation component Z2[n,u] (Z2[1,u] to Z2[N,u]) for each unit period. Specifically, the orientation processor 64 generates the orientation component Z1[n,u] by executing, on the frequency components XB1[n,u] and XB2[n,u], dead angle control type beam formation (NBF) that forms a dead angle of sound pickup in the arriving direction θe2 identified by the direction identifying unit 62, and generates the orientation component Z2[n,u] by executing, on the frequency components XB1[n,u] and XB2[n,u], dead angle control type beam formation that forms a dead angle of sound pickup in the arriving direction θe1. Accordingly, the sound arriving from the direction θe2 (the sound S2) is suppressed in the orientation component Z1[n,u], and the sound arriving from the direction θe1 (the sound S1) is suppressed in the orientation component Z2[n,u].

FIG. 3 is a block diagram of the orientation processor 64. For convenience, FIG. 3 also shows a model of the propagation paths along which the sound S1 (frequency component S1[n,u]) radiated by the sound source PS1 and the sound S2 (frequency component S2[n,u]) radiated by the sound source PS2 reach each of the sound pickup PM1 and the sound pickup PM2.

The symbol Ai[n] (i = 1, 2) in FIG. 3 denotes the propagation loss (the gain imparted along the propagation path) of the frequency component Si[n,u] of the sound Si. The propagation delay of the frequency component Si[n,u] is omitted in FIG. 3 in view of its being reflected in the propagation loss Ai[n]. The symbol τi1 in FIG. 3 denotes the delay (time difference) from when the frequency component Si[n,u] reaches the sound pickup PM2 until it reaches the sound pickup PM1, and the symbol τi2 denotes the delay from when the frequency component Si[n,u] reaches the sound pickup PM1 until it reaches the sound pickup PM2.

As understood from FIG. 3, the frequency components XB1[n,u] and XB2[n,u] after pickup by the sound pickups PM1 and PM2 are expressed by the following equations (4A) and (4B). The symbol ω[n] in equations (4A) and (4B) denotes the angular frequency corresponding to the second frequency FB[n], and the symbol j denotes the imaginary unit.

XB1[n,u] = A1[n] · S1[n,u] · e^{−jω[n]τ11} + A2[n] · S2[n,u] · e^{−jω[n]τ21}   … (4A)
XB2[n,u] = A1[n] · S1[n,u] · e^{−jω[n]τ12} + A2[n] · S2[n,u] · e^{−jω[n]τ22}   … (4B)

As shown in FIG. 3, the orientation processor 64 includes a first processing unit 72 that generates the orientation component Z1[n,u] and a second processing unit 74 that generates the orientation component Z2[n,u]. The first processing unit 72 includes a delay unit 721 that imparts a delay τe22 to the frequency component XB1[n,u], a delay unit 723 that imparts a delay τe21 to the frequency component XB2[n,u], and an arithmetic unit 725 that generates the difference between the outputs of the delay units 721 and 723 as the orientation component Z1[n,u]. Similarly, the second processing unit 74 includes a delay unit 741 that imparts a delay τe11 to the frequency component XB2[n,u], a delay unit 743 that imparts a delay τe12 to the frequency component XB1[n,u], and an arithmetic unit 745 that generates the difference between the outputs of the delay units 741 and 743 as the orientation component Z2[n,u]. The delay τeij is an estimate of the delay τij imparted along the propagation path. The delays τe21 and τe22 are set so that a dead angle of sound pickup is formed in the arriving direction θe2, and the delays τe11 and τe12 are set so that a dead angle of sound pickup is formed in the arriving direction θe1.
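A numeric sketch of this delay-and-subtract structure for a single frequency bin, using the FIG. 3 model (gains Ai[n], path delays τij); the gain and delay values are arbitrary illustration. With τeij = τij, the sound S2 vanishes from Z1 and the sound S1 vanishes from Z2:

```python
import numpy as np

def observe(s1, s2, omega, gains, tau):
    """Two-microphone observation of one frequency bin per the FIG. 3 model."""
    a1, a2 = gains
    xb1 = a1 * s1 * np.exp(-1j * omega * tau[0, 0]) + a2 * s2 * np.exp(-1j * omega * tau[1, 0])
    xb2 = a1 * s1 * np.exp(-1j * omega * tau[0, 1]) + a2 * s2 * np.exp(-1j * omega * tau[1, 1])
    return xb1, xb2

def nbf(xb1, xb2, omega, tau_e):
    """Dead angle control type beam formation: Z1 nulls S2, Z2 nulls S1."""
    z1 = np.exp(-1j * omega * tau_e[1, 1]) * xb1 - np.exp(-1j * omega * tau_e[1, 0]) * xb2
    z2 = np.exp(-1j * omega * tau_e[0, 0]) * xb2 - np.exp(-1j * omega * tau_e[0, 1]) * xb1
    return z1, z2

omega = 2 * np.pi * 1000.0
tau = np.array([[1.0e-4, 2.0e-4],    # tau11, tau12 (paths of sound S1)
                [3.0e-4, 1.5e-4]])   # tau21, tau22 (paths of sound S2)
gains = (0.9, 0.8)

z1_s2, z2_s2 = nbf(*observe(0.0, 1.0, omega, gains, tau), omega, tau)  # S2 only
z1_s1, z2_s1 = nbf(*observe(1.0, 0.0, omega, gains, tau), omega, tau)  # S1 only
```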

As understood from FIG. 3, the orientation components Z1[n,u] and Z2[n,u] are expressed by the following equations (5A) and (5B).

Z1[n,u] = e^{−jω[n]τe22} · XB1[n,u] − e^{−jω[n]τe21} · XB2[n,u]   … (5A)
Z2[n,u] = e^{−jω[n]τe11} · XB2[n,u] − e^{−jω[n]τe12} · XB1[n,u]   … (5B)

Substituting equations (4A) and (4B) into equation (5A) and rearranging yields the following equation (6A). Similarly, substituting equations (4A) and (4B) into equation (5B) and rearranging yields the following equation (6B).

Z1[n,u] = A1[n]·S1[n,u]·e^{−jω[n](τ11+τe22)} + A2[n]·S2[n,u]·e^{−jω[n](τ21+τe22)} − A1[n]·S1[n,u]·e^{−jω[n](τ12+τe21)} − A2[n]·S2[n,u]·e^{−jω[n](τ22+τe21)}   … (6A)
Z2[n,u] = A1[n]·S1[n,u]·e^{−jω[n](τ12+τe11)} + A2[n]·S2[n,u]·e^{−jω[n](τ22+τe11)} − A1[n]·S1[n,u]·e^{−jω[n](τ11+τe12)} − A2[n]·S2[n,u]·e^{−jω[n](τ21+τe12)}   … (6B)

Assuming now that the accuracy of the estimation of the arriving directions θe1 and θe2 by the direction identifying unit 62 is sufficiently high (θe1 ≈ θ1, θe2 ≈ θ2), the delays τeij applied in the orientation processor 64 can be approximated by the delays τij of the actual propagation paths (τeij ≈ τij). The second and fourth terms on the right side of equation (6A) therefore cancel each other, yielding the following equation (7A), and the first and third terms on the right side of equation (6B) cancel each other, yielding the following equation (7B).

Z1[n,u] = A1[n] · S1[n,u] · ( e^{−jω[n](τ11+τ22)} − e^{−jω[n](τ12+τ21)} )   … (7A)
Z2[n,u] = A2[n] · S2[n,u] · ( e^{−jω[n](τ11+τ22)} − e^{−jω[n](τ12+τ21)} )   … (7B)

The ratio of the amplitude |Z1[n,u]| of the orientation component Z1[n,u] to the sum (hereinafter "amplitude sum") of the amplitude |Z1[n,u]| of the orientation component Z1[n,u] and the amplitude |Z2[n,u]| of the orientation component Z2[n,u] is expressed, in view of equations (7A) and (7B), as the following equation (8A). Similarly, the ratio of the amplitude |Z2[n,u]| of the orientation component Z2[n,u] to the amplitude sum is expressed as the following equation (8B).

|Z1[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| ) = A1[n]·|S1[n,u]| / ( A1[n]·|S1[n,u]| + A2[n]·|S2[n,u]| )   … (8A)
|Z2[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| ) = A2[n]·|S2[n,u]| / ( A1[n]·|S1[n,u]| + A2[n]·|S2[n,u]| )   … (8B)

Of equations (7A) and (7B), the delay term related to the delays (phases) τij (the parenthesized part in the latter half) is common to the orientation components Z1[n,u] and Z2[n,u]. The delay term is therefore cancelled out in equations (8A) and (8B).

As shown in the following equations (9A) and (9B), the coefficient value generator 66 in FIG. 2 calculates, for each second frequency FB[n], the ratio of the amplitude |Z1[n,u]| of the orientation component Z1[n,u] to the amplitude sum (equation (8A)) as the processing coefficient value α1[n,u], and calculates, for each second frequency FB[n], the ratio of the amplitude |Z2[n,u]| of the orientation component Z2[n,u] to the amplitude sum (equation (8B)) as the processing coefficient value α2[n,u].

α1[n,u] = |Z1[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| )   … (9A)
α2[n,u] = |Z2[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| )   … (9B)

As understood from equations (8A), (8B), (9A), and (9B), the processing coefficient values α1[n,u] and α2[n,u] correspond to the internal ratio of the amplitudes of the sounds S1 and S2 at the observation point for each unit period (the contribution of each sound source PSi to each of the observation signals x1(t) and x2(t)). That is, the amplitude ratio of the sound S1 at the observation point is expressed by the processing coefficient value α1[n,u], and the amplitude ratio of the sound S2 at the observation point is expressed by the processing coefficient value α2[n,u]. For example, when the amplitude A1[n]·|S1[n,u]| of the sound S1 at the observation point equals the amplitude A2[n]·|S2[n,u]| of the sound S2, the processing coefficient values α1[n,u] and α2[n,u] are both 0.5, and when the amplitude A1[n]·|S1[n,u]| of the sound S1 exceeds the amplitude A2[n]·|S2[n,u]| of the sound S2, the processing coefficient value α1[n,u] exceeds the processing coefficient value α2[n,u]. The processing coefficient value α1[n,u] of equation (9A) and the processing coefficient value α2[n,u] of equation (9B) are therefore valid as variables expressing the amplitude ratio between the sounds S1 and S2 at the observation point.
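As a numeric check of this internal-ratio property, the sketch below forms the orientation components for one frequency bin under the FIG. 3 model (arbitrary illustrative gains and delays, with exact delay estimates) and confirms that α1[n,u] reproduces A1|S1|/(A1|S1| + A2|S2|):

```python
import numpy as np

omega = 2 * np.pi * 800.0
tau = np.array([[1.0e-4, 2.0e-4],    # tau11, tau12
                [3.0e-4, 1.5e-4]])   # tau21, tau22
a1, a2 = 0.9, 0.5                    # propagation losses A1[n], A2[n]
s1, s2 = 2.0, 1.0                    # source components S1[n,u], S2[n,u]

xb1 = a1 * s1 * np.exp(-1j * omega * tau[0, 0]) + a2 * s2 * np.exp(-1j * omega * tau[1, 0])
xb2 = a1 * s1 * np.exp(-1j * omega * tau[0, 1]) + a2 * s2 * np.exp(-1j * omega * tau[1, 1])

# Null beam formation with exact delay estimates, then eq. (9A)/(9B).
z1 = np.exp(-1j * omega * tau[1, 1]) * xb1 - np.exp(-1j * omega * tau[1, 0]) * xb2
z2 = np.exp(-1j * omega * tau[0, 0]) * xb2 - np.exp(-1j * omega * tau[0, 1]) * xb1
alpha1 = abs(z1) / (abs(z1) + abs(z2))
alpha2 = abs(z2) / (abs(z1) + abs(z2))
ratio1 = a1 * abs(s1) / (a1 * abs(s1) + a2 * abs(s2))  # internal amplitude ratio
```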

Since the processing coefficient values α1[n,u] and α2[n,u] are set as described above, the frequency component S1[n,u] of the sound S1 is emphasized in the separated component YB1[n,u] that the signal processing unit 68 generates by the operation of equation (3A) applying the processing coefficient value α1[n,u] of equation (9A), and the frequency component S2[n,u] of the sound S2 is emphasized in the separated component YB2[n,u] generated by the operation of equation (3B) applying the processing coefficient value α2[n,u] of equation (9B). That is, the sound S1 (frequency component S1[n,u]) and the sound S2 (frequency component S2[n,u]) are separated for each of the N second frequencies FB[1] to FB[N].

Part (A) of FIG. 4 shows the amplitude spectrum of the sound Si radiated by the sound source PSi, and part (C) of FIG. 4 shows the amplitude spectrum of the separated signal yi(t) generated by the configuration of the first embodiment. Part (B) of FIG. 4 shows the amplitude spectrum of the separated signal yi(t) generated by a configuration (hereinafter "comparative example") in which the orientation component Zi[n,u] generated by the dead angle control type beam formation of the orientation processor 64 is used directly as the separated component YBi[n,u].

Each of the exponential terms (e^{−jω[n](τ11+τ22)} and e^{−jω[n](τ12+τ21)}) in the delay term of equations (7A) and (7B), which express the orientation component Zi[n,u], approaches 1 as the angular frequency ω[n] decreases, so the delay term of equations (7A) and (7B) approaches zero as the angular frequency ω[n] decreases. The orientation component Zi[n,u] is therefore increasingly suppressed toward the low frequency side. That is, in the comparative example, which generates the separated signal yi(t) with the orientation component Zi[n,u] as the separated component YBi[n,u], there is a problem in that the intensity (amplitude) of the separated signal yi(t) on the low frequency side (particularly 0 Hz to 500 Hz) is suppressed compared with the original sound Si (part (A)), as can also be seen from part (B) of FIG. 4.
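The suppression can be checked directly on the magnitude of the delay term: |e^{−jωa} − e^{−jωb}| = 2|sin(ω(b − a)/2)|, which tends to 0 as ω → 0. A brief numeric check (the summed path delays a and b are arbitrary illustrative values):

```python
import numpy as np

def delay_term_mag(freq, a=2.5e-4, b=5.0e-4):
    """|e^{-j w a} - e^{-j w b}| for summed path delays a and b (seconds)."""
    omega = 2 * np.pi * freq
    return abs(np.exp(-1j * omega * a) - np.exp(-1j * omega * b))

mags = {f: delay_term_mag(f) for f in (50.0, 200.0, 1000.0)}
# The magnitude shrinks toward low frequencies, which corresponds to the
# low-band attenuation seen in part (B) of FIG. 4.
```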

In the first embodiment, on the other hand, the separated component YBi[n,u] is generated by applying, to the frequency component XBi[n,u], the processing coefficient value αi[n,u] calculated from the amplitudes of the orientation components Z1[n,u] and Z2[n,u]. As described above, the influence of the delay term of equations (7A) and (7B) is eliminated from the processing coefficient values α1[n,u] and α2[n,u], so the intensity on the low frequency side of the separated signal yi(t) can be maintained at a level equal to that of the sound Si, as can also be seen from part (C) of FIG. 4. That is, the first embodiment has the advantage that sound source separation with higher accuracy than the comparative example is realized (each sound Si can be extracted faithfully).

Further, as understood from equations (9A) and (9B), the processing of the second sound source separation unit 32 (the calculation of the processing coefficient values αi[n,u] and the operations of equations (3A) and (3B)) imposes a smaller load than the processing of the first sound source separation unit 31 (the generation of the separation matrix W[m] by iteration of the learning process). Accordingly, the first embodiment, which can reduce the number M of first frequencies FA[m] subject to the learning process, has the advantage that the processing load (power consumption) of the arithmetic processing device 12 and the storage capacity required of the storage device 14 can be reduced without degrading the sound source separation performance. These effects are particularly advantageous when the sound processing device 100A is mounted on a portable information terminal (for example, a mobile phone) in which the performance, power supply capacity, and storage capacity of the arithmetic processing device 12 are constrained.

The number M of first frequencies FA[m] subject to the learning process and the accuracy of sound source separation are described in detail below. In the following description, the observation signals xi(t) are expressed as in the following equations (10A) and (10B), and the separated signals yi(t) are expressed as in the following equations (11A) and (11B). The symbols xij(t) and yij(t) denote the acoustic component arriving at the sound pickup PMj from the sound source PSi.

x1(t) = x11(t) + x21(t)   … (10A)
x2(t) = x12(t) + x22(t)   … (10B)
y1(t) = y11(t) + y21(t)   … (11A)
y2(t) = y12(t) + y22(t)   … (11B)

FIGS. 5 and 6 are graphs showing the relation between the number M (horizontal axis) of first frequencies FA[m] for which the separation matrix W[m] is generated by the learning process of independent component analysis and an evaluation index of sound source separation (vertical axis). The symbol "FDICA" on the horizontal axis of FIGS. 5 and 6 denotes the case where all K frequencies F[1] to F[K] (for example, K = 513) are selected as first frequencies FA[m] (that is, a configuration in which the second sound source separation unit 32 is omitted). FIGS. 5 and 6 also show, for each of the first embodiment (solid line) and the comparative example (broken line), the results of processing observation signals xi(t) recorded in an anechoic room and the results of processing observation signals xi(t) recorded in an acoustic room with a reverberation time of 500 milliseconds.

In FIG. 5, the segmental SNR (SegSNR: Segmental Signal-to-Noise Ratio) after sound source separation is plotted on the vertical axis as the evaluation index of sound source separation. The segmental SNR after sound source separation is expressed by the following equation (12). The symbol xij(h,u) in equation (12) denotes the signal value (amplitude) at a time point h within the u-th unit period of the acoustic component xij(t) of equations (10A) and (10B). The symbol yi(h,u) in equation (12) denotes the signal value (amplitude) at the time point h within the u-th unit period of the separated signal yi(t) after sound source separation. As understood from equation (12), the accuracy of sound source separation can be evaluated as higher as the segmental SNR after sound source separation is larger (that is, as the separated signal yi(t) is closer to the sound Si at the observation point).

SegSNR = (1/U) Σ_u 10·log10( Σ_h xii(h,u)² / Σ_h ( xii(h,u) − yi(h,u) )² )   … (12)
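Equation (12) is rendered as an image in the original; the sketch below therefore implements a segmental SNR of the form the text describes — for each unit period, the log ratio of the energy of the target component xii(h,u) to the energy of its deviation from yi(h,u), averaged over unit periods — as an assumed illustration:

```python
import numpy as np

def seg_snr(target, output, seg_len):
    """Assumed segmental SNR between xii(t) and the separated signal yi(t)."""
    u_count = len(target) // seg_len
    vals = []
    for u in range(u_count):
        seg_x = target[u * seg_len:(u + 1) * seg_len]
        seg_e = seg_x - output[u * seg_len:(u + 1) * seg_len]
        vals.append(10 * np.log10(np.sum(seg_x ** 2) / np.sum(seg_e ** 2)))
    return float(np.mean(vals))

rng = np.random.default_rng(1)
target = np.sin(2 * np.pi * 0.02 * np.arange(2048))
good = target + 0.01 * rng.standard_normal(2048)   # small separation error
poor = target + 0.1 * rng.standard_normal(2048)    # larger separation error
```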

As can be seen from FIG. 5, in the comparative example the accuracy of sound source separation (segmental SNR) decreases as the number M of first frequencies FA[m] subject to the learning process decreases, whereas in the first embodiment sufficiently accurate sound source separation is realized even when the number M of first frequencies FA[m] is reduced. It can also be seen from FIG. 5 that the first embodiment achieves higher sound source separation accuracy even in comparison with the case where all K frequencies are selected as first frequencies FA[m] (FDICA).

In FIG. 6, on the other hand, the change ΔSIR in SIR (Signal-to-Interference Ratio) across sound source separation is plotted on the vertical axis as the evaluation index of sound source separation. The SIRin before sound source separation is expressed by the following equation (13A), and the SIRout after sound source separation is expressed by the following equation (13B). The acoustic components x21(t) and x12(t) of equation (13A) (equations (10A) and (10B)) and the acoustic components y21(t) and y12(t) of equation (13B) (equations (11A) and (11B)) correspond to interference components (interfering sounds).

SIRin = 10·log10( ( Σ_t x11(t)² + Σ_t x22(t)² ) / ( Σ_t x21(t)² + Σ_t x12(t)² ) )   … (13A)
SIRout = 10·log10( ( Σ_t y11(t)² + Σ_t y22(t)² ) / ( Σ_t y21(t)² + Σ_t y12(t)² ) )   … (13B)

The change ΔSIR plotted on the vertical axis of FIG. 6 corresponds to the difference between the SIRin before sound source separation and the SIRout after sound source separation (ΔSIR = SIRout − SIRin). Accordingly, the accuracy of sound source separation can be evaluated as higher as the change ΔSIR is larger. As can be seen from FIG. 6, for both the first embodiment and the comparative example, the accuracy of sound source separation (the change ΔSIR) decreases as the number M of first frequencies FA[m] subject to the learning process decreases. This tendency is particularly pronounced in environments where reverberation occurs.
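Equations (13A) and (13B) are rendered as images in the original; the sketch below uses the usual energy-ratio form of SIR (target component over interference component) to illustrate the ΔSIR = SIRout − SIRin evaluation; the simulated interference levels and attenuation factor are arbitrary:

```python
import numpy as np

def sir_db(target, interference):
    """SIR as the dB energy ratio of target to interference components."""
    return 10 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

rng = np.random.default_rng(2)
x11 = rng.standard_normal(1000)          # target component reaching output 1
x21 = 0.5 * rng.standard_normal(1000)    # interference reaching output 1
y11, y21 = x11, 0.05 * x21               # separation leaves 5% of interference

delta_sir = sir_db(y11, y21) - sir_db(x11, x21)  # improvement in dB
```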

As explained above, in the first embodiment the separation accuracy evaluated in terms of segmental SNR and the accuracy evaluated in terms of SIR (ΔSIR) move in opposite directions as the number M of first frequencies FA[m] changes. Choosing M so that the segmental SNR of FIG. 5 and the ΔSIR of FIG. 6 are both kept at a high level therefore yields more accurate separation than the comparative configuration.

For example, in the first embodiment the segmental SNR rises as the number M of first frequencies FA[m] decreases. From the standpoint of improving the segmental SNR, and of reducing the processing load (power consumption) of the arithmetic processing device 12 and the capacity of the storage device 14, a smaller M is therefore advantageous. On the other hand, reducing M too far can cause a visible drop in ΔSIR; FIG. 6 indicates, however, that no such drop appears as long as M exceeds roughly one quarter (M = 128) of the total number K of frequencies F[k] (K = 513). Likewise, when the number M of separation matrices W[m] is extremely small the estimation accuracy of the arrival directions θe1 and θe2 degrades, but with M at about one quarter of K the directions can still be estimated with sufficient accuracy. In view of these tendencies, a configuration that sets M to about 25% of K (for example, 20% to 30%) is especially suitable.

<B: Second Embodiment>
A second embodiment of the invention is described below. The first embodiment combined sound source separation based on independent component analysis (first sound source separation unit 31) with separation based on null-steering beamforming (second sound source separation unit 32); the second embodiment omits the separation based on independent component analysis. In the configurations illustrated below, elements whose operation and function are equivalent to those of the first embodiment are given the reference signs used above, and their detailed description is omitted as appropriate.

FIG. 7 is a block diagram of the sound processing device 100B of the second embodiment. As shown in FIG. 7, the device 100B omits the frequency selection unit 24, the index calculation unit 26, the first sound source separation unit 31, and the frequency integration unit 42 of the first embodiment, and comprises a frequency analysis unit 22, a sound source separation unit 35, and a waveform synthesizer 44. As in the first embodiment, the frequency analysis unit 22 generates the K frequency components X1[k,u] (X1[1,u] to X1[K,u]) of the observation signal x1(t) and the K frequency components X2[k,u] (X2[1,u] to X2[K,u]) of the observation signal x2(t).

Like the second sound source separation unit 32 of the first embodiment, the sound source separation unit 35 comprises the direction identifying unit 62, the orientation processor 64, the coefficient value generator 66, and the signal processor 68 of FIG. 2, and performs separation based on null-steering beamforming on each frequency component X1[k,u] and X2[k,u], generating, for every unit period, K separated components Y1[k,u] (Y1[1,u] to Y1[K,u]) and K separated components Y2[k,u] (Y2[1,u] to Y2[K,u]). In other words, the sound source separation unit 35 of the second embodiment operates as the second sound source separation unit 32 of the first embodiment would if all K frequencies F[1] to F[K] were selected as second frequencies FB[1] to FB[N] (N = K). As in the first embodiment, the waveform synthesizer 44 generates the separated signal y1(t) from the K separated components Y1[1,u] to Y1[K,u] and the separated signal y2(t) from the K separated components Y2[1,u] to Y2[K,u]. The second embodiment achieves the same effects as the first.
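The per-frequency processing in the sound source separation unit 35 can be sketched as follows, under simplifying assumptions: a far-field plane-wave model, two microphones with a hypothetical spacing d = 5 cm, and a schematic assignment of the coefficients to separated components (the patent's signal processor defines the exact mapping).

```python
import numpy as np

def null_beam(X1, X2, theta, freq, d=0.05, c=343.0):
    """Two-microphone beamformer placing a pickup null toward DOA `theta`,
    assuming the far-field model X2 = X1 * exp(-2j*pi*freq*tau)."""
    tau = d * np.sin(theta) / c                      # inter-mic delay for theta
    return X1 - X2 * np.exp(2j * np.pi * freq * tau)

# Hypothetical spectra of one time-frequency cell and estimated DOAs
freq = 1000.0
X1, X2 = 1.0 + 0.3j, 0.8 - 0.2j
theta1, theta2 = np.deg2rad(-30.0), np.deg2rad(40.0)

Z1 = null_beam(X1, X2, theta1, freq)  # null toward source 1 -> dominated by source 2
Z2 = null_beam(X1, X2, theta2, freq)  # null toward source 2 -> dominated by source 1

# Processing coefficients: amplitude of one directional component over the sum
a1 = np.abs(Z1) / (np.abs(Z1) + np.abs(Z2))
a2 = np.abs(Z2) / (np.abs(Z1) + np.abs(Z2))

# Applying a coefficient to the observation yields a separated component;
# schematically, a2 (large where source 1 dominates) is attributed to source 1.
Y1 = a2 * X1
Y2 = a1 * X1
```

Because the two coefficients sum to one at every time-frequency cell, the separated components partition the observed amplitude rather than suppressing it, which is consistent with the stated aim of maintaining signal intensities (notably in the low-frequency range) after separation.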

The direction identifying unit 62 of the first embodiment used the separation matrices W[m] to estimate the arrival directions θe1 and θe2, but the direction identifying unit 62 of the second embodiment may identify θe1 and θe2 by any known technique. For example, the unit 62 may estimate θe1 and θe2 from the frequency components X1[k,u] and X2[k,u] by the method described in Ema Takuro and Nozomu Hamada, "FDICA using Time-Frequency Cell Selection for Blind Source Separation", 2005 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'05), pp. 471-474. Alternatively, the separation matrix generation unit 54 of the first embodiment may be added to the second embodiment so that the direction identifying unit 62 estimates θe1 and θe2 from the separation matrices W[m] generated by the unit 54, in the same manner as in the first embodiment (that is, a configuration in which the matrices W[m] are used only for estimating θe1 and θe2).

<C: Modifications>
Various modifications can be made to the embodiments above. Specific examples follow; two or more aspects arbitrarily selected from them may be combined as appropriate.

(1) Modification 1
In the embodiments above, the method by which the direction identifying unit 62 identifies the arrival directions θe1 and θe2 is arbitrary. For example, a predetermined number of the M separation matrices W[1] to W[M] generated by the separation matrix generation unit 54 may be selected to estimate θe1 and θe2. When the direction θ1 of sound source PS1 and the direction θ2 of sound source PS2 are known in advance, θe1 and θe2 may instead be stored beforehand in the storage device 14, the direction identifying unit 62 then acting as an element that reads θe1 and θe2 from the storage device 14. The directions θe1 and θe2 may also be set in response to an instruction from the user (for example, an operation that designates a direction via a control element).

(2) Modification 2
The significance index value σ[k] of the first embodiment is not limited to the determinant of the covariance matrix Rxx[k] of the observation vector Xv[k,u]. For example, the various indices (statistics) exemplified in Patent Document 1 can be adopted as σ[k].

For example, given the tendency that the learning process is more significant at frequencies F[k] where the distribution of the observation vectors Xv[k,u] spans more basis directions, the index calculation unit 26 may compute the condition number of the covariance matrix Rxx[k] of Xv[k,u] as the significance index value σ[k], and the frequency selection unit 24 may select the M frequencies F[k] with the smallest σ[k] as the first frequencies FA[m]. That is, the determinant and the condition number of Rxx[k] serve as indices of the number of basis directions in the distribution of Xv[k,u]. A configuration is also suitable in which the trace of Rxx[k] is computed as σ[k] and the frequencies F[k] with large σ[k] are selected as first frequencies FA[m].
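The covariance-based candidate indices named above can be computed per frequency bin as in the following sketch. The observation vectors are synthetic stand-ins; the ranking of bins by index value is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observation vectors Xv[k,u] = (X1[k,u], X2[k,u]) for one
# frequency bin k, collected over U analysis frames
U = 200
Xv = rng.standard_normal((2, U)) + 1j * rng.standard_normal((2, U))

# Sample covariance matrix Rxx[k] of the observation vectors
Rxx = (Xv @ Xv.conj().T) / U

det_index = np.abs(np.linalg.det(Rxx))       # determinant-based index
cond_index = np.linalg.cond(Rxx)             # condition number: smaller -> broader basis
trace_index = float(np.real(np.trace(Rxx)))  # trace: larger -> preferred bin
```

In a full implementation these values would be computed for every bin k and the M bins with the most favorable index values selected as first frequencies FA[m].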

Because the learning process of independent component analysis updates the separation matrix W[m] so that the separated signals become statistically independent, the learning process can be judged more significant at frequencies F[k] where the statistical correlation between the observation signals x1(t) and x2(t) is low. Given this tendency, an index of the independence between x1(t) and x2(t) is suitable as the significance index value σ[k]; cross-correlation and mutual information are examples of such indices. The frequency selection unit 24 selects as the first frequencies FA[m] the M frequencies F[k] at which the independence between x1(t) and x2(t) is high (cross-correlation or mutual information is small).

The learning process can also be judged more significant when the observation signals x1(t) and x2(t) contain sounds of more kinds (more sources). Considering the tendency (the central limit theorem) that the kurtosis of the intensity distribution of x1(t) or x2(t) decreases as more sounds are mixed, the kurtosis of the intensity distribution (probability distribution) of x1(t) or x2(t) can be adopted as the significance index value σ[k]. The frequency selection unit 24 selects as the first frequencies FA[m] the M frequencies F[k] at which the kurtosis of the intensity distribution of one or both observation signals is low (many sounds are mixed).
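The kurtosis criterion can be illustrated with synthetic data: a mixture of many sources tends toward a Gaussian (kurtosis near 3 in Pearson's convention), while a single sparse source such as speech is peakier. The signal models below are invented for illustration only.

```python
import numpy as np

def kurtosis(x):
    """Pearson's kurtosis: 3 for a Gaussian, higher for peaky distributions."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2

rng = np.random.default_rng(2)
mixed = rng.standard_normal(50_000)   # many-source mixture: near-Gaussian
single = rng.laplace(size=50_000)     # lone speech-like source: super-Gaussian

k_mixed = kurtosis(mixed)    # low kurtosis -> many sources -> select for learning
k_single = kurtosis(single)  # high kurtosis -> few sources
```

A bin whose samples behave like `mixed` (low kurtosis) would be preferred as a first frequency FA[m], since the learning process has more sources to disentangle there.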

The significance index value σ[k] may also be computed from multiple indices (for example, two or more of those exemplified above), such as a weighted sum of them (for example, of the determinant and the trace of the covariance matrix Rxx[k]).

That said, the configuration that uses σ[k] to sort the frequencies into first frequencies FA[m] and second frequencies FB[n] (the index calculation unit 26) may be omitted, and the K frequencies F[k] may be sorted independently of the observation signals x1(t) and x2(t). For example, frequencies F[k] picked from the K frequencies at a fixed interval (say, the odd-numbered F[k]) may be selected as first frequencies FA[m] and the remainder (the even-numbered F[k]) as second frequencies FB[n]. Moreover, if the frequencies F[k] at which the learning process is highly significant are known in advance from circumstances such as the acoustic characteristics expected of x1(t) and x2(t) or the details of the learning process, those frequencies may be selected as first frequencies FA[m] and the rest as second frequencies FB[n].

(3) Modification 3
The embodiments above illustrate a configuration in which the sounds Si from two sound sources PSi (PS1, PS2) are picked up by two sound pickup devices PMj (PM1, PM2), but the numbers of sound sources PSi and of pickup devices PMj may be changed as appropriate, provided that the total number of pickup devices PMj is at least the total number of sound sources PSi.

(4) Modification 4
A configuration is also possible in which the sound processing device 100 (100A, 100B) receives the observation signals x1(t) and x2(t) transmitted from a terminal device such as a mobile phone or personal computer over a communication network such as the Internet, generates the separated signals y1(t) and y2(t) from them as in the first or second embodiment, and transmits the results back to the terminal. Configurations are likewise possible in which the frequency components X1[k,u] and X2[k,u] are transmitted from the terminal to the device 100 (the frequency analysis unit 22 resides in the terminal rather than in the device 100), or in which the separated components Y1[k,u] and Y2[k,u] are transmitted from the device 100 to the terminal (the waveform synthesizer 44 resides in the terminal rather than in the device 100).

100A, 100B: sound processing device; 12: arithmetic processing device; 14: storage device; 22: frequency analysis unit; 24: frequency selection unit; 26: index calculation unit; 31: first sound source separation unit; 32: second sound source separation unit; 35: sound source separation unit; 42: frequency integration unit; 44: waveform synthesizer; 52: signal processor; 54: separation matrix generation unit; 62: direction identifying unit; 64: orientation processor; 66: coefficient value generator; 68: signal processor; PS1, PS2: sound sources; PM1, PM2: sound pickup devices.

Claims (4)

1. A sound processing device that processes a plurality of observation signals obtained by picking up, with a plurality of sound pickup devices, a mixture of sounds arriving from a plurality of sound sources, the device comprising:
direction identifying means for identifying the arrival direction of sound for each of the plurality of sound sources;
orientation processing means for generating a directional signal for each of the plurality of arrival directions identified by the direction identifying means, by performing on the plurality of observation signals null-steering beamforming that places a pickup null in that arrival direction;
coefficient value generating means for generating, for each frequency, a processing coefficient value corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the plurality of directional signals generated by the orientation processing means; and
first signal processing means for applying, to each frequency component of the observation signal, the processing coefficient value of that frequency.
2. The sound processing device of claim 1, further comprising:
frequency selecting means for sorting a plurality of frequencies into first frequencies and second frequencies;
separation matrix generating means for generating, for each first frequency, a separation matrix from the components of that first frequency in the plurality of observation signals; and
second signal processing means for generating first separated signals by applying, to the components of each first frequency in the plurality of observation signals, the separation matrix of that first frequency,
wherein the orientation processing means generates the directional signals for the respective arrival directions from the components of the second frequencies in the plurality of observation signals,
the coefficient value generating means generates the processing coefficient value for each second frequency, and
the first signal processing means generates second separated components by applying, to the components of each second frequency in the plurality of observation signals, the processing coefficient value of that second frequency.
3. The sound processing device of claim 2, wherein the direction identifying means estimates the arrival direction of sound for each of the plurality of sound sources from the separation matrices generated by the separation matrix generating means for the respective first frequencies.
4. The sound processing device of claim 2 or claim 3, further comprising:
index calculating means for calculating, for each frequency, a significance index value indicating the significance of the learning process that generates a separation matrix from the components of that frequency in the plurality of observation signals,
wherein the frequency selecting means sorts the plurality of frequencies into the first frequencies and the second frequencies according to the significance index value of each frequency.
JP2011040014A 2011-02-25 2011-02-25 Sound processor Expired - Fee Related JP5826502B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011040014A JP5826502B2 (en) 2011-02-25 2011-02-25 Sound processor

Publications (2)

Publication Number Publication Date
JP2012178679A true JP2012178679A (en) 2012-09-13
JP5826502B2 JP5826502B2 (en) 2015-12-02

Family

ID=46980250


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105427860A * 2015-11-11 2016-03-23 百度在线网络技术(北京)有限公司 Far field voice recognition method and device
WO2018047643A1 * 2016-09-09 2018-03-15 ソニー株式会社 Device and method for sound source separation, and program
US10924849B2 2016-09-09 2021-02-16 Sony Corporation Sound source separation device and method
CN107369460A * 2017-07-31 2017-11-21 深圳海岸语音技术有限公司 Speech sound enhancement device and method based on acoustics vector sensor space sharpening technique
CN107369460B 2017-07-31 2020-08-21 深圳海岸语音技术有限公司 Voice enhancement device and method based on acoustic vector sensor space sharpening technology


Similar Documents

Publication Publication Date Title
US10334357B2 (en) Machine learning based sound field analysis
CN109074816B (en) Far field automatic speech recognition preprocessing
JP4897519B2 (en) Sound source separation device, sound source separation program, and sound source separation method
JP5229053B2 (en) Signal processing apparatus, signal processing method, and program
US8654990B2 (en) Multiple microphone based directional sound filter
CN103583054B (en) For producing the apparatus and method of audio output signal
EP2647221B1 (en) Apparatus and method for spatially selective sound acquisition by acoustic triangulation
CN106233382B (en) A kind of signal processing apparatus that several input audio signals are carried out with dereverberation
RU2015129784A (en) FILTER AND METHOD FOR INFORMED SPATIAL FILTRATION USING NUMEROUS INSTANT ESTIMATES OF ARRIVAL DIRECTION
WO2019187589A1 (en) Sound source direction estimation device, sound source direction estimation method, and program
RU2012106592A (en) FORMING THE DIAGRAM OF THE DIRECTION OF AUDIO SIGNALS
JP5277887B2 (en) Signal processing apparatus and program
JP2011164467A (en) Model estimation device, sound source separation device, and method and program therefor
CN111863015A (en) Audio processing method and device, electronic equipment and readable storage medium
JP5826502B2 (en) Sound processor
JP5034734B2 (en) Sound processing apparatus and program
JP5034735B2 (en) Sound processing apparatus and program
JP5387442B2 (en) Signal processing device
JP6840302B2 (en) Information processing equipment, programs and information processing methods
Zhu et al. Modified complementary joint sparse representations: a novel post-filtering to MVDR beamforming
US20210174820A1 (en) Signal processing apparatus, voice speech communication terminal, signal processing method, and signal processing program
JP6790659B2 (en) Sound processing equipment and sound processing method
JP2014215544A (en) Sound processing device
Garcia-Barrios et al. Exploiting spatial diversity for increasing the robustness of sound source localization systems against reverberation
JP2015046759A (en) Beamforming processor and beamforming method

Legal Events

2014-02-13 A621 Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2014-06-10 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A821)
2014-11-14 A977 Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2015-01-06 A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2015-03-03 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
2015-04-10 RD04 Notification of resignation of power of attorney (JAPANESE INTERMEDIATE CODE: A7424)
TRDD Decision of grant or rejection written
2015-09-15 A01 Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01)
2015-10-14 A61 First payment of annual fees (during grant procedure) (JAPANESE INTERMEDIATE CODE: A61)
R150 Certificate of patent or registration of utility model (Ref document number: 5826502; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150)
S111 Request for change of ownership or part of ownership (JAPANESE INTERMEDIATE CODE: R313117)
R350 Written notification of registration of transfer (JAPANESE INTERMEDIATE CODE: R350)
LAPS Cancellation because of no payment of annual fees