JP2012178679A - Sound processing device - Google Patents

Sound processing device

Publication number
JP2012178679A
Authority
JP
Japan
Prior art keywords
frequency
sound
processing
signal
unit
Prior art date
Legal status
Granted
Application number
JP2011040014A
Other languages
Japanese (ja)
Other versions
JP5826502B2 (en)
Inventor
Kazunobu Kondo
Kazuya Takeda
Current Assignee
Nagoya University NUC
Yamaha Corp
Original Assignee
Nagoya University NUC
Yamaha Corp
Priority date
Filing date
Publication date
Application filed by Nagoya University NUC, Yamaha Corp filed Critical Nagoya University NUC
Priority to JP2011040014A
Publication of JP2012178679A
Application granted
Publication of JP5826502B2
Expired - Fee Related

Abstract

PROBLEM TO BE SOLVED: To maintain the low-frequency intensity of signals after sound source separation.

SOLUTION: Observation signals x1(t) and x2(t), obtained by picking up a mixture of a sound S1 from a sound source PS1 and a sound S2 from a sound source PS2 with pickup devices PM1 and PM2, are supplied to the sound processing device. A direction identifying unit 62 identifies the arrival direction θe1 of the sound S1 and the arrival direction θe2 of the sound S2. A directional processing unit 64 generates a directional component Zi[n,u] by executing, on the observation signals x1(t) and x2(t), null beamforming that forms a pickup null in the arrival direction θei (i = 1, 2). A coefficient value generator 66 generates, as a processing coefficient value αi[n,u], the ratio of the amplitude of the directional component Zi[n,u] to the sum of the amplitudes of the directional components Z1[n,u] and Z2[n,u].

Description

The present invention relates to a technique for emphasizing (separating or extracting) the sound from a specific sound source within a mixture of sounds generated by different sound sources.

Sound source separation techniques have been proposed that separate the sound from each source by applying source separation to multiple observation signals obtained by picking up a mixture of sounds, such as speech and noise, with multiple sound pickup devices. The separation matrix (demixing matrix) applied in source separation is computed for each frequency by a learning process (iterative updating) using, for example, frequency-domain independent component analysis (FDICA).

Patent Literature 1 and Non-Patent Literature 1 disclose a technique in which separation matrices are generated, by a learning process using the observation signals, only for frequencies selected from the full set under a predetermined condition, and the learned separation matrices are then used to fill in separation matrices for the non-selected frequencies. Null beamforming (NBF: Null Beam Former), for example, is used to generate the separation matrices of the non-selected frequencies: a separation matrix for a non-selected frequency is generated so that a pickup null is formed in the sound arrival direction estimated from the separation matrices obtained by the learning process.

Patent Literature 1: JP 2010-117653 A

Non-Patent Literature 1: Osako et al., "A fast blind source separation method using frequency-band interpolation by a null beamformer," Proceedings of the Meeting of the Acoustical Society of Japan, Acoustical Society of Japan, March 2007, pp. 549-550.

With the above technique, however, the intensity of the separated signals at low frequencies is reduced as a consequence of the null beamforming. In view of this, an object of the present invention is to maintain the low-frequency intensity of signals after sound source separation.

The means adopted by the present invention to solve the above problems are described below. To facilitate understanding of the invention, the following description notes in parentheses the correspondence between elements of the invention and elements of the embodiments described later; this is not intended to limit the scope of the invention to the illustrated embodiments.

The sound processing device of the present invention processes a plurality of observation signals (for example, observation signal x1(t) and observation signal x2(t)) obtained by picking up, with a plurality of sound pickup devices (for example, pickup devices PM1 and PM2), a mixture of sounds (for example, sounds S1 and S2) arriving from a plurality of sound sources (for example, sound sources PS1 and PS2). The device comprises: direction identifying means (for example, direction identifying unit 62) that identifies the arrival direction of the sound from each source (for example, arrival directions θe1 and θe2); directional processing means (for example, directional processing unit 64) that, for each identified arrival direction, generates a directional signal (for example, directional components Z1[n,u] and Z2[n,u]) by executing, on the observation signals, null beamforming that forms a pickup null in that arrival direction; coefficient value generating means (for example, coefficient value generator 66) that generates, for each frequency, a processing coefficient value (for example, processing coefficient value αi[n,u]) corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the directional signals; and first signal processing means (for example, signal processing unit 68) that applies the processing coefficient value of each frequency to the component of the observation signal at that frequency.

In this configuration, a processing coefficient value corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the directional signals is applied to the observation signal itself. Compared with a configuration that adopts the directional signals generated by the directional processing means directly as the separated signals, this makes it possible to maintain the low-frequency intensity of the signals after source separation.
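The coefficient generation and its application described above can be sketched with toy numbers as follows (a minimal sketch: the variable names mirror the symbols in the text, but the assignment of a particular αi to a particular separated output is an assumption of this sketch, not fixed by the passage above):

```python
import numpy as np

def ratio_masks(Z1, Z2, eps=1e-12):
    # Processing coefficient values: alpha_i = |Z_i| / (|Z_1| + |Z_2|)
    a1, a2 = np.abs(Z1), np.abs(Z2)
    total = np.maximum(a1 + a2, eps)       # guard against an all-zero bin
    return a1 / total, a2 / total

# Toy directional components (one unit period, three frequencies).
Z1 = np.array([3.0, 1.0, 2.0]) * np.exp(1j * 0.3)
Z2 = np.array([1.0, 3.0, 2.0]) * np.exp(-1j * 0.7)
alpha1, alpha2 = ratio_masks(Z1, Z2)

# The coefficient acts multiplicatively on the observation component, so
# the observation's own phase and low-frequency energy are preserved.
X1 = np.full(3, 2.0 + 0.0j)                # one observation channel
YB = alpha1 * X1
```

Because α1 + α2 = 1 at every frequency, the two masked outputs always partition the observation's amplitude rather than attenuating it outright, which is the property that keeps the low band intact.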

In a preferred aspect of the invention, the device further comprises: frequency selecting means (for example, frequency selecting unit 24) that sorts a plurality of frequencies (for example, K frequencies F[1] to F[K]) into first frequencies (for example, M first frequencies FA[1] to FA[M]) and second frequencies (for example, N second frequencies FB[1] to FB[N]); separation matrix generating means (for example, separation matrix generator 54) that generates a separation matrix (for example, separation matrix W[m]) for each first frequency from the first-frequency components of the observation signals; and second signal processing means (for example, signal processing unit 52) that generates first separated signals (for example, separated component YA1[m,u] or YA2[m,u]) by applying the separation matrix of each first frequency to the first-frequency components of the observation signals. The directional processing means generates a directional signal for each arrival direction from the second-frequency components of the observation signals, the coefficient value generating means generates a processing coefficient value for each second frequency, and the first signal processing means generates second separated components (for example, separated component YB1[n,u] or YB2[n,u]) by applying the processing coefficient value of each second frequency to the second-frequency components of the observation signals. In this aspect, the first separated signals are generated by processing the observation signals with separation matrices at the first frequencies, and the second separated signals are generated by processing the observation signals with processing coefficient values at the second frequencies. The low-frequency intensity of the second separated signals is thereby maintained, while the amount of computation and the storage capacity required of the device are reduced compared with a configuration that generates separation matrices for all frequencies.

In a further preferred aspect, the direction identifying means estimates the arrival direction of the sound from each source using the separation matrices that the separation matrix generating means generated for the first frequencies. Because the arrival directions are estimated from those separation matrices, the amount of computation and the storage capacity required of the device are reduced compared with a configuration in which the direction identifying unit estimates the arrival directions independently of the generation of the separation matrices.
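One common way to extract directions from a learned 2x2 separation matrix is via the estimated mixing matrix A = W^-1 under a far-field, two-microphone model. The sketch below assumes that model together with a speed of sound and microphone spacing of its own choosing; the patent's actual estimation formula is not reproduced in this excerpt:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s (assumed)
MIC_SPACING = 0.04       # m (assumed)

def doa_from_demixing(W, freq_hz, d=MIC_SPACING, c=SPEED_OF_SOUND):
    """Estimate the two arrival directions from a 2x2 separation matrix W
    at one frequency. A = W^-1 has one column per source; the phase
    between the two microphone rows of a column encodes that source's
    direction (ICA's per-source scaling cancels in the element ratio)."""
    A = np.linalg.inv(W)
    thetas = []
    for i in range(2):
        phase = np.angle(A[1, i] / A[0, i])          # mic-2 phase lag vs mic-1
        s = np.clip(-phase * c / (2 * np.pi * freq_hz * d), -1.0, 1.0)
        thetas.append(np.arcsin(s))
    return np.array(thetas)
```

A quick self-check is to build the mixing matrix from two known angles, invert it to get W, and confirm the angles are recovered.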

The sound processing device according to a preferred aspect of the invention comprises index calculating means (for example, index calculating unit 26) that calculates, for each frequency, a significance index value indicating the significance of the learning process that generates a separation matrix from the components of the observation signals at that frequency; the frequency selecting means sorts the plurality of frequencies into first frequencies and second frequencies according to the significance index value of each frequency. Because the frequencies are sorted according to an index of the significance of the learning process (for example, independent component analysis), separation matrices capable of highly accurate source separation can be generated, compared with a configuration that sorts the frequencies irrespective of the significance of the learning process.

The sound processing device of each aspect above may be realized by hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to sound processing, or by the cooperation of a general-purpose arithmetic processing unit such as a CPU (Central Processing Unit) with a program. The program according to the present invention is a program for processing a plurality of observation signals (for example, observation signal x1(t) and observation signal x2(t)) obtained by picking up, with a plurality of sound pickup devices (for example, pickup devices PM1 and PM2), a mixture of sounds (for example, sounds S1 and S2) arriving from a plurality of sound sources (for example, sound sources PS1 and PS2). The program causes a computer to execute: a direction identifying process (for example, direction identifying unit 62) that identifies the arrival direction of the sound from each source (for example, arrival directions θe1 and θe2); a directional process (for example, directional processing unit 64) that, for each identified arrival direction, generates a directional signal (for example, directional components Z1[n,u] and Z2[n,u]) by executing, on the observation signals, null beamforming that forms a pickup null in that arrival direction; a coefficient value generating process (for example, coefficient value generator 66) that generates, for each frequency, a processing coefficient value (for example, processing coefficient value αi[n,u]) corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the directional signals; and a first signal process (for example, signal processing unit 68) that applies the processing coefficient value of each frequency to the component of the observation signal at that frequency. This program provides the same operation and effects as the sound processing device according to the present invention. The program of the present invention is provided to a user in a form stored on a computer-readable recording medium and installed in a computer, or is provided from a server device in a form distributed over a communication network and installed in a computer.

FIG. 1 is a block diagram of the sound processing device of the first embodiment.
FIG. 2 is a block diagram of the first sound source separation unit and the second sound source separation unit.
FIG. 3 is a block diagram of the directional processing unit.
FIG. 4 shows amplitude spectra before and after sound source separation.
FIG. 5 is a graph showing the relationship between the number of first frequencies subject to the learning process and the accuracy of source separation (segmental SNR).
FIG. 6 is a graph showing the relationship between the number of first frequencies subject to the learning process and the accuracy of source separation (change in SIR).
FIG. 7 is a block diagram of the sound processing device of the second embodiment.

<A: First Embodiment>
FIG. 1 is a block diagram of a sound processing device 100A according to the first embodiment. Sound pickup devices PM1 and PM2, arranged at a distance from each other, are connected to the sound processing device 100A. The pickup devices PM1 and PM2 are, for example, omnidirectional or directional microphones. Sound sources PS1 and PS2 are located at different positions around the pickup devices PM1 and PM2: the source PS1 lies in direction θ1 with respect to an observation point (for example, the midpoint between the pickup devices PM1 and PM2), and the source PS2 lies in direction θ2 with respect to the observation point.

The mixture of the sound S1 generated by the source PS1 and the sound S2 generated by the source PS2 reaches the pickup devices PM1 and PM2. The pickup device PM1 generates an observation signal x1(t), and the pickup device PM2 generates an observation signal x2(t). Each of x1(t) and x2(t) is an acoustic signal representing the time waveform of the mixture of the sounds S1 and S2 (t: time).
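The observation model can be sketched with a toy anechoic simulation (an assumption of this sketch: far-field sources, integer-sample delays, and arbitrary tone frequencies and sampling rate not taken from the text):

```python
import numpy as np

def delay(s, n):
    """Delay signal s by n samples, zero-padding the start."""
    return np.concatenate([np.zeros(n), s[:len(s) - n]])

fs = 16000                                   # sampling rate (assumed)
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 200 * t)             # sound S1 from source PS1
s2 = np.sin(2 * np.pi * 350 * t)             # sound S2 from source PS2

# Each pickup device observes the mixture; because PS1 and PS2 lie in
# different directions, each sound reaches PM2 with a different delay.
x1 = s1 + s2
x2 = delay(s1, 1) + delay(s2, 3)
```

It is exactly this direction-dependent inter-microphone delay that the later null-beamforming stage exploits.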

The sound processing device 100A is a signal processing device that generates separated signals y1(t) and y2(t) by applying source separation to the observation signals x1(t) and x2(t). The separated signal y1(t) is an acoustic signal in which the sound S1 is emphasized (the sound S2 suppressed), and y2(t) is an acoustic signal in which the sound S2 is emphasized (the sound S1 suppressed); that is, the sounds S1 and S2 are separated (source separation).

The separated signals y1(t) and y2(t) are supplied to a sound emitting device (not shown) such as a speaker or headphones and reproduced as sound. A configuration that generates only one of the two separated signals (for example, one that discards y2(t) as noise) may also be adopted. A/D converters that convert x1(t) and x2(t) from analog to digital, and D/A converters that convert y1(t) and y2(t) from digital to analog, are omitted from the figures for convenience.

As shown in FIG. 1, the sound processing device 100A is realized by a computer system comprising an arithmetic processing unit 12 and a storage device 14. The storage device 14 stores the program executed by the arithmetic processing unit 12 and the various data used by it; a known recording medium such as a magnetic or semiconductor recording medium, or a combination of several kinds of recording media, may be adopted as the storage device 14. A configuration in which the observation signals x1(t) and x2(t) are recorded in advance and stored in the storage device 14 (so that the pickup devices PM1 and PM2 are omitted) is also suitable.

By executing the program stored in the storage device 14, the arithmetic processing unit 12 functions as a plurality of elements (frequency analysis unit 22, frequency selection unit 24, index calculation unit 26, first sound source separation unit 31, second sound source separation unit 32, frequency integration unit 42, and waveform synthesis unit 44). A configuration in which the functions of the arithmetic processing unit 12 are distributed over a plurality of integrated circuits, or in which a dedicated electronic circuit (DSP) realizes the functions, may also be adopted.

The frequency analysis unit 22 sequentially generates, for each unit period (frame), K frequency components X1[k,u] (X1[1,u] to X1[K,u]) obtained by dividing the observation signal x1(t) by frequency F[k] (k = 1 to K) on the frequency axis (that is, by band), and K frequency components X2[k,u] (X2[1,u] to X2[K,u]) obtained by dividing the observation signal x2(t) likewise. The symbol k is a variable indicating a frequency on the frequency axis, and the symbol u is a variable indicating a point on the time axis (for example, a unit-period number). Known frequency analysis such as the short-time Fourier transform may be employed to generate the components X1[k,u] and X2[k,u]; K band-pass filters with differing pass bands (a filter bank) may also be used as the frequency analysis unit 22. An observation vector Xv[k,u] (Xv[k,u] = [X1[k,u], X2[k,u]]^T) whose elements are the components X1[k,u] and X2[k,u] is sequentially stored in the storage device 14 (the symbol T denotes matrix transposition).

The frequency selection unit 24 sorts the K frequencies F[1] to F[K] into M first frequencies FA[1] to FA[M] and N second frequencies FB[1] to FB[N] for each unit period (M and N are natural numbers; K = M + N). Of the K frequency components X1[1,u] to X1[K,u] generated by the frequency analysis unit 22, the M components XA1[1,u] to XA1[M,u] at the first frequencies FA[m] (m = 1 to M) are supplied to the first sound source separation unit 31, and the N components XB1[1,u] to XB1[N,u] at the second frequencies FB[n] (n = 1 to N) are supplied to the second sound source separation unit 32. Similarly, of the K components X2[1,u] to X2[K,u], the M components XA2[1,u] to XA2[M,u] at the first frequencies FA[m] are supplied to the first sound source separation unit 31, and the N components XB2[1,u] to XB2[N,u] at the second frequencies FB[n] are supplied to the second sound source separation unit 32.

The index calculation unit 26 calculates, for each of the K frequencies F[1] to F[K], a significance index value σ[k] (σ[1] to σ[K]) that serves as the criterion by which the frequency selection unit 24 sorts the frequencies F[k]. The calculation of σ[k] is executed at a predetermined cycle (for example, every predetermined number of unit periods). The significance index value σ[k] is a numerical measure of the significance of the learning process (independent component analysis) that generates a separation matrix from the observation vectors Xv[k,u] at frequency F[k]. The index calculation unit 26 of the first embodiment calculates, as σ[k], the determinant of the covariance matrix Rxx[k] (Rxx[k] = E[Xv[k,u] Xv[k,u]^H]) of the time series of observation vectors Xv[k,u] over the predetermined number of unit periods. The symbol H denotes the conjugate transpose of a matrix, and E[·] denotes the mean (expected value) or sum over the predetermined number of unit periods. The calculation of the determinant of Rxx[k] is described in detail in Patent Literature 1.

The frequency selection unit 24 sorts the K frequencies F[1] to F[K] into the M first frequencies FA[1] to FA[M] and the N second frequencies FB[1] to FB[N] according to the significance index values σ[k] calculated by the index calculation unit 26. As disclosed in Patent Literature 1, the smaller the determinant of the covariance matrix Rxx[k], the lower the significance of matrix processing applied to the observation vectors Xv[k,u] (the degree to which the accuracy of source separation improves through the learning process) tends to be. The frequency selection unit 24 therefore selects, as the first frequencies FA[1] to FA[M], the M frequencies F[k] whose significance index values σ[k] are large (for example, the top M in descending order of σ[k], or the M frequencies whose σ[k] exceeds a predetermined threshold), and selects the N frequencies whose σ[k] are small as the second frequencies FB[1] to FB[N].
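The index calculation and the top-M selection described in the two paragraphs above can be sketched directly (toy data; the 2x2 case with four unit periods is a choice of this sketch):

```python
import numpy as np

def significance(Xv):
    """sigma[k] = |det Rxx[k]| with Rxx[k] = E[Xv Xv^H]. Xv has shape
    (2, U): one observation vector per unit period at frequency F[k]."""
    Rxx = (Xv @ Xv.conj().T) / Xv.shape[1]
    return float(abs(np.linalg.det(Rxx)))

def select_frequencies(sigmas, M):
    """Indices of the M first frequencies FA (largest sigma) and of the
    N = K - M second frequencies FB, each in ascending index order."""
    order = np.argsort(np.asarray(sigmas))[::-1]
    return np.sort(order[:M]), np.sort(order[M:])

# Three toy frequencies: identical channels give det ~ 0 (low
# significance); the other two have linearly independent channels.
Xv_a = np.array([[1, 2, 3, 4], [1, 2, 3, 4]], dtype=complex)
Xv_b = np.array([[1, 0, 1, 0], [0, 1, 0, 1]], dtype=complex)
Xv_c = np.array([[2, 0, 2, 0], [0, 2, 0, -2]], dtype=complex)
sigmas = [significance(Xv) for Xv in (Xv_a, Xv_b, Xv_c)]
FA, FB = select_frequencies(sigmas, M=2)
```

The near-singular frequency (identical channels, nothing for the learning process to exploit) is routed to the second-frequency path, matching the tendency cited from Patent Literature 1.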

The first sound source separation unit 31 of FIG. 1 performs source separation applying independent component analysis to the frequency components XA1[m,u] and XA2[m,u], thereby generating, for each unit period, M separated components YA1[1,u] to YA1[M,u] and M separated components YA2[1,u] to YA2[M,u] corresponding to the first frequencies FA[m]. The separated component YA1[m,u] is a frequency component in which the component of the sound S1 at the first frequency FA[m] is emphasized (the sound S2 suppressed), and the separated component YA2[m,u] is a frequency component in which the component of the sound S2 at the first frequency FA[m] is emphasized (the sound S1 suppressed).

The second sound source separation unit 32 executes signal processing different from that of the first sound source separation unit 31 on the frequency components XB1[n,u] and XB2[n,u], thereby generating, for each unit period, N separated components YB1[1,u] to YB1[N,u] and N separated components YB2[1,u] to YB2[N,u] corresponding to the second frequencies FB[n]. The separated component YB1[n,u] is a frequency component in which the component of the sound S1 at the second frequency FB[n] is emphasized (the sound S2 suppressed), and the separated component YB2[n,u] is a frequency component in which the component of the sound S2 at the second frequency FB[n] is emphasized (the sound S1 suppressed).

The frequency integration unit 42 generates K separated components Y1[1,u] to Y1[K,u] for each unit period by arranging (integrating), in frequency order, the M separated components YA1[1,u] to YA1[M,u] generated by the first sound source separation unit 31 and the N separated components YB1[1,u] to YB1[N,u] generated by the second sound source separation unit 32. Similarly, the frequency integration unit 42 generates K separated components Y2[1,u] to Y2[K,u] for each unit period by arranging the M components YA2[1,u] to YA2[M,u] and the N components YB2[1,u] to YB2[N,u].
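The integration step is a scatter of the two component sets back onto the shared frequency axis; a minimal sketch for one unit period (index lists and values are toy data):

```python
import numpy as np

def integrate(YA, fa_idx, YB, fb_idx, K):
    """Place the first-frequency components YA[m] and the second-frequency
    components YB[n] back at their positions on the K-bin frequency axis."""
    Y = np.empty(K, dtype=complex)
    Y[np.asarray(fa_idx)] = YA
    Y[np.asarray(fb_idx)] = YB
    return Y

# K = 4 frequencies; bins 0 and 2 were first frequencies, 1 and 3 second.
Y1 = integrate(np.array([10.0 + 0j, 30.0 + 0j]), [0, 2],
               np.array([20.0 + 0j, 40.0 + 0j]), [1, 3], K=4)
```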

The waveform synthesis unit 44 generates a time-domain separated signal y1(t) from the K separated components Y1[1,u] to Y1[K,u] that the frequency integration unit 42 generates for each unit period. Specifically, the waveform synthesis unit 44 converts the sequence (frequency spectrum) of the K separated components Y1[1,u] to Y1[K,u] into the time domain by an inverse Fourier transform and concatenates successive unit periods to generate the separated signal y1(t). Similarly, the waveform synthesis unit 44 generates a separated signal y2(t) from the K separated components Y2[1,u] to Y2[K,u] generated for each unit period.
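A sketch of this synthesis: each unit period's K-bin spectrum is returned to the time domain by an inverse FFT, and consecutive unit periods are connected by overlap-add. The 50% hop and the (periodic) Hann synthesis window are assumptions for illustration, since the text does not fix the frame parameters:

```python
import numpy as np

def synthesize(frames, hop, n):
    """Overlap-add synthesis of a time-domain signal from per-period spectra."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann
    out = np.zeros(hop * (len(frames) - 1) + n)
    for u, spec in enumerate(frames):
        seg = np.fft.irfft(spec, n)               # spectrum -> time domain
        out[u * hop:u * hop + n] += win * seg     # connect unit periods
    return out

# Round trip: rectangular analysis frames, Hann-weighted overlap-add.
n, hop = 512, 256
x = np.sin(2 * np.pi * 0.01 * np.arange(4 * n))
frames = [np.fft.rfft(x[u * hop:u * hop + n])
          for u in range((len(x) - n) // hop + 1)]
y = synthesize(frames, hop, n)
# Away from the edges the shifted periodic Hann windows sum to 1, so y matches x.
```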

FIG. 2 is a block diagram of the first sound source separation unit 31 and the second sound source separation unit 32. As shown in FIG. 2, the first sound source separation unit 31 includes a signal processing unit 52 and a separation matrix generation unit 54. The signal processing unit 52 generates the separated components YA1[m,u] and YA2[m,u] by applying the separation matrix W[m] of each first frequency FA[m] to the frequency components XA1[m,u] and XA2[m,u] of that first frequency FA[m]. Specifically, the signal processing unit 52 performs the operation (sound source separation) of the following equation (1) for each of the M first frequencies FA[1] to FA[M].

[YA1[m,u], YA2[m,u]]^T = W[m] · [XA1[m,u], XA2[m,u]]^T   … (1)

The separation matrix generation unit 54 in FIG. 2 generates, for each unit period, the separation matrices W[m] (W[1] to W[M]) that the signal processing unit 52 applies to the sound source separation of equation (1), one for each of the M first frequencies FA[1] to FA[M]. A learning process to which independent component analysis is applied (cumulative updating of the separation matrix W[m]) is employed to generate the separation matrix W[m]. Although a known technique may be arbitrarily employed for the learning process of the separation matrix W[m], the operation of the following equation (2), which calculates the separation matrix W_{p+1}[m] immediately following the p-th updated separation matrix W_p[m], is suitable. A predetermined initial matrix W_0[m] (for example, a unit matrix) is applied to the calculation of the first separation matrix W_1[m].

W_{p+1}[m] = W_p[m] − η · off-diag( ⟨ φ(Yv_p[m,u]) · Yv_p[m,u]^H ⟩ ) · W_p[m]   … (2)

(⟨·⟩ denotes the time average over the unit periods.)

The symbol η in equation (2) denotes a predetermined constant (step size), and the symbol off-diag(·) denotes an operator that replaces the diagonal components of a matrix with zero. The symbol φ(·) denotes a predetermined nonlinear function (for example, a hyperbolic tangent function). The symbol Yv_p[m,u] in equation (2) denotes the vector calculated by the operation of equation (1) in which the separation matrix W_p[m] acts on the frequency components XA1[m,u] and XA2[m,u] (Yv_p[m,u] = [YA1[m,u], YA2[m,u]]^T). The separation matrix generation unit 54 finalizes, as the separation matrix W[m], the separation matrix W_{p+1}[m] obtained when the operation of equation (2) has been repeated a predetermined number of times. The above is the configuration and operation of the first sound source separation unit 31.
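Equation (2) is rendered as an image in the original; the sketch below therefore assumes the standard natural-gradient update with the off-diag(·) (holonomic) constraint, step size η, and φ(·) = tanh(·) described in the text — an illustration of the learning process rather than the patent's exact formula:

```python
import numpy as np

def off_diag(m):
    """Replace the diagonal components with zero (the off-diag operator)."""
    return m - np.diag(np.diag(m))

def update_w(w, xv, eta=0.05):
    """One assumed learning update of W[m] for a single frequency bin.

    xv: array (2, U) of observed components over U unit periods.
    Assumed form: W_{p+1} = W_p - eta * off-diag(<phi(Yv) Yv^H>) W_p.
    """
    yv = w @ xv                                     # eq. (1): Yv = W Xv
    phi = np.tanh(yv.real) + 1j * np.tanh(yv.imag)  # split-complex tanh
    corr = phi @ yv.conj().T / xv.shape[1]          # average over unit periods
    return w - eta * off_diag(corr) @ w

def constraint_cost(w, xv):
    """Norm of the off-diagonal correlation that the learning drives to zero."""
    yv = w @ xv
    phi = np.tanh(yv.real) + 1j * np.tanh(yv.imag)
    return np.linalg.norm(off_diag(phi @ yv.conj().T / xv.shape[1]))

# Toy run: two independent (super-Gaussian) complex sources, fixed mixing.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000)) + 1j * rng.laplace(size=(2, 5000))
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s
w = np.eye(2, dtype=complex)
for _ in range(300):
    w = update_w(w, x)
```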

As shown in FIG. 2, the second sound source separation unit 32 includes an arithmetic processing unit 60 and a signal processing unit 68. The arithmetic processing unit 60 sets a processing coefficient value α1[n,u] (α1[1,u] to α1[N,u]) and a processing coefficient value α2[n,u] (α2[1,u] to α2[N,u]) for each of the N second frequencies FB[1] to FB[N]. The calculation of the processing coefficient values α1[n,u] and α2[n,u] is executed at predetermined intervals (for example, every unit period).

The processing coefficient values α1[n,u] and α2[n,u] of each second frequency FB[n] are variably set within a range from 0 to 1 according to the relation (relative magnitude) between the amplitude |S1[n,u]| of the frequency component S1[n,u] of the sound S1 at that second frequency FB[n] and the amplitude |S2[n,u]| of the frequency component S2[n,u] of the sound S2 at that second frequency FB[n]. Specifically, the processing coefficient value α1[n,u] is set to a larger value as the amplitude |S1[n,u]| of the sound S1 becomes larger relative to the amplitude |S2[n,u]|, and the processing coefficient value α2[n,u] is set to a larger value as the amplitude |S2[n,u]| of the sound S2 becomes larger relative to the amplitude |S1[n,u]|.

The signal processing unit 68 in FIG. 2 applies the processing coefficient values α1[n,u] and α2[n,u] of each second frequency FB[n] to the frequency components XB1[n,u] and XB2[n,u] of that second frequency FB[n], thereby generating the separated components YB1[n,u] and YB2[n,u] for each unit period. Specifically, the signal processing unit 68 performs the operations of the following equations (3A) and (3B) for each of the N second frequencies FB[1] to FB[N].

YB1[n,u] = α1[n,u] · XB1[n,u]   … (3A)
YB2[n,u] = α2[n,u] · XB2[n,u]   … (3B)

That is, the separated component YB1[n,u], in which the frequency component S1[n,u] of the sound S1 is emphasized, is generated by multiplying the frequency component XB1[n,u] by the processing coefficient value α1[n,u], and the separated component YB2[n,u], in which the frequency component S2[n,u] of the sound S2 is emphasized, is generated by multiplying the frequency component XB2[n,u] by the processing coefficient value α2[n,u]. The processing coefficient value α1[n,u] therefore corresponds to a gain (spectral gain) applied to the frequency component XB1[n,u], and the processing coefficient value α2[n,u] corresponds to a gain applied to the frequency component XB2[n,u].

As shown in FIG. 2, the arithmetic processing unit 60 includes a direction identifying unit 62, an orientation processor 64, and a coefficient value generator 66. The direction identifying unit 62 identifies the arriving direction θe1 of the sound S1 (the direction of the sound source PS1) and the arriving direction θe2 of the sound S2 (the direction of the sound source PS2). In the following description, the subscript e denotes an estimated value.

The direction identifying unit 62 of the first embodiment estimates the arriving directions θe1 and θe2 using the separation matrices W[m] (W[1] to W[M]) that the separation matrix generation unit 54 generates for each first frequency FA[m]. A known technique (for example, the method disclosed in Non-Patent Document 1) may be arbitrarily employed to estimate the arriving directions θe1 and θe2. For example, the direction identifying unit 62 estimates the arriving direction θe1[m] of the sound S1 and the arriving direction θe2[m] of the sound S2 from the separation matrix W[m] of each first frequency FA[m], finalizes a representative value (for example, a weighted sum, an average, or a median) of the M arriving directions θe1[1] to θe1[M] as the arriving direction θe1, and finalizes a representative value of the M arriving directions θe2[1] to θe2[M] as the arriving direction θe2.
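One common realization of this kind of direction estimation from a separation matrix (in the spirit of the cited non-patent literature) treats each column of W[m]^{-1} as an estimated steering vector and converts its inter-microphone phase into an angle. The free-field model, microphone spacing, and sound speed below are assumptions for illustration:

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)
D = 0.04   # assumed microphone spacing (m)

def steering(theta_deg, freq):
    """Two-microphone free-field steering vector for arrival angle theta."""
    tau = D * np.sin(np.radians(theta_deg)) / C    # inter-microphone delay
    return np.array([1.0, np.exp(-2j * np.pi * freq * tau)])

def doa_from_w(w, freq):
    """Estimate the two arriving directions from a separation matrix W[m]."""
    a_est = np.linalg.inv(w)                       # estimated mixing matrix
    omega = 2 * np.pi * freq
    phase = np.angle(a_est[1, :] / a_est[0, :])    # phase lag at microphone 2
    return np.degrees(np.arcsin(-phase * C / (omega * D)))

# Check: build a mixing matrix from two known directions and recover them.
freq = 1000.0
a = np.stack([steering(-30.0, freq), steering(45.0, freq)], axis=1)
theta = doa_from_w(np.linalg.inv(a), freq)
```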

The orientation processor 64 in FIG. 2 executes, on the frequency components XB1[n,u] and XB2[n,u], processing that forms a dead angle of sound pickup (a region of low pickup sensitivity) in a predetermined direction (hereinafter "dead angle control type beam formation"), thereby generating an orientation component Z1[n,u] (Z1[1,u] to Z1[N,u]) and an orientation component Z2[n,u] (Z2[1,u] to Z2[N,u]) for each unit period. Specifically, the orientation processor 64 generates the orientation component Z1[n,u] by executing, on the frequency components XB1[n,u] and XB2[n,u], dead angle control type beam formation (NBF) that forms a dead angle of sound pickup in the arriving direction θe2 identified by the direction identifying unit 62, and generates the orientation component Z2[n,u] by executing, on the frequency components XB1[n,u] and XB2[n,u], dead angle control type beam formation that forms a dead angle of sound pickup in the arriving direction θe1. Accordingly, the sound arriving from the direction θe2 (the sound S2) is suppressed in the orientation component Z1[n,u], and the sound arriving from the direction θe1 (the sound S1) is suppressed in the orientation component Z2[n,u].

FIG. 3 is a block diagram of the orientation processor 64. For convenience, FIG. 3 also shows a model of the propagation paths along which the sound S1 (frequency component S1[n,u]) radiated by the sound source PS1 and the sound S2 (frequency component S2[n,u]) radiated by the sound source PS2 reach each of the sound pickup PM1 and the sound pickup PM2.

The symbol Ai[n] (i = 1, 2) in FIG. 3 denotes the propagation loss (the gain imparted along the propagation path) of the frequency component Si[n,u] of the sound Si. The propagation delay of the frequency component Si[n,u] is omitted in FIG. 3 in view of its being reflected in the propagation loss Ai[n]. The symbol τi1 in FIG. 3 denotes the delay (time difference) from when the frequency component Si[n,u] reaches the sound pickup PM2 until it reaches the sound pickup PM1, and the symbol τi2 denotes the delay from when the frequency component Si[n,u] reaches the sound pickup PM1 until it reaches the sound pickup PM2.

As understood from FIG. 3, the frequency components XB1[n,u] and XB2[n,u] after pickup by the sound pickups PM1 and PM2 are expressed by the following equations (4A) and (4B). The symbol ω[n] in equations (4A) and (4B) denotes the angular frequency corresponding to the second frequency FB[n], and the symbol j denotes the imaginary unit.

XB1[n,u] = A1[n] · S1[n,u] · e^{−jω[n]τ11} + A2[n] · S2[n,u] · e^{−jω[n]τ21}   … (4A)
XB2[n,u] = A1[n] · S1[n,u] · e^{−jω[n]τ12} + A2[n] · S2[n,u] · e^{−jω[n]τ22}   … (4B)

As shown in FIG. 3, the orientation processor 64 includes a first processing unit 72 that generates the orientation component Z1[n,u] and a second processing unit 74 that generates the orientation component Z2[n,u]. The first processing unit 72 includes a delay unit 721 that imparts a delay τe22 to the frequency component XB1[n,u], a delay unit 723 that imparts a delay τe21 to the frequency component XB2[n,u], and an arithmetic unit 725 that generates the difference between the outputs of the delay units 721 and 723 as the orientation component Z1[n,u]. Similarly, the second processing unit 74 includes a delay unit 741 that imparts a delay τe11 to the frequency component XB2[n,u], a delay unit 743 that imparts a delay τe12 to the frequency component XB1[n,u], and an arithmetic unit 745 that generates the difference between the outputs of the delay units 741 and 743 as the orientation component Z2[n,u]. The delay τeij is an estimate of the delay τij imparted along the propagation path. The delays τe21 and τe22 are set so that a dead angle of sound pickup is formed in the arriving direction θe2, and the delays τe11 and τe12 are set so that a dead angle of sound pickup is formed in the arriving direction θe1.
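A numeric sketch of this delay-and-subtract structure for a single frequency bin, using the FIG. 3 model (gains Ai[n], path delays τij); the gain and delay values are arbitrary illustration. With τeij = τij, the sound S2 vanishes from Z1 and the sound S1 vanishes from Z2:

```python
import numpy as np

def observe(s1, s2, omega, gains, tau):
    """Two-microphone observation of one frequency bin per the FIG. 3 model."""
    a1, a2 = gains
    xb1 = a1 * s1 * np.exp(-1j * omega * tau[0, 0]) + a2 * s2 * np.exp(-1j * omega * tau[1, 0])
    xb2 = a1 * s1 * np.exp(-1j * omega * tau[0, 1]) + a2 * s2 * np.exp(-1j * omega * tau[1, 1])
    return xb1, xb2

def nbf(xb1, xb2, omega, tau_e):
    """Dead angle control type beam formation: Z1 nulls S2, Z2 nulls S1."""
    z1 = np.exp(-1j * omega * tau_e[1, 1]) * xb1 - np.exp(-1j * omega * tau_e[1, 0]) * xb2
    z2 = np.exp(-1j * omega * tau_e[0, 0]) * xb2 - np.exp(-1j * omega * tau_e[0, 1]) * xb1
    return z1, z2

omega = 2 * np.pi * 1000.0
tau = np.array([[1.0e-4, 2.0e-4],    # tau11, tau12 (paths of sound S1)
                [3.0e-4, 1.5e-4]])   # tau21, tau22 (paths of sound S2)
gains = (0.9, 0.8)

z1_s2, z2_s2 = nbf(*observe(0.0, 1.0, omega, gains, tau), omega, tau)  # S2 only
z1_s1, z2_s1 = nbf(*observe(1.0, 0.0, omega, gains, tau), omega, tau)  # S1 only
```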

As understood from FIG. 3, the orientation components Z1[n,u] and Z2[n,u] are expressed by the following equations (5A) and (5B).

Z1[n,u] = e^{−jω[n]τe22} · XB1[n,u] − e^{−jω[n]τe21} · XB2[n,u]   … (5A)
Z2[n,u] = e^{−jω[n]τe11} · XB2[n,u] − e^{−jω[n]τe12} · XB1[n,u]   … (5B)

Substituting equations (4A) and (4B) into equation (5A) and rearranging yields the following equation (6A). Similarly, substituting equations (4A) and (4B) into equation (5B) and rearranging yields the following equation (6B).

Z1[n,u] = A1[n]·S1[n,u]·e^{−jω[n](τ11+τe22)} + A2[n]·S2[n,u]·e^{−jω[n](τ21+τe22)} − A1[n]·S1[n,u]·e^{−jω[n](τ12+τe21)} − A2[n]·S2[n,u]·e^{−jω[n](τ22+τe21)}   … (6A)
Z2[n,u] = A1[n]·S1[n,u]·e^{−jω[n](τ12+τe11)} + A2[n]·S2[n,u]·e^{−jω[n](τ22+τe11)} − A1[n]·S1[n,u]·e^{−jω[n](τ11+τe12)} − A2[n]·S2[n,u]·e^{−jω[n](τ21+τe12)}   … (6B)

Assuming now that the accuracy of the estimation of the arriving directions θe1 and θe2 by the direction identifying unit 62 is sufficiently high (θe1 ≈ θ1, θe2 ≈ θ2), the delays τeij applied in the orientation processor 64 can be approximated by the delays τij of the actual propagation paths (τeij ≈ τij). The second and fourth terms on the right side of equation (6A) therefore cancel each other, yielding the following equation (7A), and the first and third terms on the right side of equation (6B) cancel each other, yielding the following equation (7B).

Z1[n,u] = A1[n] · S1[n,u] · ( e^{−jω[n](τ11+τ22)} − e^{−jω[n](τ12+τ21)} )   … (7A)
Z2[n,u] = A2[n] · S2[n,u] · ( e^{−jω[n](τ11+τ22)} − e^{−jω[n](τ12+τ21)} )   … (7B)

The ratio of the amplitude |Z1[n,u]| of the orientation component Z1[n,u] to the sum (hereinafter "amplitude sum") of the amplitude |Z1[n,u]| of the orientation component Z1[n,u] and the amplitude |Z2[n,u]| of the orientation component Z2[n,u] is expressed, in view of equations (7A) and (7B), as the following equation (8A). Similarly, the ratio of the amplitude |Z2[n,u]| of the orientation component Z2[n,u] to the amplitude sum is expressed as the following equation (8B).

|Z1[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| ) = A1[n]·|S1[n,u]| / ( A1[n]·|S1[n,u]| + A2[n]·|S2[n,u]| )   … (8A)
|Z2[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| ) = A2[n]·|S2[n,u]| / ( A1[n]·|S1[n,u]| + A2[n]·|S2[n,u]| )   … (8B)

Of equations (7A) and (7B), the delay term related to the delays (phases) τij (the parenthesized part in the latter half) is common to the orientation components Z1[n,u] and Z2[n,u]. The delay term is therefore cancelled out in equations (8A) and (8B).

As shown in the following equations (9A) and (9B), the coefficient value generator 66 in FIG. 2 calculates, for each second frequency FB[n], the ratio of the amplitude |Z1[n,u]| of the orientation component Z1[n,u] to the amplitude sum (equation (8A)) as the processing coefficient value α1[n,u], and calculates, for each second frequency FB[n], the ratio of the amplitude |Z2[n,u]| of the orientation component Z2[n,u] to the amplitude sum (equation (8B)) as the processing coefficient value α2[n,u].

α1[n,u] = |Z1[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| )   … (9A)
α2[n,u] = |Z2[n,u]| / ( |Z1[n,u]| + |Z2[n,u]| )   … (9B)

As understood from equations (8A), (8B), (9A), and (9B), the processing coefficient values α1[n,u] and α2[n,u] correspond to the internal ratio of the amplitudes of the sounds S1 and S2 at the observation point for each unit period (the contribution of each sound source PSi to each of the observation signals x1(t) and x2(t)). That is, the amplitude ratio of the sound S1 at the observation point is expressed by the processing coefficient value α1[n,u], and the amplitude ratio of the sound S2 at the observation point is expressed by the processing coefficient value α2[n,u]. For example, when the amplitude A1[n]·|S1[n,u]| of the sound S1 at the observation point equals the amplitude A2[n]·|S2[n,u]| of the sound S2, the processing coefficient values α1[n,u] and α2[n,u] are both 0.5, and when the amplitude A1[n]·|S1[n,u]| of the sound S1 exceeds the amplitude A2[n]·|S2[n,u]| of the sound S2, the processing coefficient value α1[n,u] exceeds the processing coefficient value α2[n,u]. The processing coefficient value α1[n,u] of equation (9A) and the processing coefficient value α2[n,u] of equation (9B) are therefore valid as variables expressing the amplitude ratio between the sounds S1 and S2 at the observation point.
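As a numeric check of this internal-ratio property, the sketch below forms the orientation components for one frequency bin under the FIG. 3 model (arbitrary illustrative gains and delays, with exact delay estimates) and confirms that α1[n,u] reproduces A1|S1|/(A1|S1| + A2|S2|):

```python
import numpy as np

omega = 2 * np.pi * 800.0
tau = np.array([[1.0e-4, 2.0e-4],    # tau11, tau12
                [3.0e-4, 1.5e-4]])   # tau21, tau22
a1, a2 = 0.9, 0.5                    # propagation losses A1[n], A2[n]
s1, s2 = 2.0, 1.0                    # source components S1[n,u], S2[n,u]

xb1 = a1 * s1 * np.exp(-1j * omega * tau[0, 0]) + a2 * s2 * np.exp(-1j * omega * tau[1, 0])
xb2 = a1 * s1 * np.exp(-1j * omega * tau[0, 1]) + a2 * s2 * np.exp(-1j * omega * tau[1, 1])

# Null beam formation with exact delay estimates, then eq. (9A)/(9B).
z1 = np.exp(-1j * omega * tau[1, 1]) * xb1 - np.exp(-1j * omega * tau[1, 0]) * xb2
z2 = np.exp(-1j * omega * tau[0, 0]) * xb2 - np.exp(-1j * omega * tau[0, 1]) * xb1
alpha1 = abs(z1) / (abs(z1) + abs(z2))
alpha2 = abs(z2) / (abs(z1) + abs(z2))
ratio1 = a1 * abs(s1) / (a1 * abs(s1) + a2 * abs(s2))  # internal amplitude ratio
```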

Since the processing coefficient values α1[n,u] and α2[n,u] are set as described above, the frequency component S1[n,u] of the sound S1 is emphasized in the separated component YB1[n,u] that the signal processing unit 68 generates by the operation of equation (3A) applying the processing coefficient value α1[n,u] of equation (9A), and the frequency component S2[n,u] of the sound S2 is emphasized in the separated component YB2[n,u] generated by the operation of equation (3B) applying the processing coefficient value α2[n,u] of equation (9B). That is, the sound S1 (frequency component S1[n,u]) and the sound S2 (frequency component S2[n,u]) are separated for each of the N second frequencies FB[1] to FB[N].

Part (A) of FIG. 4 shows the amplitude spectrum of the sound Si radiated by the sound source PSi, and part (C) of FIG. 4 shows the amplitude spectrum of the separated signal yi(t) generated by the configuration of the first embodiment. Part (B) of FIG. 4 shows the amplitude spectrum of the separated signal yi(t) generated by a configuration (hereinafter "comparative example") in which the orientation component Zi[n,u] generated by the dead angle control type beam formation of the orientation processor 64 is used directly as the separated component YBi[n,u].

Each of the exponential terms (e^{−jω[n](τ11+τ22)} and e^{−jω[n](τ12+τ21)}) in the delay term of equations (7A) and (7B), which express the orientation component Zi[n,u], approaches 1 as the angular frequency ω[n] decreases, so the delay term of equations (7A) and (7B) approaches zero as the angular frequency ω[n] decreases. The orientation component Zi[n,u] is therefore increasingly suppressed toward the low frequency side. That is, in the comparative example, which generates the separated signal yi(t) with the orientation component Zi[n,u] as the separated component YBi[n,u], there is a problem in that the intensity (amplitude) of the separated signal yi(t) on the low frequency side (particularly 0 Hz to 500 Hz) is suppressed compared with the original sound Si (part (A)), as can also be seen from part (B) of FIG. 4.
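The suppression can be checked directly on the magnitude of the delay term: |e^{−jωa} − e^{−jωb}| = 2|sin(ω(b − a)/2)|, which tends to 0 as ω → 0. A brief numeric check (the summed path delays a and b are arbitrary illustrative values):

```python
import numpy as np

def delay_term_mag(freq, a=2.5e-4, b=5.0e-4):
    """|e^{-j w a} - e^{-j w b}| for summed path delays a and b (seconds)."""
    omega = 2 * np.pi * freq
    return abs(np.exp(-1j * omega * a) - np.exp(-1j * omega * b))

mags = {f: delay_term_mag(f) for f in (50.0, 200.0, 1000.0)}
# The magnitude shrinks toward low frequencies, which corresponds to the
# low-band attenuation seen in part (B) of FIG. 4.
```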

In the first embodiment, on the other hand, the separated component YBi[n,u] is generated by applying, to the frequency component XBi[n,u], the processing coefficient value αi[n,u] calculated from the amplitudes of the orientation components Z1[n,u] and Z2[n,u]. As described above, the influence of the delay term of equations (7A) and (7B) is eliminated from the processing coefficient values α1[n,u] and α2[n,u], so the intensity on the low frequency side of the separated signal yi(t) can be maintained at a level equal to that of the sound Si, as can also be seen from part (C) of FIG. 4. That is, the first embodiment has the advantage that sound source separation with higher accuracy than the comparative example is realized (each sound Si can be extracted faithfully).

Further, as understood from equations (9A) and (9B), the processing of the second sound source separation unit 32 (the calculation of the processing coefficient values αi[n,u] and the operations of equations (3A) and (3B)) imposes a smaller load than the processing of the first sound source separation unit 31 (the generation of the separation matrix W[m] by iteration of the learning process). Accordingly, the first embodiment, which can reduce the number M of first frequencies FA[m] subject to the learning process, has the advantage that the processing load (power consumption) of the arithmetic processing device 12 and the storage capacity required of the storage device 14 can be reduced without degrading the sound source separation performance. These effects are particularly advantageous when the sound processing device 100A is mounted on a portable information terminal (for example, a mobile phone) in which the performance, power supply capacity, and storage capacity of the arithmetic processing device 12 are constrained.

The number M of first frequencies FA[m] subject to the learning process and the accuracy of sound source separation are described in detail below. In the following description, the observation signals xi(t) are expressed as in the following equations (10A) and (10B), and the separated signals yi(t) are expressed as in the following equations (11A) and (11B). The symbols xij(t) and yij(t) denote the acoustic component arriving at the sound pickup PMj from the sound source PSi.

x1(t) = x11(t) + x21(t)   … (10A)
x2(t) = x12(t) + x22(t)   … (10B)
y1(t) = y11(t) + y21(t)   … (11A)
y2(t) = y12(t) + y22(t)   … (11B)

FIGS. 5 and 6 are graphs showing the relation between the number M (horizontal axis) of first frequencies FA[m] for which the separation matrix W[m] is generated by the learning process of independent component analysis and an evaluation index of sound source separation (vertical axis). The symbol "FDICA" on the horizontal axis of FIGS. 5 and 6 denotes the case where all K frequencies F[1] to F[K] (for example, K = 513) are selected as first frequencies FA[m] (that is, a configuration in which the second sound source separation unit 32 is omitted). FIGS. 5 and 6 also show, for each of the first embodiment (solid line) and the comparative example (broken line), the results of processing observation signals xi(t) recorded in an anechoic room and the results of processing observation signals xi(t) recorded in an acoustic room with a reverberation time of 500 milliseconds.

In FIG. 5, the segmental SNR (SegSNR: Segmental Signal-to-Noise Ratio) after sound source separation is plotted on the vertical axis as the evaluation index of sound source separation. The segmental SNR after sound source separation is expressed by the following equation (12). The symbol xij(h,u) in equation (12) denotes the signal value (amplitude) at a time point h within the u-th unit period of the acoustic component xij(t) of equations (10A) and (10B). The symbol yi(h,u) in equation (12) denotes the signal value (amplitude) at the time point h within the u-th unit period of the separated signal yi(t) after sound source separation. As understood from equation (12), the accuracy of sound source separation can be evaluated as higher as the segmental SNR after sound source separation is larger (that is, as the separated signal yi(t) is closer to the sound Si at the observation point).

SegSNR = (1/U) Σ_u 10·log10( Σ_h xii(h,u)² / Σ_h ( xii(h,u) − yi(h,u) )² )   … (12)
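Equation (12) is rendered as an image in the original; the sketch below therefore implements a segmental SNR of the form the text describes — for each unit period, the log ratio of the energy of the target component xii(h,u) to the energy of its deviation from yi(h,u), averaged over unit periods — as an assumed illustration:

```python
import numpy as np

def seg_snr(target, output, seg_len):
    """Assumed segmental SNR between xii(t) and the separated signal yi(t)."""
    u_count = len(target) // seg_len
    vals = []
    for u in range(u_count):
        seg_x = target[u * seg_len:(u + 1) * seg_len]
        seg_e = seg_x - output[u * seg_len:(u + 1) * seg_len]
        vals.append(10 * np.log10(np.sum(seg_x ** 2) / np.sum(seg_e ** 2)))
    return float(np.mean(vals))

rng = np.random.default_rng(1)
target = np.sin(2 * np.pi * 0.02 * np.arange(2048))
good = target + 0.01 * rng.standard_normal(2048)   # small separation error
poor = target + 0.1 * rng.standard_normal(2048)    # larger separation error
```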

As can be seen from FIG. 5, in the comparative example the accuracy of sound source separation (segmental SNR) decreases as the number M of first frequencies FA[m] subject to the learning process decreases, whereas in the first embodiment sufficiently accurate sound source separation is realized even when the number M of first frequencies FA[m] is reduced. It can also be seen from FIG. 5 that the first embodiment achieves higher sound source separation accuracy even in comparison with the case where all K frequencies are selected as first frequencies FA[m] (FDICA).

In FIG. 6, on the other hand, the change ΔSIR in SIR (Signal-to-Interference Ratio) across sound source separation is plotted on the vertical axis as the evaluation index of sound source separation. The SIRin before sound source separation is expressed by the following equation (13A), and the SIRout after sound source separation is expressed by the following equation (13B). The acoustic components x21(t) and x12(t) of equation (13A) (equations (10A) and (10B)) and the acoustic components y21(t) and y12(t) of equation (13B) (equations (11A) and (11B)) correspond to interference components (interfering sounds).

SIRin = 10·log10( ( Σ_t x11(t)² + Σ_t x22(t)² ) / ( Σ_t x21(t)² + Σ_t x12(t)² ) )   … (13A)
SIRout = 10·log10( ( Σ_t y11(t)² + Σ_t y22(t)² ) / ( Σ_t y21(t)² + Σ_t y12(t)² ) )   … (13B)

The change ΔSIR plotted on the vertical axis of FIG. 6 corresponds to the difference between the SIRin before sound source separation and the SIRout after sound source separation (ΔSIR = SIRout − SIRin). Accordingly, the accuracy of sound source separation can be evaluated as higher as the change ΔSIR is larger. As can be seen from FIG. 6, for both the first embodiment and the comparative example, the accuracy of sound source separation (the change ΔSIR) decreases as the number M of first frequencies FA[m] subject to the learning process decreases. This tendency is particularly pronounced in environments where reverberation occurs.
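Equations (13A) and (13B) are rendered as images in the original; the sketch below uses the usual energy-ratio form of SIR (target component over interference component) to illustrate the ΔSIR = SIRout − SIRin evaluation; the simulated interference levels and attenuation factor are arbitrary:

```python
import numpy as np

def sir_db(target, interference):
    """SIR as the dB energy ratio of target to interference components."""
    return 10 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

rng = np.random.default_rng(2)
x11 = rng.standard_normal(1000)          # target component reaching output 1
x21 = 0.5 * rng.standard_normal(1000)    # interference reaching output 1
y11, y21 = x11, 0.05 * x21               # separation leaves 5% of interference

delta_sir = sir_db(y11, y21) - sir_db(x11, x21)  # improvement in dB
```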

As explained above, in the first embodiment the separation accuracy evaluated in terms of segmental SNR and the accuracy evaluated in terms of SIR (ΔSIR) move in opposite directions as the number M of first frequencies FA[m] changes. Choosing M so that the segmental SNR of FIG. 5 and the ΔSIR of FIG. 6 are both kept at a high level therefore yields more accurate separation than the comparative configuration.

For example, in the first embodiment the segmental SNR rises as the number M of first frequencies FA[m] decreases. From the standpoint of improving the segmental SNR, and of reducing the processing load (power consumption) of the arithmetic processing device 12 and the capacity of the storage device 14, a smaller M is therefore advantageous. On the other hand, reducing M too far can cause a visible drop in ΔSIR; FIG. 6 indicates, however, that no such drop appears as long as M exceeds roughly one quarter (M = 128) of the total number K of frequencies F[k] (K = 513). Likewise, when the number M of separation matrices W[m] is extremely small the estimation accuracy of the arrival directions θe1 and θe2 degrades, but with M at about one quarter of K the directions can still be estimated with sufficient accuracy. In view of these tendencies, a configuration that sets M to about 25% of K (for example, 20% to 30%) is especially suitable.

<B: Second Embodiment>
A second embodiment of the invention is described below. The first embodiment combined sound source separation based on independent component analysis (first sound source separation unit 31) with separation based on null-steering beamforming (second sound source separation unit 32); the second embodiment omits the separation based on independent component analysis. In the configurations illustrated below, elements whose operation and function are equivalent to those of the first embodiment are given the reference signs used above, and their detailed description is omitted as appropriate.

FIG. 7 is a block diagram of the sound processing device 100B of the second embodiment. As shown in FIG. 7, the device 100B omits the frequency selection unit 24, the index calculation unit 26, the first sound source separation unit 31, and the frequency integration unit 42 of the first embodiment, and comprises a frequency analysis unit 22, a sound source separation unit 35, and a waveform synthesizer 44. As in the first embodiment, the frequency analysis unit 22 generates the K frequency components X1[k,u] (X1[1,u] to X1[K,u]) of the observation signal x1(t) and the K frequency components X2[k,u] (X2[1,u] to X2[K,u]) of the observation signal x2(t).

Like the second sound source separation unit 32 of the first embodiment, the sound source separation unit 35 comprises the direction identifying unit 62, the orientation processor 64, the coefficient value generator 66, and the signal processor 68 of FIG. 2, and performs separation based on null-steering beamforming on each frequency component X1[k,u] and X2[k,u], generating, for every unit period, K separated components Y1[k,u] (Y1[1,u] to Y1[K,u]) and K separated components Y2[k,u] (Y2[1,u] to Y2[K,u]). In other words, the sound source separation unit 35 of the second embodiment operates as the second sound source separation unit 32 of the first embodiment would if all K frequencies F[1] to F[K] were selected as second frequencies FB[1] to FB[N] (N = K). As in the first embodiment, the waveform synthesizer 44 generates the separated signal y1(t) from the K separated components Y1[1,u] to Y1[K,u] and the separated signal y2(t) from the K separated components Y2[1,u] to Y2[K,u]. The second embodiment achieves the same effects as the first.
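The per-frequency processing in the sound source separation unit 35 can be sketched as follows, under simplifying assumptions: a far-field plane-wave model, two microphones with a hypothetical spacing d = 5 cm, and a schematic assignment of the coefficients to separated components (the patent's signal processor defines the exact mapping).

```python
import numpy as np

def null_beam(X1, X2, theta, freq, d=0.05, c=343.0):
    """Two-microphone beamformer placing a pickup null toward DOA `theta`,
    assuming the far-field model X2 = X1 * exp(-2j*pi*freq*tau)."""
    tau = d * np.sin(theta) / c                      # inter-mic delay for theta
    return X1 - X2 * np.exp(2j * np.pi * freq * tau)

# Hypothetical spectra of one time-frequency cell and estimated DOAs
freq = 1000.0
X1, X2 = 1.0 + 0.3j, 0.8 - 0.2j
theta1, theta2 = np.deg2rad(-30.0), np.deg2rad(40.0)

Z1 = null_beam(X1, X2, theta1, freq)  # null toward source 1 -> dominated by source 2
Z2 = null_beam(X1, X2, theta2, freq)  # null toward source 2 -> dominated by source 1

# Processing coefficients: amplitude of one directional component over the sum
a1 = np.abs(Z1) / (np.abs(Z1) + np.abs(Z2))
a2 = np.abs(Z2) / (np.abs(Z1) + np.abs(Z2))

# Applying a coefficient to the observation yields a separated component;
# schematically, a2 (large where source 1 dominates) is attributed to source 1.
Y1 = a2 * X1
Y2 = a1 * X1
```

Because the two coefficients sum to one at every time-frequency cell, the separated components partition the observed amplitude rather than suppressing it, which is consistent with the stated aim of maintaining signal intensities (notably in the low-frequency range) after separation.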

The direction identifying unit 62 of the first embodiment used the separation matrices W[m] to estimate the arrival directions θe1 and θe2, but the direction identifying unit 62 of the second embodiment may identify θe1 and θe2 by any known technique. For example, the unit 62 may estimate θe1 and θe2 from the frequency components X1[k,u] and X2[k,u] by the method described in Ema Takuro and Nozomu Hamada, "FDICA using Time-Frequency Cell Selection for Blind Source Separation", 2005 RISP International Workshop on Nonlinear Circuit and Signal Processing (NCSP'05), pp. 471-474. Alternatively, the separation matrix generation unit 54 of the first embodiment may be added to the second embodiment so that the direction identifying unit 62 estimates θe1 and θe2 from the separation matrices W[m] generated by the unit 54, in the same manner as in the first embodiment (that is, a configuration in which the matrices W[m] are used only for estimating θe1 and θe2).

<C: Modifications>
Various modifications can be made to the embodiments above. Specific examples follow; two or more aspects arbitrarily selected from them may be combined as appropriate.

(1) Modification 1
In the embodiments above, the method by which the direction identifying unit 62 identifies the arrival directions θe1 and θe2 is arbitrary. For example, a predetermined number of the M separation matrices W[1] to W[M] generated by the separation matrix generation unit 54 may be selected to estimate θe1 and θe2. When the direction θ1 of sound source PS1 and the direction θ2 of sound source PS2 are known in advance, θe1 and θe2 may instead be stored beforehand in the storage device 14, the direction identifying unit 62 then acting as an element that reads θe1 and θe2 from the storage device 14. The directions θe1 and θe2 may also be set in response to an instruction from the user (for example, an operation that designates a direction via a control element).

(2) Modification 2
The significance index value σ[k] of the first embodiment is not limited to the determinant of the covariance matrix Rxx[k] of the observation vector Xv[k,u]. For example, the various indices (statistics) exemplified in Patent Document 1 can be adopted as σ[k].

For example, given the tendency that the learning process is more significant at frequencies F[k] where the distribution of the observation vectors Xv[k,u] spans more basis directions, the index calculation unit 26 may compute the condition number of the covariance matrix Rxx[k] of Xv[k,u] as the significance index value σ[k], and the frequency selection unit 24 may select the M frequencies F[k] with the smallest σ[k] as the first frequencies FA[m]. That is, the determinant and the condition number of Rxx[k] serve as indices of the number of basis directions in the distribution of Xv[k,u]. A configuration is also suitable in which the trace of Rxx[k] is computed as σ[k] and the frequencies F[k] with large σ[k] are selected as first frequencies FA[m].
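The covariance-based candidate indices named above can be computed per frequency bin as in the following sketch. The observation vectors are synthetic stand-ins; the ranking of bins by index value is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observation vectors Xv[k,u] = (X1[k,u], X2[k,u]) for one
# frequency bin k, collected over U analysis frames
U = 200
Xv = rng.standard_normal((2, U)) + 1j * rng.standard_normal((2, U))

# Sample covariance matrix Rxx[k] of the observation vectors
Rxx = (Xv @ Xv.conj().T) / U

det_index = np.abs(np.linalg.det(Rxx))       # determinant-based index
cond_index = np.linalg.cond(Rxx)             # condition number: smaller -> broader basis
trace_index = float(np.real(np.trace(Rxx)))  # trace: larger -> preferred bin
```

In a full implementation these values would be computed for every bin k and the M bins with the most favorable index values selected as first frequencies FA[m].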

Because the learning process of independent component analysis updates the separation matrix W[m] so that the separated signals become statistically independent, the learning process can be judged more significant at frequencies F[k] where the statistical correlation between the observation signals x1(t) and x2(t) is low. Given this tendency, an index of the independence between x1(t) and x2(t) is suitable as the significance index value σ[k]; cross-correlation and mutual information are examples of such indices. The frequency selection unit 24 selects as the first frequencies FA[m] the M frequencies F[k] at which the independence between x1(t) and x2(t) is high (cross-correlation or mutual information is small).

The learning process can also be judged more significant when the observation signals x1(t) and x2(t) contain sounds of more kinds (more sources). Considering the tendency (the central limit theorem) that the kurtosis of the intensity distribution of x1(t) or x2(t) decreases as more sounds are mixed, the kurtosis of the intensity distribution (probability distribution) of x1(t) or x2(t) can be adopted as the significance index value σ[k]. The frequency selection unit 24 selects as the first frequencies FA[m] the M frequencies F[k] at which the kurtosis of the intensity distribution of one or both observation signals is low (many sounds are mixed).
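The kurtosis criterion can be illustrated with synthetic data: a mixture of many sources tends toward a Gaussian (kurtosis near 3 in Pearson's convention), while a single sparse source such as speech is peakier. The signal models below are invented for illustration only.

```python
import numpy as np

def kurtosis(x):
    """Pearson's kurtosis: 3 for a Gaussian, higher for peaky distributions."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2

rng = np.random.default_rng(2)
mixed = rng.standard_normal(50_000)   # many-source mixture: near-Gaussian
single = rng.laplace(size=50_000)     # lone speech-like source: super-Gaussian

k_mixed = kurtosis(mixed)    # low kurtosis -> many sources -> select for learning
k_single = kurtosis(single)  # high kurtosis -> few sources
```

A bin whose samples behave like `mixed` (low kurtosis) would be preferred as a first frequency FA[m], since the learning process has more sources to disentangle there.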

The significance index value σ[k] may also be computed from multiple indices (for example, two or more of those exemplified above), such as a weighted sum of them (for example, of the determinant and the trace of the covariance matrix Rxx[k]).

That said, the configuration that uses σ[k] to sort the frequencies into first frequencies FA[m] and second frequencies FB[n] (the index calculation unit 26) may be omitted, and the K frequencies F[k] may be sorted independently of the observation signals x1(t) and x2(t). For example, frequencies F[k] picked from the K frequencies at a fixed interval (say, the odd-numbered F[k]) may be selected as first frequencies FA[m] and the remainder (the even-numbered F[k]) as second frequencies FB[n]. Moreover, if the frequencies F[k] at which the learning process is highly significant are known in advance from circumstances such as the acoustic characteristics expected of x1(t) and x2(t) or the details of the learning process, those frequencies may be selected as first frequencies FA[m] and the rest as second frequencies FB[n].

(3) Modification 3
The embodiments above illustrate a configuration in which the sounds Si from two sound sources PSi (PS1, PS2) are picked up by two sound pickup devices PMj (PM1, PM2), but the numbers of sound sources PSi and of pickup devices PMj may be changed as appropriate, provided that the total number of pickup devices PMj is at least the total number of sound sources PSi.

(4) Modification 4
A configuration is also possible in which the sound processing device 100 (100A, 100B) receives the observation signals x1(t) and x2(t) transmitted from a terminal device such as a mobile phone or personal computer over a communication network such as the Internet, generates the separated signals y1(t) and y2(t) from them as in the first or second embodiment, and transmits the results back to the terminal. Configurations are likewise possible in which the frequency components X1[k,u] and X2[k,u] are transmitted from the terminal to the device 100 (the frequency analysis unit 22 resides in the terminal rather than in the device 100), or in which the separated components Y1[k,u] and Y2[k,u] are transmitted from the device 100 to the terminal (the waveform synthesizer 44 resides in the terminal rather than in the device 100).

100A, 100B: sound processing device; 12: arithmetic processing device; 14: storage device; 22: frequency analysis unit; 24: frequency selection unit; 26: index calculation unit; 31: first sound source separation unit; 32: second sound source separation unit; 35: sound source separation unit; 42: frequency integration unit; 44: waveform synthesizer; 52: signal processor; 54: separation matrix generation unit; 62: direction identifying unit; 64: orientation processor; 66: coefficient value generator; 68: signal processor; PS1, PS2: sound sources; PM1, PM2: sound pickup devices.

Claims (4)

1. A sound processing device that processes a plurality of observation signals obtained by picking up, with a plurality of sound pickup devices, a mixture of sounds arriving from a plurality of sound sources, the device comprising:
direction identifying means for identifying the arrival direction of sound for each of the plurality of sound sources;
orientation processing means for generating a directional signal for each of the plurality of arrival directions identified by the direction identifying means, by performing on the plurality of observation signals null-steering beamforming that places a pickup null in that arrival direction;
coefficient value generating means for generating, for each frequency, a processing coefficient value corresponding to the ratio of the amplitude of one directional signal to the sum of the amplitudes of the plurality of directional signals generated by the orientation processing means; and
first signal processing means for applying, to each frequency component of the observation signal, the processing coefficient value of that frequency.
2. The sound processing device of claim 1, further comprising:
frequency selecting means for sorting a plurality of frequencies into first frequencies and second frequencies;
separation matrix generating means for generating, for each first frequency, a separation matrix from the components of that first frequency in the plurality of observation signals; and
second signal processing means for generating first separated signals by applying, to the components of each first frequency in the plurality of observation signals, the separation matrix of that first frequency,
wherein the orientation processing means generates the directional signals for the respective arrival directions from the components of the second frequencies in the plurality of observation signals,
the coefficient value generating means generates the processing coefficient value for each second frequency, and
the first signal processing means generates second separated components by applying, to the components of each second frequency in the plurality of observation signals, the processing coefficient value of that second frequency.
3. The sound processing device of claim 2, wherein the direction identifying means estimates the arrival direction of sound for each of the plurality of sound sources from the separation matrices generated by the separation matrix generating means for the respective first frequencies.
4. The sound processing device of claim 2 or claim 3, further comprising:
index calculating means for calculating, for each frequency, a significance index value indicating the significance of the learning process that generates a separation matrix from the components of that frequency in the plurality of observation signals,
wherein the frequency selecting means sorts the plurality of frequencies into the first frequencies and the second frequencies according to the significance index value of each frequency.
JP2011040014A 2011-02-25 2011-02-25 Sound processor Expired - Fee Related JP5826502B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011040014A JP5826502B2 (en) 2011-02-25 2011-02-25 Sound processor

Publications (2)

Publication Number Publication Date
JP2012178679A true JP2012178679A (en) 2012-09-13
JP5826502B2 JP5826502B2 (en) 2015-12-02

Family

ID=46980250


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105427860A * 2015-11-11 2016-03-23 百度在线网络技术(北京)有限公司 Far field voice recognition method and device
WO2018047643A1 * 2016-09-09 2018-03-15 ソニー株式会社 Device and method for sound source separation, and program
US10924849B2 2016-09-09 2021-02-16 Sony Corporation Sound source separation device and method
CN107369460A * 2017-07-31 2017-11-21 深圳海岸语音技术有限公司 Speech sound enhancement device and method based on acoustics vector sensor space sharpening technique
CN107369460B 2017-07-31 2020-08-21 深圳海岸语音技术有限公司 Voice enhancement device and method based on acoustic vector sensor space sharpening technology


Similar Documents

Publication Publication Date Title
US10334357B2 (en) Machine learning based sound field analysis
CN109074816B (en) Far field automatic speech recognition preprocessing
JP4897519B2 (en) Sound source separation device, sound source separation program, and sound source separation method
JP5229053B2 (en) Signal processing apparatus, signal processing method, and program
US8654990B2 (en) Multiple microphone based directional sound filter
CN103583054B (en) For producing the apparatus and method of audio output signal
EP2647221B1 (en) Apparatus and method for spatially selective sound acquisition by acoustic triangulation
CN106233382B (en) A kind of signal processing apparatus that several input audio signals are carried out with dereverberation
RU2015129784A (en) FILTER AND METHOD FOR INFORMED SPATIAL FILTRATION USING NUMEROUS INSTANT ESTIMATES OF ARRIVAL DIRECTION
WO2019187589A1 (en) Sound source direction estimation device, sound source direction estimation method, and program
RU2012106592A (en) FORMING THE DIAGRAM OF THE DIRECTION OF AUDIO SIGNALS
JP5277887B2 (en) Signal processing apparatus and program
JP2011164467A (en) Model estimation device, sound source separation device, and method and program therefor
CN111863015A (en) Audio processing method and device, electronic equipment and readable storage medium
JP5826502B2 (en) Sound processor
JP5034734B2 (en) Sound processing apparatus and program
JP5034735B2 (en) Sound processing apparatus and program
JP5387442B2 (en) Signal processing device
JP6840302B2 (en) Information processing equipment, programs and information processing methods
Zhu et al. Modified complementary joint sparse representations: a novel post-filtering to MVDR beamforming
US20210174820A1 (en) Signal processing apparatus, voice speech communication terminal, signal processing method, and signal processing program
JP6790659B2 (en) Sound processing equipment and sound processing method
JP2014215544A (en) Sound processing device
Garcia-Barrios et al. Exploiting spatial diversity for increasing the robustness of sound source localization systems against reverberation
JP2015046759A (en) Beamforming processor and beamforming method

Legal Events

2014-02-13 A621 Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2014-06-10 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A821)
2014-11-14 A977 Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2015-01-06 A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2015-03-03 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
2015-04-10 RD04 Notification of resignation of power of attorney (JAPANESE INTERMEDIATE CODE: A7424)
TRDD Decision of grant or rejection written
2015-09-15 A01 Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01)
2015-10-14 A61 First payment of annual fees (during grant procedure) (JAPANESE INTERMEDIATE CODE: A61)
R150 Certificate of patent or registration of utility model (Ref document number: 5826502; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150)
S111 Request for change of ownership or part of ownership (JAPANESE INTERMEDIATE CODE: R313117)
R350 Written notification of registration of transfer (JAPANESE INTERMEDIATE CODE: R350)
LAPS Cancellation because of no payment of annual fees