JP2017059956A

JP2017059956A - Sound source extraction system and sound source extraction method

Info

Publication number: JP2017059956A
Application number: JP2015182147A
Authority: JP
Inventors: Jorge Trevino; Shuichi Sakamoto; Yoichi Suzuki
Original assignee: Tohoku University NUC; Rion Co Ltd
Current assignee: Tohoku University NUC; Rion Co Ltd
Priority date: 2015-09-15
Filing date: 2015-09-15
Publication date: 2017-03-23

Abstract

PROBLEM TO BE SOLVED: To provide a sound source extraction system capable of reliably extracting the sound propagating from a target sound source, regardless of the directional relationship between the target sound source and an interfering sound source within the extraction region. SOLUTION: A sound source extraction system uses a plurality of microphones (1a, 1b) to collect sound propagating from an extraction region that is divided into a plurality of unit regions. Based on the output signals of the microphones, the system calculates a matrix containing a plurality of transfer functions of sound propagation from each unit region, associated in advance with the position of each sound source (30, 31a, 31b), determines one inverse matrix from that matrix, and extracts the sound propagating from the target sound source (30) by using this inverse matrix. SELECTED DRAWING: Figure 6

Description

The present invention relates to a sound source extraction system and a sound source extraction method for extracting sound propagating from a target sound source using a plurality of microphones.

In general, various techniques are known for extracting only the sound propagating from a specific target sound source in a space where various sound sources exist (see Patent Document 1). Among these, the beamforming method uses a microphone array having a plurality of microphones to collect the sound propagated from a target sound source at a given position, and identifies the direction of the target sound source by arithmetic processing, so that the target sound source can be separated from other sound sources and extracted. Applying the beamforming method makes it possible to follow the target sound source not only when it is stationary but also when it is moving.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2013-183358 (JP 2013-183358 A)

When beamforming is applied, if an interfering sound source lies in the same direction as the target sound source as seen from the microphone array, the target sound source and the interfering sound source may not be separable. In this case, narrowing the beam directed from the microphone array toward the target sound source separates the target sound source from the interfering sound source and raises the spatial resolution. However, how far the beam can be narrowed is constrained by various parameters of the microphones and of the sound, and raising the spatial resolution inevitably increases the amount of computation. It is also conceivable to install many microphone arrays and monitor the target sound source from various directions, but even then each microphone array extracts the target sound source independently, so the above problem is difficult to avoid whenever an interfering sound lies within the width of any of the beams.

The present invention has been made to solve these problems. Its object is to provide a sound source extraction system and the like that, through the arithmetic processing used when extracting sound from a target sound source, can ensure better performance than when the beamforming method is applied.

To solve the above problems, the sound source extraction system of the present invention is a sound source extraction system that extracts sound propagating from a target sound source, comprising: a plurality of microphones that are distributed outside a predetermined extraction region divided into a plurality of unit regions and that collect sound propagating from one or more sound sources including the target sound source; and arithmetic means that, based on the respective output signals of the plurality of microphones, calculates a matrix whose elements include a plurality of transfer functions of sound propagation from each of the plurality of unit regions, associated in advance with the positions of the one or more sound sources, to each of the plurality of microphones, obtains one inverse matrix from that matrix in advance, and uses this inverse matrix to extract the sound generated by the target sound source.

According to the sound source extraction system of the present invention, the extraction region containing the target sound source and the interfering sound sources is divided into a plurality of unit regions, a transfer function of sound propagation is obtained in advance for the position of each unit region based on the output signals of the plurality of microphones, and the sound propagating from the target sound source is extracted by using one inverse matrix that provides the inverse characteristic. Therefore, even when the target sound source and an interfering sound source lie in the same direction as seen from a microphone, the target sound source can be reliably separated from the interfering sound source and extracted based on the output signals of the microphones distributed around the whole extraction region, without the cumbersome computation that source separation requires in the conventional beamforming method.

In the present invention, a plurality of microphone arrays may further be provided, each having a predetermined number of the plurality of microphones and each placed at a different position outside the extraction region. In this case, the plurality of microphone arrays in the sound source extraction system form a so-called array of arrays. For example, when each of L microphone arrays has M microphones, L × M microphones are installed in total. Even with such an arrangement, one inverse matrix is generated for the whole system, so in addition to separating a target sound source and an interfering sound source that lie in the same direction, as described above, the system is highly useful in that it can be installed easily. The microphone arrays are preferably distributed near the outer edge of the extraction region rather than concentrated on one side of it.

In the present invention, a spherical microphone array in which the predetermined number of microphones are arranged on the surface of a spherical baffle can be used as each of the plurality of microphone arrays. A spherical microphone array has the advantage that it can be made compact, and it allows the computation that generates the inverse matrix to be performed relatively simply. It also makes the system robust.

To solve the above problems, the sound source extraction method of the present invention is a sound source extraction method for extracting sound propagating from a target sound source, comprising: a sound collecting step of collecting sound propagating from one or more sound sources including the target sound source with a plurality of microphones distributed outside a predetermined extraction region divided into a plurality of unit regions; and a calculation step of, based on the respective output signals of the plurality of microphones, calculating a matrix whose elements include a plurality of transfer functions of sound propagation from each of the plurality of unit regions, associated in advance with the positions of the one or more sound sources, to each of the plurality of microphones, obtaining one inverse matrix from that matrix in advance, and using this inverse matrix to extract the sound propagating from the target sound source.

The sound source extraction method of the present invention achieves the same effects as the sound source extraction system described above. The configuration that further provides a plurality of microphone arrays is likewise applicable. In the sound source extraction method of the present invention, the calculation step preferably extracts the target sound source using a spatial window function that corresponds to the arrangement of the plurality of microphones.

According to the present invention, a plurality of microphones are distributed outside an extraction region divided into a plurality of unit regions, and the sound propagating from the target sound source is extracted using one inverse matrix obtained from a plurality of transfer functions of sound propagation. This makes it possible to build a highly reliable sound source extraction system with simple arithmetic processing while avoiding the influence of the relative directions of the target and interfering sound sources, which is a problem in the conventional beamforming method.

FIG. 1 is a diagram showing the structure of the spherical microphone array, which is the main component used in the sound source extraction system of the present embodiment.
FIG. 2 is a diagram showing an example arrangement of a plurality of microphone arrays in the sound source extraction system of the present embodiment.
FIG. 3 is a diagram showing an example of the functional blocks of the sound source extraction system corresponding to the arrangement example of FIG. 2.
FIG. 4 is a flowchart showing the flow of the processing, among the arithmetic processing executed by the arithmetic processing unit, that mainly relates to the calculation of the inverse matrix.
FIG. 5 is a diagram schematically showing the case where the conventional beamforming method is applied, for comparison with the arithmetic processing of the sound source extraction system of the present embodiment.
FIG. 6 is a diagram schematically showing the case where the arithmetic processing is applied in the sound source extraction system of the present embodiment.
FIG. 7 is a diagram explaining the results of verifying the performance of the sound source extraction system of the present embodiment.

Embodiments of a sound source extraction system to which the present invention is applied are described below with reference to the accompanying drawings. The embodiments described below are examples of forms to which the technical idea of the present invention is applied, and the present invention is not limited by the contents of these embodiments.

FIG. 1 shows the structure of a spherical microphone array 1 (hereinafter simply "microphone array"), which is the main component used in the sound source extraction system of the present embodiment. The microphone array 1 shown in FIG. 1 includes a spherical baffle 10 made of a hard material, a plurality of microphones 11 placed at predetermined positions on the spherical surface of the baffle 10, and a wiring section 12 that houses the wires carrying the electrical signals output from the microphones 11.

As shown in the lower part of FIG. 1, a position in the space containing the sound source extraction system of the present embodiment is expressed by converting its X, Y, and Z coordinates into polar coordinates given by the azimuth angle θ, the elevation angle φ, and the distance r. For example, the polar-coordinate position of an arbitrary microphone 11 can be written as ($\theta_m$, $\varphi_m$, $r_m$). If the center of the spherical baffle 10 is taken as the origin, all microphones 11 attached to one microphone array 1 lie at the same distance $r_m$.
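The coordinate conversion described above can be sketched as follows. The text does not state the angle conventions, so this sketch assumes the azimuth is measured in the XY plane from the +X axis and the elevation is measured upward from the XY plane; the function names are hypothetical.

```python
import math

def cartesian_to_polar(x, y, z):
    """Return (azimuth, elevation, distance) for a Cartesian point.

    Assumed convention (not stated in the source): azimuth in the XY plane
    from +X, elevation measured from the XY plane toward +Z, angles in radians.
    """
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.asin(z / r) if r > 0.0 else 0.0
    return azimuth, elevation, r

def polar_to_cartesian(azimuth, elevation, r):
    """Inverse of cartesian_to_polar under the same assumed convention."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z
```

With the baffle center as the origin, every microphone on one array returns the same distance from `cartesian_to_polar`, which is the baffle radius.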

The positions of the microphones 11 on one microphone array 1 are not restricted, and a configuration similar to that used for general beamforming can be adopted. As for the number of microphones 11 on one microphone array 1, too few lowers the accuracy and too many increases the amount of computation required by the processing described later. As an example, 64 microphones 11 are attached to one microphone array 1.

Here, the sound pressure $p_m$ at each microphone 11 is expressed by the following equation (1).

where
$k$: wave number ($= 2\pi f / c$)
$\vec{r}_m$: position vector of the microphone
$\vec{r}_s$: position vector of the sound source
$p_s$: sound pressure of the sound source
$h_n$: spherical Hankel function
$h'_n$: derivative of $h_n$
$P_n$: Legendre polynomial of order $n$

In the conventional approach, the sound pressure $p_m$ is acquired for each microphone 11 and the target sound source is extracted using the so-called beamforming method. The sound source extraction system of the present embodiment is characterized in that it extracts the target sound source by a method different from conventional beamforming. The details are described later.

The sound source extraction system of the present embodiment uses a plurality of the microphone arrays 1 of FIG. 1 to form a so-called array of arrays. All microphones 11 and all microphone arrays 1 must be sampled synchronously. FIG. 2 shows an example arrangement of a plurality of microphone arrays 1 in the sound source extraction system of the present embodiment. For ease of understanding, FIG. 2 assumes a sound source extraction system placed in a region AA expressed in XY coordinates, but an actual system is configured in three-dimensional space including the Z direction. The example of FIG. 2 shows four microphone arrays 1(a), 1(b), 1(c), and 1(d) placed symmetrically at the four corners of the rectangular region AA that contains the extraction region A, outside the extraction region A. If each microphone array 1 has N microphones 11, there are 4N microphones 11 in total. The installation position of each microphone array 1 can be chosen freely, but the arrays are preferably distributed near the outer edge of the extraction region A so that their positions are as evenly spread as possible.

The extraction region A is divided into a large number of grids G (the unit regions of the present invention) by groups of straight lines equally spaced along the X and Y directions. Each of the one or more sound sources, including the target sound source to be extracted, is assumed to be a point source located in one of the grids G of the extraction region A, and the source extraction computation uses the transfer functions from each sound source to the individual microphones 11 based on the position of that grid G. The specific computation is described later. The size and number of grids G in the extraction region A are not particularly restricted and are set according to the amount of computation and the spatial resolution required. If the grids G are too small, the amount of computation increases; if they are too large, the spatial resolution is insufficient and sound sources become difficult to separate. The grids G therefore need to be set to a moderate size.
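As a rough illustration of the gridding described above, the sketch below enumerates the centers of square cells covering a rectangular extraction region in the XY plane. The function name, the use of cell centers as the assumed point-source positions, and the example dimensions are all assumptions made for illustration.

```python
def grid_centres(x_min, x_max, y_min, y_max, spacing):
    """Return the centre (x, y) of every square cell of side `spacing`.

    Hypothetical helper: the source only says the region is cut by equally
    spaced lines along X and Y; using cell centres as source positions is an
    assumption of this sketch.
    """
    nx = int(round((x_max - x_min) / spacing))
    ny = int(round((y_max - y_min) / spacing))
    return [
        (x_min + (i + 0.5) * spacing, y_min + (j + 0.5) * spacing)
        for j in range(ny)
        for i in range(nx)
    ]

# Example region of 2 m x 1 m (illustrative values only).
coarse = grid_centres(0.0, 2.0, 0.0, 1.0, 0.5)
fine = grid_centres(0.0, 2.0, 0.0, 1.0, 0.25)
print(len(coarse), len(fine))
```

Halving the spacing quadruples the number of cells in two dimensions, which is the trade-off between spatial resolution and computation that the paragraph describes.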

FIG. 3 shows an example of the functional blocks of the sound source extraction system corresponding to the arrangement example of FIG. 2. In addition to the four microphone arrays 1(a), 1(b), 1(c), and 1(d) of FIG. 2, the sound source extraction system shown in FIG. 3 includes an AD conversion unit 20, an arithmetic processing unit 21, and an output unit 22. Of these, the AD conversion unit 20, the arithmetic processing unit 21, and the output unit 22 can be implemented together in, for example, a personal computer that can be connected to the four wiring sections 12 of the microphone arrays 1.

The 4N microphones 11 of the four microphone arrays 1 collect the sound propagated from each sound source, convert it into analog signals Sa, and send these to the AD conversion unit 20 through the corresponding wiring sections 12. The AD conversion unit 20 samples the 4N analog signals Sa output from the 4N microphones 11 at a predetermined sampling frequency and converts them into 4N digital signals Sd. The AD conversion unit 20 therefore contains at least as many AD converters, arranged in parallel, as there are microphones 11. The arithmetic processing unit 21 uses the digital signals Sd obtained by the AD conversion unit 20 to execute the arithmetic processing, described later, that is required to extract the target sound source, and generates a signal S corresponding to the result. The output unit 22 outputs the signal S from the arithmetic processing unit 21 to a device outside the system or to storage means, display means, or the like inside the system.

Next, an outline of the arithmetic processing in the sound source extraction system of the present embodiment is given. FIG. 4 is a flowchart showing the flow of the processing that calculates the inverse matrix in advance. Here, a reference sound source that emits a known output sound is placed in a grid G within the predetermined extraction region A. As shown in FIG. 4, the sound pressures $p_m$ corresponding to the output signals of all microphones 11 of the N microphone arrays 1 are acquired from the output signal of each microphone 11 (corresponding to the digital signal Sd in FIG. 3) (step S1). For example, if there are L microphones 11 in total, L sound pressures $p_m$ are obtained, one for each microphone.

As already explained, the sound pressure $p_m$ at each microphone 11 when the spherical microphone array 1 is used is given by equation (1) above. The sound from the reference sound source in a grid G reaches each microphone 11 through various paths. Therefore, for each microphone 11 of each microphone array 1, a transfer function of sound propagation from the position of the reference sound source (corresponding to a grid G in FIG. 2) can be obtained, and this is calculated in turn for every grid G (step S2). In step S2, the transfer function $H_{am}$ of each microphone 11 of a given microphone array 1 can be expressed, in relation to equation (1), by for example the following equation (2).

where
$\vec{r}_m$: position vector of the microphone
$\vec{r}_a$: position vector of the microphone array
$\vec{r}$: position vector of the sound source
$R_a$: radius of the baffle
$h_n$: spherical Hankel function
$h'_n$: derivative of $h_n$
$P_n$: Legendre polynomial of order $n$

Here, the total sound pressure $p_{am}$ at each microphone 11 of a given microphone array 1 is in practice given by a volume integral over the extraction region A, as expressed by the following equation (3).

where
$\psi$: distribution of sound sources over position

Because the extraction region A of the present embodiment is divided into grids G as described above, the number of evaluations of equation (3) rises or falls with the chosen spatial resolution, and the arithmetic processing can be simplified through the choice of grids G. First, using an output vector S whose elements are the outputs of all microphones 11 of all microphone arrays 1, the following equation (4) holds.

where
$H$: matrix consisting of all transfer functions $H_{am}$
$\Lambda$: magnitude (distribution) of the sound emitted by each sound source, assuming that a sound source exists at every grid point

In equation (4), each element of the distribution $\Lambda$ represents the total sound energy in one grid G of the extraction region A.
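The relationship between the output vector, the transfer-function matrix, and the source distribution can be sketched numerically. Equation (4) itself is not reproduced in this text, so the sketch assumes the linear form S = HΛ suggested by the symbol definitions. It also substitutes a free-field point-source transfer function for the rigid-sphere expression of equation (2), which is a deliberate simplification. The function name, positions, frequency, and the sign convention of the phase term are illustrative assumptions.

```python
import numpy as np

def free_field_transfer_matrix(mic_pos, grid_pos, freq_hz, c=343.0):
    """Matrix of transfer functions from every grid point to every microphone.

    Stand-in model (assumption): free-field Green's function
    exp(-j k d) / (4 pi d), not the baffled-sphere formula of the source.
    Rows index microphones, columns index grid points.
    """
    k = 2.0 * np.pi * freq_hz / c  # wave number k = 2*pi*f/c, as defined in the text
    d = np.linalg.norm(mic_pos[:, None, :] - grid_pos[None, :, :], axis=-1)
    return np.exp(-1j * k * d) / (4.0 * np.pi * d)

# Four microphones at the corners of a 3 m square, three grid points inside.
mics = np.array([[0.0, 0.0], [0.0, 3.0], [3.0, 0.0], [3.0, 3.0]])
grid = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])

H = free_field_transfer_matrix(mics, grid, 500.0)
lam = np.array([1.0, 0.0, 0.5], dtype=complex)  # source strength per grid point
S = H @ lam  # assumed form of equation (4): microphone outputs
```

Each column of `H` is the response of all microphones to a unit source in one grid, so the output vector is the sum of the columns weighted by the source strengths.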

The sound source extraction system of the present embodiment is characterized in that, to obtain the distribution $\Lambda$, it determines from equation (4) the matrix $H$ whose elements are all the transfer functions $H_{am}$, computes its inverse matrix $H^{-1}$ in advance, and then performs the computation with this single inverse matrix $H^{-1}$. Accordingly, the inverse matrix $H^{-1}$ is generated from all the transfer functions $H_{am}$ obtained with equation (2) (step S3). To obtain the sound pressure $p_t$ of the target sound source in the extraction region A, the output vector S is first determined from the output signals of all microphones 11 of all microphone arrays 1.

Next, the sound pressure $p_t$ of the target sound source is calculated based on the following equation (5).

where
$W_t$: spatial window function

The spatial window function $W_t$ used in equation (5) is set as appropriate for the spatial resolution of the gridded extraction region A. As a result of the arithmetic processing described above, the sound source extraction system of the present embodiment can extract the target sound source within the extraction region A and, as described later, can reliably separate the target sound source even when interfering sound sources are present.
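The overall procedure (build the transfer-function matrix, invert it once, then apply the inverse and a spatial window to the microphone outputs) can be sketched as below. Equation (5) is not reproduced in this text, so the form p_t = W_t · (H⁻¹ S) is an assumption. The sketch also assumes a free-field transfer function in place of equation (2), a Moore-Penrose pseudo-inverse because the matrix is generally not square, and a one-hot window that selects the target grid. All names and numbers are illustrative.

```python
import numpy as np

def transfer_matrix(mic_pos, grid_pos, freq_hz, c=343.0):
    """Free-field stand-in for the transfer functions (assumption)."""
    k = 2.0 * np.pi * freq_hz / c
    d = np.linalg.norm(mic_pos[:, None, :] - grid_pos[None, :, :], axis=-1)
    return np.exp(-1j * k * d) / (4.0 * np.pi * d)

def extract_target(H_inv, S, window):
    """Assumed form of equation (5): window applied to the recovered distribution."""
    return window @ (H_inv @ S)

# Eight microphones around a 3 m square, four grid points inside it.
mics = np.array([[0, 0], [0, 3], [3, 0], [3, 3],
                 [1.5, 0], [0, 1.5], [3, 1.5], [1.5, 3]], dtype=float)
grid = np.array([[1, 1], [2, 1], [1, 2], [2, 2]], dtype=float)

H = transfer_matrix(mics, grid, 500.0)
H_inv = np.linalg.pinv(H)  # computed once, in advance (pseudo-inverse is an assumption)

# Target source in grid 0, interfering source in grid 1.
lam_true = np.array([1.0, 0.7, 0.0, 0.0], dtype=complex)
S = H @ lam_true  # simulated noise-free microphone outputs

window = np.array([1.0, 0.0, 0.0, 0.0])  # one-hot spatial window on the target grid
p_t = extract_target(H_inv, S, window)
```

In this noise-free toy case the interfering source in grid 1 does not leak into `p_t`, because the two sources occupy different grids. With measurement noise or an ill-conditioned matrix, a regularized inverse would likely be needed, which the source does not discuss.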

The effect of applying the above arithmetic processing in the sound source extraction system of the present embodiment is now explained with reference to FIGS. 5 and 6. FIG. 5 schematically shows the conventional beamforming method applied when a target sound source 30 is to be extracted with two microphone arrays 1a and 1b and two interfering sound sources 31a and 31b lie in the same directions as the target. FIG. 6 schematically shows the method according to the present invention applied in the same situation. In both figures, the target sound source 30 and one interfering sound source 31a lie in the direction of beam Ba from microphone array 1a, the target sound source 30 and the other interfering sound source 31b lie in the direction of beam Bb as seen from microphone array 1b, and the two beams Ba and Bb are orthogonal to each other.

First, when the conventional method is applied as shown in FIG. 5, both the target sound source 30 and the interfering sound source 31a lie within beam Ba of microphone array 1a, and both the target sound source 30 and the interfering sound source 31b lie within beam Bb of microphone array 1b. In the conventional method, the two microphone arrays 1a and 1b each independently compute the inverse characteristic of the acoustic diffusion to extract the target sound source 30, so the limited directivity (beam width) of beams Ba and Bb makes it difficult to separate the target sound source 30 from the interfering sound sources 31a and 31b. Applying a complex source separation algorithm to separate them is not realistic in this case because it increases the amount of computation.

In contrast, when the method according to the present invention is applied as in FIG. 6, the target sound source 30 is extracted using the single inverse matrix described above, which is calculated in advance from the outputs of all microphones 11 of both microphone arrays 1a and 1b. Thus, as seen from microphone array 1a for example, the target sound source 30 can be separated from the interfering sound sources 31a and 31b, which lie in mutually different directions, as indicated by the virtual beam Bc in FIG. 6. The same holds for the relationship between the target sound source 30 and the interfering sound sources 31a and 31b as seen from the other microphone array 1b. Consequently, across the whole sound source extraction system, as long as the target sound source 30 and the many other interfering sound sources are located in different grids G, the target sound source 30 can be extracted easily without being affected by any of the interfering sound sources.

Next, the results of a simulation verifying the performance of the sound source extraction system of this embodiment are described with reference to FIG. 7. For comparison with the present invention, FIG. 7A shows the verification result when no sound source extraction method is applied, FIG. 7B shows the result when the conventional beamforming method is applied, and FIG. 7C shows the result when the method according to the present invention is applied. In each figure, the extraction performance for the target sound source 30 was compared in a situation where both the target sound source 30 and the disturbing sound source 31 are present. The horizontal axis spans 1 second, and the vertical axis is the sound pressure normalized to the range -1 to +1. As the sound source extraction system, two microphone arrays 1 were installed, separated from each other by 3 m from the center of the extraction region that is the target of sound source extraction, and 64 microphones 11 were attached to each array. The target sound source 30 was set to alternate between a 25 ms white-noise burst and a 25 ms silent period. The target sound source 30 and the disturbing sound source 31 are independent of each other.
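
The target signal used in this verification (25 ms white-noise bursts alternating with 25 ms of silence over 1 second) can be reproduced with the following minimal Python sketch. The sampling rate of 48 kHz, the uniform amplitude distribution, the random seed, and the function name `burst_signal` are assumptions for illustration; the source does not state them.

```python
import random


def burst_signal(duration_s=1.0, burst_s=0.025, gap_s=0.025, fs=48000, seed=0):
    """Alternate white-noise bursts and silence; samples lie in [-1, 1]."""
    rng = random.Random(seed)
    burst_n = round(burst_s * fs)             # samples per burst
    period_n = burst_n + round(gap_s * fs)    # samples per burst plus silence
    total_n = round(duration_s * fs)
    samples = []
    for n in range(total_n):
        if n % period_n < burst_n:
            # Independent uniform samples have a flat (white) spectrum.
            samples.append(rng.uniform(-1.0, 1.0))
        else:
            samples.append(0.0)
    return samples
```

With the default arguments the signal contains 20 bursts, each followed by a silent period of equal length.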

As the result in FIG. 7A, obtained without applying either of the above methods, shows, the target sound source 30 and the disturbing sound source 31 were set so that the peaks of their sound pressure levels are equal. In FIG. 7A, the sound pressure level of the target sound source 30 is therefore buried in that of the disturbing sound source 31. In FIG. 7B, where the conventional beamforming method is applied, the target sound source 30 can be separated from the disturbing sound source 31, but the SN ratio was about 10.4 dB. In FIG. 7C, where the method according to the present invention is applied, the target sound source 30 can be separated from the disturbing sound source 31 with an SN ratio of about 18.6 dB, a clear improvement of about 8.2 dB over FIG. 7B.
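
To put the two reported SN ratios in perspective, the following short Python sketch computes their difference and the corresponding power ratio. The function `snr_db` assumes the common definition of the SN ratio as a ratio of mean powers; the source does not state how its SN ratio was computed, so that definition and all names here are assumptions.

```python
import math


def snr_db(signal, noise):
    """SN ratio in dB as a ratio of mean powers (assumed definition)."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


# SN ratios reported for FIG. 7B (beamforming) and FIG. 7C (present method).
SNR_BEAMFORMING_DB = 10.4
SNR_PROPOSED_DB = 18.6

improvement_db = SNR_PROPOSED_DB - SNR_BEAMFORMING_DB
power_ratio = 10.0 ** (improvement_db / 10.0)
print(f"improvement: {improvement_db:.1f} dB (power ratio about {power_ratio:.1f})")
```

Under that definition, the 8.2 dB difference corresponds to a signal-to-noise power ratio roughly 6.6 times higher.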

As described above, by adopting the sound source extraction system (sound source extraction method) to which the present invention is applied, a single inverse matrix is generated by inverting the matrix of transfer functions from each sound source in the gridded extraction region A to all the microphones 11, so that the target sound source can be extracted with good performance without being affected by the direction of the disturbing sound sources. In this case, the complicated sound source separation algorithm used with the conventional beamforming method is unnecessary, and the target sound source can be extracted with simple arithmetic processing. In addition, the sound source extraction system to which the present invention is applied can extract the target sound source not only when it is stationary but also when it is moving, by following its movement. Note that the present invention is applicable even when the plurality of microphones 11 in the extraction region A do not form microphone arrays 1; however, using a predetermined number of microphone arrays 1 has the advantage that they can be installed easily, for example at the corners of the extraction region A.
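
Following a moving target source, as mentioned above, amounts to determining which unit region currently contains the target and using the corresponding row of the inverse matrix. The following Python sketch shows one way to map a position to a grid index. The rectangular region, the uniform cell size of 0.5 m, the 4 x 4 layout, the row-major indexing, and the function name `cell_index` are all hypothetical and chosen only for illustration.

```python
def cell_index(x, y, origin=(0.0, 0.0), cell_size=0.5, cols=4, rows=4):
    """Return the row-major index of the grid cell containing (x, y)."""
    col = int((x - origin[0]) // cell_size)
    row = int((y - origin[1]) // cell_size)
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError("position lies outside the extraction region")
    return row * cols + col


def track(positions):
    """Map a sequence of target positions to the grid cells they pass through."""
    return [cell_index(x, y) for x, y in positions]
```

For example, `track([(0.1, 0.1), (0.6, 0.1), (0.6, 0.6)])` yields the cell indices visited by a target moving across the region, each of which selects a different row of the same inverse matrix.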

The contents of the present invention have been described specifically above based on the present embodiment, but the present invention is not limited to the above-described embodiment, and various modifications can be made without departing from its gist. The main components (FIGS. 1, 2, and 3) and the arithmetic processing procedure (FIG. 4) of the above embodiment are not limited to what is disclosed there, and may be changed as appropriate as long as the effects of the present invention are obtained.

DESCRIPTION OF SYMBOLS
1 ... Microphone array
10 ... Baffle
11 ... Microphone
12 ... Wiring part
20 ... AD conversion part
21 ... Arithmetic processing part
22 ... Output part
30 ... Target sound source
31 ... Disturbing sound source
A ... Extraction region
G ... Grid

Claims

1. A sound source extraction system that extracts sound propagating from a target sound source, comprising:
a plurality of microphones that are distributed outside a predetermined extraction region divided into a plurality of unit regions and that collect sound propagating from one or more sound sources including the target sound source; and
an arithmetic processing unit that, based on output signals of the plurality of microphones, calculates a matrix whose elements are a plurality of transfer functions of sound propagation from each of the plurality of unit regions, associated in advance with the positions of the one or more sound sources, to each of the plurality of microphones, obtains one inverse matrix from the matrix, and extracts the sound propagating from the target sound source using the inverse matrix.

2. The sound source extraction system according to claim 1, further comprising a plurality of microphone arrays respectively provided with a predetermined number of the microphones included in the plurality of microphones and disposed at different positions outside the extraction region.

3. The sound source extraction system according to claim 2, wherein the plurality of microphone arrays are arranged in a distributed manner in the vicinity of an outer edge portion of the extraction region.

4. The sound source extraction system according to claim 2 or 3, wherein each of the plurality of microphone arrays is a spherical microphone array in which the predetermined number of microphones are arranged on a surface of a spherical baffle.

5. A sound source extraction method for extracting sound propagating from a target sound source, comprising:
a sound collecting step of collecting sound propagating from one or more sound sources including the target sound source by using a plurality of microphones distributed outside a predetermined extraction region divided into a plurality of unit regions; and
a calculation step of calculating, based on output signals of the plurality of microphones, a matrix whose elements are a plurality of transfer functions of sound propagation from each of the plurality of unit regions, associated in advance with the positions of the one or more sound sources, to each of the plurality of microphones, obtaining one inverse matrix from the matrix, and extracting the sound propagating from the target sound source using the inverse matrix.

6. The sound source extraction method according to claim 5, wherein a plurality of microphone arrays, each provided with a predetermined number of the microphones included in the plurality of microphones, are disposed at different positions outside the extraction region.

7. The sound source extraction method according to claim 6, wherein, in the calculation step, the target sound source is extracted using a spatial window function corresponding to the arrangement of the plurality of microphones.