JP5482854B2

JP5482854B2 - Sound collecting device and program

Info

Publication number: JP5482854B2
Application number: JP2012217315A
Authority: JP
Inventors: Kazuhiro Katagiri (片桐 一浩)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2012-09-28
Filing date: 2012-09-28
Publication date: 2014-05-07
Anticipated expiration: 2032-09-28
Also published as: JP2014072708A

Description

The present invention relates to a sound collection device and a program, and can be applied, for example, to a sound collection device and a program that emphasize sound from a specific area and suppress sound from other areas.

A beamformer (hereinafter, BF) using a microphone array is a technique for emphasizing sound arriving from a specific direction (speech or acoustic sound; hereinafter, speech and acoustic sound are sometimes collectively called sound) and suppressing other sound. Here, a beamformer is a technique that forms directivity by exploiting the time differences between the signals arriving at each microphone (see Non-Patent Document 1).

BFs fall broadly into two types: addition type and subtraction type. The subtraction-type BF in particular has the advantage that it can form directivity with fewer microphones than the addition-type BF. A representative subtraction-type method is a BF using spectral subtraction (hereinafter, SS) (see Non-Patent Document 2).

FIG. 15 is a block diagram showing a configuration for SS with two microphones. In FIG. 15, d is the distance between microphone 11 and microphone 12, and θ_L is the angle from the front of microphones 11 and 12 to the target sound source T. In SS, the delay unit 13 first calculates the time difference τ_L between the signals arriving at microphone 11 and microphone 12 from the target direction θ_L, and adds a delay to align the phase of the sound signals from the target sound source direction. Sound from other directions does not become phase-aligned by passing through the delay unit 13 and is therefore not emphasized. The time difference τ_L is calculated by equation (1) below. In equation (1), c is the speed of sound and D_i is the delay amount.
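Equation (1) itself is not reproduced in this text. As an illustration only, the following sketch assumes the usual far-field relation τ_L = d·sin(θ_L)/c for a two-microphone array; this relation, the function names, and the speed-of-sound value are assumptions and are not quoted from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly the value in air at 20 degrees C (assumed)


def arrival_time_difference(d, theta_l, c=SPEED_OF_SOUND):
    """Far-field time difference (seconds) between two microphones.

    d        -- microphone spacing in metres
    theta_l  -- angle of the target source from the array front, in radians
    c        -- speed of sound in m/s
    Assumes a plane wave, so the extra path length is d * sin(theta_l).
    """
    return d * math.sin(theta_l) / c


def delay_in_samples(tau, sample_rate):
    """Round a delay in seconds to the nearest whole number of samples."""
    return round(tau * sample_rate)
```

Under these assumptions a source directly in front of the array (θ_L = 0) needs no delay, and the delay grows as the source moves toward the line through the two microphones.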

Here, when the target sound source T lies toward microphone 11 relative to the midpoint between microphone 11 and microphone 12, the delay is applied to the input of microphone 11. The adder 14 then performs addition according to equation (2), and the subtractor 15 performs subtraction according to equation (3). The addition emphasizes sound from the target sound source direction, and the subtraction extracts sound from all other directions. Addition and subtraction can be performed in the same way in the frequency domain, in which case equations (2) and (3) are replaced by equations (4) and (5), respectively. FIG. 15 illustrates the case where addition and subtraction follow equations (4) and (5).

Using the data produced by the addition and subtraction, the spectral subtractor 16 performs processing according to equation (6), which emphasizes sound from the target sound source direction and suppresses other sound. β is a coefficient for adjusting the strength of the SS.
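Equations (2) to (6) are not reproduced in this text, so the sketch below shows only one plausible reading of the adder, subtractor, and spectral subtractor in the frequency domain. It assumes that the output magnitude is |sum| − β·|difference| per frequency bin, floored at zero; that form, the flooring, and the function name are assumptions, not the patent's stated equations.

```python
def spectral_subtraction_bf(x1_delayed, x2, beta=1.0):
    """Subtraction-type BF sketch for two microphones, per frequency bin.

    x1_delayed -- complex spectrum of microphone 1 after the steering delay
    x2         -- complex spectrum of microphone 2
    beta       -- strength of the spectral subtraction
    Returns a list of non-negative output magnitudes.
    """
    out = []
    for a, b in zip(x1_delayed, x2):
        added = abs(a + b)       # phase-aligned target sound adds up
        subtracted = abs(a - b)  # phase-aligned target sound cancels
        out.append(max(added - beta * subtracted, 0.0))
    return out
```

In this toy model a bin whose two inputs are identical (fully phase-aligned) passes with doubled magnitude, while a bin whose inputs are in opposite phase is suppressed to zero.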

In a real environment, when only the sound of a specific area (hereinafter, target area sound) is to be collected, many noise sources (hereinafter, non-target area sounds) may exist around that area. A BF can normally form directivity only along a straight line, upward, downward, left, or right. Therefore, as shown in FIG. 16, when a non-target area sound source lies in the same direction as the target area, the BF emphasizes the non-target area sound along with the target area sound.

To solve this problem, the technique described in Patent Document 1 uses two microphone arrays at different positions, directs the directivity of each array by BF toward the target area and toward directions other than the target area, and estimates and emphasizes the target area sound from the level differences of the sound arriving from each direction.

Patent Document 1: JP 2007-235358 A

Non-Patent Document 1: Futoshi Asano (浅野太), "Sound Array Signal Processing: Localization, Tracking and Separation of Sound Sources", The Acoustical Society of Japan, Corona Publishing, published February 25, 2011

Non-Patent Document 2: Takashi Yazu (矢頭隆), Makoto Morito (森戸誠), Kei Yamada (山田圭), Tetsuji Ogawa (小川哲司), "Sound source separation technology using a square microphone array (Special issue: Efforts toward practical application of speech recognition technology)", Information Processing Society of Japan, Information Processing (情報処理) 51(11), pp. 1410-1416, 2010

However, the technique described in Patent Document 1 requires the microphone arrays to be placed at equal distances from the target area. For example, when two microphone arrays 1 and 2 are placed, the distance from microphone array 1 to the target area must equal the distance from microphone array 2 to the target area. Consequently, whenever the target area is changed, the microphone arrays may have to be repositioned. In addition, because the technique of Patent Document 1 is based on the addition-type BF, each microphone array must comprise a large number of microphones.

There is therefore a need for a sound collection device and program that can form each microphone array from a small number of microphones and can emphasize only the target area sound, without adjusting the positions of the microphone arrays, even when the target area is surrounded by non-target area sound sources.

To solve this problem, a first aspect of the present invention is a sound collection device comprising: (1) a plurality of microphone arrays; (2) a directivity forming unit that forms directivity toward the target area by a beamformer for the output of each microphone array; (3) an inter-microphone-array delay correction unit that, for the beamformer output of each microphone array, corrects the delay caused by the difference in distance between the target area and each microphone array so that the target area sound arrives at all microphone arrays simultaneously; (4) a target area sound power correction coefficient calculation unit that, in order to make the power of the target area sound contained in the beamformer output of every microphone array the same, calculates the mode or median of the ratio of the amplitude spectra between the beamformer outputs of the microphone arrays and uses it as a correction coefficient; and (5) a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts the non-target area sound present in the target area direction by spectral subtraction between the corrected outputs, and then extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.

A second aspect of the present invention is a sound collection device comprising: (1) a plurality of microphone arrays; (2) a directivity forming unit that forms directivity toward the target area by a beamformer for the output of each microphone array; (3) an inter-microphone-array delay correction unit that, for the beamformer output of each microphone array, corrects the delay caused by the difference in distance between the target area and each microphone array so that the target area sound arrives at all microphone arrays simultaneously; (4) a target area sound power correction coefficient calculation unit that, in order to make the power of the target area sound contained in the beamformer output of every microphone array the same, calculates the coefficient that minimizes the square of the difference in power between the beamformer outputs of the microphone arrays and uses it as a correction coefficient; and (5) a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts the non-target area sound present in the target area direction by spectral subtraction between the corrected outputs, and then extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.

A third aspect of the present invention is a sound collection program that causes a computer supplied with signals from a plurality of microphone arrays to function as: (1) a directivity forming unit that forms directivity toward the target area by a beamformer for the output of each microphone array; (2) an inter-microphone-array delay correction unit that, for the beamformer output of each microphone array, corrects the delay caused by the difference in distance between the target area and each microphone array so that the target area sound arrives at all microphone arrays simultaneously; (3) a target area sound power correction coefficient calculation unit that, in order to make the power of the target area sound contained in the beamformer output of every microphone array the same, calculates the mode or median of the ratio of the amplitude spectra between the beamformer outputs of the microphone arrays and uses it as a correction coefficient; and (4) a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts the non-target area sound present in the target area direction by spectral subtraction between the corrected outputs, and then extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.

A fourth aspect of the present invention is a sound collection program that causes a computer supplied with signals from a plurality of microphone arrays to function as: (1) a directivity forming unit that forms directivity toward the target area by a beamformer for the output of each microphone array; (2) an inter-microphone-array delay correction unit that, for the beamformer output of each microphone array, corrects the delay caused by the difference in distance between the target area and each microphone array so that the target area sound arrives at all microphone arrays simultaneously; (3) a target area sound power correction coefficient calculation unit that, in order to make the power of the target area sound contained in the beamformer output of every microphone array the same, calculates the coefficient that minimizes the square of the difference in power between the beamformer outputs of the microphone arrays and uses it as a correction coefficient; and (4) a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts the non-target area sound present in the target area direction by spectral subtraction between the corrected outputs, and then extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.

According to the present invention, a microphone array can be formed from a small number of microphones, and only the target area sound can be emphasized, without adjusting the positions of the microphone arrays, even when the target area is surrounded by non-target area sound sources.

FIG. 1 is a block diagram showing the configuration of the sound collection device according to the first embodiment.
FIG. 2 is a block diagram showing the configuration of the target area sound extraction unit.
FIG. 3 is a flowchart showing the processing of the sound collection device according to the first embodiment.
FIG. 4 is a diagram showing the arrangement of the microphone arrays and the sound sources in the performance evaluation experiment for the first embodiment.
FIG. 5 is a diagram showing the amount of suppression of non-target area sound for each arrangement pattern, for the first embodiment and the existing method.
FIG. 6 is a block diagram showing the configuration of the sound collection device according to the second embodiment.
FIG. 7 is a flowchart showing the processing of the sound collection device according to the second embodiment.
FIG. 8 is a block diagram showing the configuration of the sound collection device according to the third embodiment.
FIG. 9 is a flowchart showing the processing of the sound collection device according to the third embodiment.
FIG. 10 is a block diagram showing the configuration of the target area sound power correction coefficient calculation unit.
FIG. 11 is a diagram showing the arrangement of the microphone arrays and the sound sources in the performance evaluation experiment for the third embodiment.
FIG. 12 is a diagram showing the amount of suppression of non-target area sound for each arrangement pattern, for the third embodiment and the existing method.
FIG. 13 is a block diagram showing the configuration of the sound collection device according to the fourth embodiment.
FIG. 14 is a flowchart showing the processing of the sound collection device according to the fourth embodiment.
FIG. 15 is a block diagram showing a configuration for the spectral subtraction method.
FIG. 16 is an explanatory diagram showing a directional beam directed from one microphone array toward the target area.
FIG. 17 is an explanatory diagram showing directional beams directed toward the target area from two microphone arrays at different locations.
FIG. 18 is an explanatory diagram showing the differences between the spectra of the BF output signal of each microphone array, the target area sound component, and the non-target area sound components.
FIG. 19 is an explanatory diagram showing histograms of the ratio of the amplitude spectra between the BF output signals of the microphone arrays.

(A) Technical idea common to the first to fourth embodiments
In the first to fourth embodiments, a plurality of microphone arrays are first placed at arbitrary positions in the space containing the target area, and each forms a directional beam toward the target area by BF. As an example, FIG. 17 illustrates two microphone arrays with their directional beams directed at the target area. In this state, the BF directivity of each of microphone arrays 1 and 2 also captures non-target area sound components lying in the target area direction. However, the target area lies within the directional beams of all microphone arrays 1 and 2. The target area sound component is therefore contained in every BF output signal with the same proportions and distribution, as shown in FIGS. 18(a) and 18(c). By contrast, the components of non-target area sounds 1 and 2 differ between the BF output signals of microphone arrays 1 and 2, as shown in FIGS. 18(b) and 18(d). The first to fourth embodiments exploit this property.

That is, when the BF output signal of microphone array 2 is spectrally subtracted from the BF output signal of microphone array 1, the overlapping target area sound component shown in FIG. 18(e) is eliminated. Because the components of non-target area sound 1 and non-target area sound 2 do not overlap, only non-target area sound 1 is extracted. By further spectrally subtracting the extracted non-target area sound 1 component from the BF output signal of microphone array 1, the target area sound is finally obtained.

Extracting the target area sound by this method presupposes that the target area sound component is contained in each BF output signal with the same power. In general, however, the power of the target area sound component in each BF output signal varies with the difference in distance between the target area and each of microphone arrays 1 and 2, and with the difference in gain between microphone arrays 1 and 2.

Therefore, in the first and second embodiments, the ratio of the amplitude spectra between the BF output signals is first obtained, and the mode of that ratio is calculated. As described above, the target area sound component is contained in all BF output signals with the same proportions and distribution, so the ratio is the same at every frequency where the target area sound component is present. Conversely, the non-target area sound components differ between BF output signals, so their ratios are scattered. Given this property, the mode of the per-frequency ratios directly serves as the coefficient that equalizes the power of the target area sound component across the BF output signals. FIG. 19 shows histograms of the ratio of the amplitude spectra between the BF output signals of microphone arrays 1 and 2. FIG. 19(A) shows the case where microphone arrays 1 and 2 are equidistant from the target area. Because the distances are equal, the input target area sound components have nearly equal power, and the mode of the ratio is close to 1. FIG. 19(B) shows the case where microphone array 2 is closer to the target area than microphone array 1. Because the target area sound component has greater power at microphone array 2, which is closer to the target area, the mode of the ratio is less than 1. The power correction coefficient can also be obtained by calculating the median as an approximation of the mode. As FIGS. 19(A) and 19(B) show, the distribution of the ratio is unimodal, so the median is close to the mode. Thus, in the first and second embodiments, the mode or median of the ratio of the amplitude spectra between the BF output signals is calculated as the power correction coefficient. The calculated coefficient is used to equalize the power of the target area sound components contained in the BF output signals, after which the target area sound is extracted by the method described above.
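The mode and median estimates of the power correction coefficient can be sketched as follows. This is a minimal illustration, assuming the ratio is taken per frequency bin as |X_1| / |X_2| and that the mode is read off a fixed-width histogram; the function names, the bin width, and the guard against near-zero denominators are assumptions.

```python
import statistics


def amplitude_ratios(x1_mag, x2_mag, eps=1e-12):
    """Per-bin ratio |X_1| / |X_2|, skipping bins where |X_2| is ~0."""
    return [a / b for a, b in zip(x1_mag, x2_mag) if b > eps]


def ratio_median(x1_mag, x2_mag):
    """Median of the per-bin ratios, used as an approximation of the mode."""
    return statistics.median(amplitude_ratios(x1_mag, x2_mag))


def ratio_mode(x1_mag, x2_mag, bin_width=0.05):
    """Centre of the most populated histogram bin of the per-bin ratios."""
    counts = {}
    for r in amplitude_ratios(x1_mag, x2_mag):
        idx = int(r // bin_width)
        counts[idx] = counts.get(idx, 0) + 1
    best = max(counts, key=counts.get)
    return (best + 0.5) * bin_width
```

With a histogram, the returned mode is only accurate to within one bin width, so the bin width trades resolution against robustness to sparse data.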

In the third and fourth embodiments, the value that minimizes the square of the difference in power between the BF output signals is first calculated, and this minimizing value is used as the power correction coefficient for the target area sound component. Because the distributions of the target area sound component in the BF output signals are identical once normalized, the power of the target area sound components is considered to match when the difference in power between the BF outputs is minimized. The calculated coefficient is used to equalize the power of the target area sound components contained in the BF output signals, after which the target area sound is extracted by the method described above.
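One way to read the least-squares criterion is as the scalar α that minimizes Σ_k (P_1(k) − α·P_2(k))² over frequency bins k, where P_1 and P_2 are the per-bin powers of the two BF outputs. Under that assumed objective the minimizer has the closed form α = Σ P_1·P_2 / Σ P_2². The objective, the closed form, and the function name below are illustrative assumptions and are not stated in this section.

```python
def least_squares_coefficient(p1, p2):
    """Scalar alpha minimizing sum((p1[k] - alpha * p2[k]) ** 2).

    p1, p2 -- per-bin powers of the two BF outputs (equal length).
    Setting the derivative with respect to alpha to zero gives
    alpha = sum(p1 * p2) / sum(p2 * p2).
    """
    num = sum(a * b for a, b in zip(p1, p2))
    den = sum(b * b for b in p2)
    if den == 0.0:
        raise ValueError("second BF output has zero power in every bin")
    return num / den
```

When the two outputs contain only the target area sound at different gains, this returns exactly the gain ratio; non-target components pull the estimate away from that value.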

(B) First embodiment
A first embodiment of the sound collection device and program according to the present invention is described in detail below with reference to the drawings.

(B-1) Configuration of the first embodiment
FIG. 1 is a block diagram showing the configuration of the sound collection device according to the first embodiment. The processing performed in the sound collection device 10A after conversion to digital signals can be implemented by a CPU and a program executed by the CPU; functionally, it can be represented as in FIG. 1.

In FIG. 1, the sound collection device 10A according to the first embodiment includes a microphone array 1, a microphone array 2, a data input unit 3, a directivity forming unit 4, an inter-microphone-array delay correction unit 5, a target area sound power correction coefficient calculation unit 6, and a target area sound extraction unit 7.

Microphone array 1 is placed in the space containing the target area, at a location from which it can be directed at the target area. It consists of two or more microphones, each of which picks up sound, and it inputs the resulting acoustic signals to the data input unit 3 of the sound collection device 10A.

Microphone array 2 has the same configuration as microphone array 1 and is placed at a different location.

The microphones constituting microphone arrays 1 and 2 may be arranged in any layout that permits BF, for example in a horizontal row, a vertical row, a cross, or a grid. Two or more microphone arrays may be deployed.

The data input unit 3 converts the acoustic signals picked up by microphone arrays 1 and 2 from analog signals into digital signals (data).

The directivity forming unit 4 forms, by BF, a directional beam directed at the target area from the output signals of each of microphone arrays 1 and 2. Various BF methods can be applied, such as the addition-type delay-and-sum method or the subtraction-type SS. The directivity forming unit 4 can also change the strength of the directivity according to the extent of the target area.

The inter-microphone-array delay correction unit 5 corrects, in the post-BF outputs of microphone arrays 1 and 2, the delay caused by the difference in distance between the target area and each microphone array, so that the target area sound arrives at all microphone arrays simultaneously.

The target area sound power correction coefficient calculation unit 6 calculates a correction coefficient for making the power of the target area sound component contained in each post-BF data stream the same.

The target area sound extraction unit 7 performs SS between the BF output data corrected with the correction coefficient calculated by the target area sound power correction coefficient calculation unit 6, thereby extracting the non-target area sound present in the target area direction. It then extracts and outputs the target area sound by spectrally subtracting the extracted non-target area sound from each BF output.

図２は、目的エリア音抽出部７の構成を示すブロック図である。ここで、マイクロホンアレイ１、２のＢＦ後の出力データをＸ_１（ｎ）、Ｘ_２（ｎ）とし、各ＢＦ出力データに対するパワー補正係数をα_１（ｎ）、α_２（ｎ）とする。また、マイクロホンアレイ１からみた目的エリア方向に存在する非目的エリア音成分をＮ_１（ｎ）とし、マイクロホンアレイ２からみた目的エリア方向に存在する非目的エリア音成分をＮ_２（ｎ）とする。 FIG. 2 is a block diagram showing the configuration of the target area sound extraction unit 7. Here, the output data after BF of the microphone arrays 1 and 2 are X ₁ (n) and X ₂ (n), and the power correction coefficients for each BF output data are α ₁ (n) and α ₂ (n). . Further, a non-target area sound component existing in the target area direction viewed from the microphone array 1 is N ₁ (n), and a non-target area sound component existing in the target area direction viewed from the microphone array 2 is N ₂ (n). .

この場合、目的エリア音抽出部７は、マイクロホンアレイ２のＢＦ出力データＸ_２にパワー補正係数α_１（ｎ）を掛けてＳＳを行い、マイクロホンアレイ１のＢＦ出力データＸ_１（ｎ）に含まれる目的エリア方向の非目的エリア音成分Ｎ_１（ｎ）を抽出する。さらに、目的エリア音抽出部７は、マイクロホンアレイ１のＢＦ出力データＸ_１（ｎ）に対しＮ_１（ｎ）をＳＳし、目的エリア音成分Ｙ_１（ｎ）を抽出する。 In this case, the target area sound extraction unit 7 performs SS by multiplying the BF output data X ₂ of the microphone array 2 by the power correction coefficient α ₁ (n), and is included in the BF output data X ₁ (n) of the microphone array 1. The non-target area sound component N ₁ (n) in the target area direction is extracted. Furthermore, the target area sound extraction unit 7 SSs N ₁ (n) for the BF output data X ₁ (n) of the microphone array 1 and extracts the target area sound component Y ₁ (n).

The target area sound component Y_2(n) is obtained in the same way. The target area sound extraction unit 7 multiplies the BF output data X_1(n) of the microphone array 1 by the power correction coefficient α_2(n) and performs SS, thereby extracting the non-target area sound component N_2(n) in the target area direction that is contained in the BF output data X_2(n) of the microphone array 2. The target area sound extraction unit 7 then performs SS of N_2(n) from the BF output data X_2(n) of the microphone array 2 to extract the target area sound component Y_2(n).

(B-2) Operation of the First Embodiment
Next, the processing of the sound collection device 10A according to the first embodiment will be described. FIG. 3 is a flowchart showing the processing of the sound collection device 10A according to the first embodiment.

Sound from the various sound sources in the space containing the target area is picked up by the microphones constituting the microphone arrays 1 and 2. The acoustic signals acquired by the microphone arrays 1 and 2 are input to the data input unit 3 and converted into digital signals (S1).

The directivity forming unit 4 applies BF to the outputs of all the microphone arrays 1 and 2 to form directivity in the target area direction (S2).

The inter-microphone-array delay correction unit 5 corrects the delay caused by the difference in distance between the target area and each microphone array (S3).

The inter-microphone-array delay correction unit 5 first calculates the arrival time of the target area sound at each microphone array from the position of the target area and the positions of the microphone arrays. Then, taking the microphone array located farthest from the target area as the reference, it adds delays so that the target area sound arrives at all microphone arrays simultaneously. This operation makes it possible to handle the output data of the arbitrarily placed microphone arrays 1 and 2 at the same time.
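The delay correction described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the speed of sound of 340 m/s, the use of 2-D coordinates in metres, and the rounding of delays to whole samples are all assumptions introduced for the example.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value, not specified in this section


def inter_array_delays(target_pos, array_positions, sample_rate):
    """Return one delay (in samples) per microphone array.

    The array farthest from the target area is the reference and gets
    zero delay; nearer arrays are delayed so that the target area sound
    lines up across all arrays.
    """
    arrival_times = [
        math.dist(target_pos, pos) / SPEED_OF_SOUND for pos in array_positions
    ]
    latest = max(arrival_times)
    return [round((latest - t) * sample_rate) for t in arrival_times]


def apply_delay(signal, delay_samples):
    """Delay a signal by prepending zeros while keeping its length."""
    delay_samples = min(delay_samples, len(signal))
    kept = list(signal[: len(signal) - delay_samples])
    return [0.0] * delay_samples + kept


# Hypothetical layout: target at the origin, arrays 1 m and 2 m away.
delays = inter_array_delays((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0)], 34000)
aligned = apply_delay([1.0, 2.0, 3.0, 4.0], 2)
```

In this layout the nearer array is delayed and the farther one is left untouched, which matches the choice of the farthest array as the reference.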

The target area sound power correction coefficient calculation unit 6 calculates target area sound power correction coefficients that make the power of the target area sound component contained in the post-BF output data of the microphone arrays 1 and 2 the same for every array (S4).

To obtain the power correction coefficients, the target area sound power correction coefficient calculation unit 6 first obtains the ratio of the amplitude spectra between the BF output data X_1 and X_2. If the directivity forming unit 4 performs BF in the time domain, each BF output data is first converted to the frequency domain. The target area sound power correction coefficient calculation unit 6 then calculates the mode of the obtained ratios and uses that value as the power correction coefficient (equations (7) and (8)). Alternatively, it can calculate the median of the ratios and use that value as the power correction coefficient (equations (9) and (10)).

Here, X_1k(n) and X_2k(n) are the post-BF output data of the microphone arrays 1 and 2, N is the total number of frequency bins, k is the frequency, and α_1(n) and α_2(n) are the power correction coefficients for the respective BF outputs. The target area sound power correction coefficient calculation unit 6 does not need to compute every power correction coefficient: once one has been obtained, the other may be taken as its reciprocal. That is, once the unit has obtained α_1(n), the other coefficient can be set to α_2(n) = 1/α_1(n).
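The mode-based and median-based estimates described above can be illustrated with a short sketch. Equations (7) to (10) are not reproduced in this text, so the following is an assumption-laden illustration only: the function names are hypothetical, the mode is taken over a histogram with an assumed bin width (a mode of continuous-valued ratios needs some form of binning), and the returned value is the factor that scales the second array's amplitude spectrum to the level of the first.

```python
import statistics
from collections import Counter


def amplitude_ratios(amp_a, amp_b, eps=1e-12):
    """Per-frequency-bin ratio |X_a,k| / |X_b,k|, skipping near-empty bins."""
    return [a / b for a, b in zip(amp_a, amp_b) if b > eps]


def coefficient_by_median(amp_a, amp_b):
    """Power correction coefficient taken as the median of the ratios."""
    return statistics.median(amplitude_ratios(amp_a, amp_b))


def coefficient_by_mode(amp_a, amp_b, bin_width=0.05):
    """Power correction coefficient taken as the most frequent ratio.

    Ratios are quantised to bins of width `bin_width` (an assumed
    parameter) and the centre of the fullest bin is returned.
    """
    ratios = amplitude_ratios(amp_a, amp_b)
    bins = Counter(round(r / bin_width) for r in ratios)
    best_bin, _count = bins.most_common(1)[0]
    return best_bin * bin_width


# Hypothetical amplitude spectra for one frame: three bins share the
# ratio 2 and one bin, dominated by another source, is an outlier.
x1 = [2.0, 4.0, 6.0, 9.0]
x2 = [1.0, 2.0, 3.0, 1.0]
alpha_median = coefficient_by_median(x1, x2)
alpha_mode = coefficient_by_mode(x1, x2)
alpha_other = 1.0 / alpha_median  # reciprocal shortcut described in the text
```

Both estimators ignore the outlier bin here, which is the motivation for using a mode or median rather than a mean: bins dominated by non-target sound should not shift the coefficient.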

The target area sound extraction unit 7 performs SS on the BF output data corrected by the power correction coefficients calculated by the target area sound power correction coefficient calculation unit 6, thereby extracting the non-target area sound present in the target area direction (S5). The target area sound extraction unit 7 then performs SS of the extracted non-target area sound from each BF output to extract the target area sound (S6). To extract the non-target area sound N_1(n) present in the target area direction as seen from the microphone array 1, the BF output X_2(n) of the microphone array 2 multiplied by the power correction coefficient α_2 is subtracted by SS from the BF output X_1(n) of the microphone array 1, as shown in equation (11). Similarly, the non-target area sound N_2(n) present in the target area direction as seen from the microphone array 2 is extracted according to equation (12).

After that, the target area sound extraction unit 7 extracts the target area sound by performing SS of the non-target area sound from each BF output data according to equations (13) and (14). In equations (13) and (14), γ_1(n) and γ_2(n) are coefficients for changing the strength of the SS.
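The two-stage subtraction of steps S5 and S6 can be sketched as below. Equations (11) to (14) are not reproduced in this text, so the sketch relies on stated assumptions: the function names are hypothetical, the inputs are per-bin amplitude spectra for one frame, negative results are floored at zero (a common convention in spectral subtraction that the text does not spell out), and the second non-target estimate mirrors the first with the roles of the arrays swapped.

```python
def spectral_subtract(minuend, subtrahend, scale=1.0, floor=0.0):
    """Per-bin subtraction minuend - scale * subtrahend, floored at `floor`."""
    return [max(a - scale * b, floor) for a, b in zip(minuend, subtrahend)]


def extract_target_area_sound(x1, x2, alpha1, alpha2, gamma1=1.0, gamma2=1.0):
    """Return (y1, y2, n1, n2) for one frame of amplitude spectra.

    n1, n2: non-target area sound seen by arrays 1 and 2 (step S5).
    y1, y2: target area sound extracted from each BF output (step S6).
    """
    n1 = spectral_subtract(x1, x2, alpha2)  # as described for equation (11)
    n2 = spectral_subtract(x2, x1, alpha1)  # assumed mirror form of (12)
    y1 = spectral_subtract(x1, n1, gamma1)  # as described for equation (13)
    y2 = spectral_subtract(x2, n2, gamma2)  # as described for equation (14)
    return y1, y2, n1, n2


# Hypothetical frame: both arrays see the same target spectrum, but each
# also sees a different non-target source behind or in front of the target.
target = [1.0, 0.0, 2.0]
noise1 = [0.0, 3.0, 1.0]
noise2 = [2.0, 0.0, 0.0]
x1 = [t + v for t, v in zip(target, noise1)]
x2 = [t + v for t, v in zip(target, noise2)]
y1, y2, n1, n2 = extract_target_area_sound(x1, x2, alpha1=1.0, alpha2=1.0)
```

The example is idealised: the non-target sources occupy disjoint bins and the target power is already equal in both outputs, so the first subtraction recovers each non-target spectrum exactly. With real recordings the recovery is only approximate.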

The following experiment was conducted to demonstrate the effect of the first embodiment.

FIG. 4 shows the arrangement of the microphone arrays 1 and 2 and the sound sources. The sound collection area was a square with 2 m sides, divided into four sections. One target area sound source and two non-target area sound sources were placed in three of these sections. All sound sources were human voices, played back simultaneously at approximately the same volume and recorded with the microphone arrays. Two microphone arrays were used, each positioned so that the target area sound source and a non-target area sound source overlapped in its frontal direction.

In arrangement pattern 1 of FIG. 4(A), the target area sound source was placed in front of the non-target area sound sources with respect to each of the microphone arrays 1 and 2. In arrangement pattern 2 of FIG. 4(B), the target area sound source was placed behind the non-target area sound sources. The microphone arrays 1 and 2 each consisted of the same number of microphones, namely two microphones per array. All microphone spacings were 3 cm. Using the recorded data, the amount of non-target area sound suppression achieved by the method of the present invention was compared, by computer simulation, with that achieved by BF with a single microphone array. The existing subtractive BF (see Non-Patent Document 2) was used as the BF method.

The degree to which the non-target area sound could be suppressed was evaluated using the Noise Reduction Rate (NRR).

FIG. 5 shows the amount of non-target area sound suppression for each arrangement pattern. In arrangement pattern 1 of FIG. 5(A), the method of the present invention suppresses the non-target area sound by about 3 dB more than BF with a single microphone array. In arrangement pattern 2 of FIG. 5(B) as well, the method of the present invention achieves about 3.6 dB more suppression than BF with a single microphone array. Thus, according to this embodiment, the non-target area sound present in the target area direction can be suppressed.

(B-3) Effects of the First Embodiment
According to the first embodiment, the target area sound is extracted by correcting the magnitude of the target area sound component contained in each BF output. Therefore, without adjusting the position of each microphone array, only the target area sound can be emphasized even when the target area is surrounded by non-target area sound sources. In other words, only the target area sound can be emphasized simply by placing a plurality of microphone arrays once, facing different directions.

Further, according to the first embodiment, the directivity formed by the directivity forming unit can be changed, so a change of target area can be handled easily without changing the positions of the microphone arrays.

Furthermore, according to the first embodiment, a subtractive BF can be used, so each microphone array can be built from a small number of microphones.

(C) Second Embodiment
In the first embodiment, as many sets of extracted target area sound data are output as there are microphone arrays. When the area sound collection device is used, it is expected that one of these sets will finally be selected and output.

The second embodiment therefore includes an output data selection unit that uses the distance between the target area and each microphone array, or the SN ratio between the target area sound and the non-target area sound, as a feature quantity and selects the data in which the target area sound is most emphasized.

The second embodiment of the sound collection device and program according to the present invention will now be described with reference to the drawings.

(C-1) Configuration of the Second Embodiment
FIG. 6 is a block diagram showing the configuration of the sound collection device according to the second embodiment. In FIG. 6, the sound collection device 10B according to the second embodiment includes a microphone array 1, a microphone array 2, a data input unit 3, a directivity forming unit 4, an inter-microphone-array delay correction unit 5, a target area sound power correction coefficient calculation unit 6, a target area sound extraction unit 7, and an output data selection unit 8.

In addition to the components described in the first embodiment, the sound collection device 10B according to the second embodiment includes the output data selection unit 8 downstream of the target area sound extraction unit 7.

The output data selection unit 8 uses the distance between the target area and each of the microphone arrays 1 and 2, or the SN ratio, as an index of target area sound emphasis, and selects from the outputs of the target area sound extraction unit 7 the data in which the target area sound is most emphasized.

(C-2) Operation of the Second Embodiment
Next, the processing of the sound collection device 10B according to the second embodiment will be described. FIG. 7 is a flowchart showing the processing of the sound collection device 10B according to the second embodiment. In FIG. 7, the processing of S1 to S6 is the same as the processing of S1 to S6 in FIG. 3.

The output data selection unit 8 selects, from the multiple sets of data in which the target area sound extraction unit 7 has extracted the target area sound, the data in which the target area sound is most emphasized (S7).

When the index of target area sound emphasis is the distance between the target area and the microphone arrays 1 and 2, the output data selection unit 8 selects the data from the closest microphone array as the output data. Alternatively, the SN ratio (in this case Y_i(n)/N_i(n)) can be used as the index, in which case the output data selection unit 8 selects the data with the best SN ratio. The output data selection unit 8 can also make the selection using a combination of these indices.
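The two selection criteria can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and because the text gives the SN ratio only as Y_i(n)/N_i(n) without saying how it is aggregated over frequency, the sketch sums the per-bin amplitudes of each spectrum before taking the ratio.

```python
def select_by_distance(distances):
    """Index of the microphone array closest to the target area."""
    return min(range(len(distances)), key=lambda i: distances[i])


def sn_ratio(target_spectrum, nontarget_spectrum, eps=1e-12):
    """Assumed aggregate form of Y_i(n) / N_i(n) for one frame."""
    return sum(target_spectrum) / (sum(nontarget_spectrum) + eps)


def select_by_sn_ratio(target_spectra, nontarget_spectra):
    """Index of the output whose SN ratio is highest."""
    return max(
        range(len(target_spectra)),
        key=lambda i: sn_ratio(target_spectra[i], nontarget_spectra[i]),
    )


# Hypothetical values for two microphone arrays.
nearest = select_by_distance([1.5, 0.8])
best_sn = select_by_sn_ratio(
    [[1.0, 1.0], [2.0, 2.0]],  # Y_1(n), Y_2(n)
    [[1.0, 1.0], [1.0, 1.0]],  # N_1(n), N_2(n)
)
```

A combined criterion, which the text allows, could for example weight the two indices, but the form of that combination is not given in the text and is left out here.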

(C-3) Effects of the Second Embodiment
According to the second embodiment, in addition to the effects of the first embodiment, the data in which the target area sound is most emphasized can be selected from the multiple sets of extracted target area sound data and output.

(D) Modified Embodiments of the First and Second Embodiments
The first and second embodiments described the case of two microphone arrays, but there may be three or more microphone arrays. In that case, equations (7) to (14) can be extended as follows, where M is the total number of microphone arrays.

(E) Third Embodiment
The third embodiment of the sound collection device and program according to the present invention will now be described in detail with reference to the drawings.

(E-1) Configuration of the Third Embodiment
FIG. 8 is a block diagram showing the configuration of the sound collection device according to the third embodiment. The processing performed in the sound collection device 10C after conversion into digital signals can also be realized by a CPU and a program executed by the CPU; functionally, however, it can be represented as in FIG. 8.

The sound collection device 10C according to the third embodiment includes a microphone array 1, a microphone array 2, a data input unit 3, a directivity forming unit 4, an inter-microphone-array delay correction unit 5, a target area sound power correction coefficient calculation unit 9, and a target area sound extraction unit 7.

The target area sound power correction coefficient calculation unit 9 calculates power correction coefficients that make the power of the target area sound component contained in each post-BF data the same. Specifically, it calculates the coefficient that minimizes the square of the difference in power between the BF outputs of the microphone arrays 1 and 2 and uses it as the power correction coefficient.

The target area sound extraction unit 7 performs SS on the BF output data corrected by the power correction coefficients calculated by the target area sound power correction coefficient calculation unit 9, thereby extracting the non-target area sound present in the target area direction. The target area sound extraction unit 7 then performs SS of the extracted non-target area sound from each BF output data to extract and output the target area sound.

(E-2) Operation of the Third Embodiment
Next, the operation of the sound collection device 10C according to the third embodiment will be described. FIG. 9 is a flowchart showing the processing of the sound collection device 10C according to the third embodiment.

The inter-microphone-array delay correction unit 5 first calculates the arrival time of the target area sound at each microphone array from the position of the target area and the positions of the microphone arrays. Then, taking the microphone array located farthest from the target area as the reference, it adds delays so that the target area sound arrives at all microphone arrays simultaneously. This operation makes it possible to handle the output data of the arbitrarily placed microphone arrays 1 and 2 at the same time.

The target area sound power correction coefficient calculation unit 9 calculates power correction coefficients that make the power of the target area sound component contained in each BF output data from the microphone arrays 1 and 2 the same. In doing so, it updates the target area sound power correction coefficients so that the difference between the post-BF outputs of the microphone arrays 1 and 2 is minimized (S14).

FIG. 10 is a block diagram showing the configuration of the target area sound power correction coefficient calculation unit 9. To obtain the power correction coefficients, the target area sound power correction coefficient calculation unit 9 calculates, according to equations (19) and (20), the value of an evaluation function that is the square of the difference in power between the post-BF outputs of the two microphone arrays 1 and 2. If the directivity forming unit 4 performs BF in the time domain, the target area sound power correction coefficient calculation unit 9 first converts the post-BF output data to the frequency domain.

Here, X_1k(n) and X_2k(n) are the post-BF output data of the microphone arrays 1 and 2, N is the total number of frequency bins, k is the frequency, and α_1(n) and α_2(n) are the power correction coefficients for the respective BF outputs.

The target area sound power correction coefficient calculation unit 9 updates the power correction coefficients α_1(n) and α_2(n) according to equations (21) and (22) so that the values of the evaluation functions J_1(n) and J_2(n) are minimized. Here ρ is the learning coefficient. To reduce the amount of computation, the target area sound power correction coefficient calculation unit 9 may obtain one power correction coefficient first and take the other as its reciprocal.
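An update of this kind can be sketched as a plain gradient-descent step. Equations (19) to (22) are not reproduced in this text, so everything specific below is an assumption: the evaluation function is taken to be J(α) = Σ_k (|X_1k| − α|X_2k|)², the update is α ← α − ρ·dJ/dα, and the function names are hypothetical.

```python
def evaluation_function(alpha, amp1, amp2):
    """Assumed evaluation function: squared difference summed over bins."""
    return sum((a - alpha * b) ** 2 for a, b in zip(amp1, amp2))


def update_coefficient(alpha, amp1, amp2, rho):
    """One gradient-descent step on the assumed evaluation function.

    dJ/dalpha = -2 * sum_k amp2[k] * (amp1[k] - alpha * amp2[k])
    """
    gradient = -2.0 * sum(b * (a - alpha * b) for a, b in zip(amp1, amp2))
    return alpha - rho * gradient


# Hypothetical stationary frame in which array 1 is exactly twice as
# loud as array 2; repeated updates should settle near alpha = 2.
amp1 = [2.0, 4.0, 6.0]
amp2 = [1.0, 2.0, 3.0]
alpha = 1.0
for _ in range(200):
    alpha = update_coefficient(alpha, amp1, amp2, rho=0.01)
alpha_reciprocal = 1.0 / alpha  # shortcut for the other coefficient
```

With this assumed form the step is stable only when ρ is smaller than 1 / Σ_k |X_2k|², so in practice the learning coefficient would need to be chosen with the signal level in mind.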

The target area sound extraction unit 7 performs SS on the BF output data corrected by the correction coefficients calculated by the target area sound power correction coefficient calculation unit 9, thereby extracting the non-target area sound present in the target area direction (S5). The target area sound extraction unit 7 then performs SS of the extracted non-target area sound from the output data of each BF to extract the target area sound (S6).

To extract the non-target area sound N_1(n) present in the target area direction as seen from the microphone array 1, the BF output X_2(n) of the microphone array 2 multiplied by the power correction coefficient α_2 is subtracted by SS from the BF output X_1(n) of the microphone array 1, as shown in equation (11). Similarly, the non-target area sound N_2(n) present in the target area direction as seen from the microphone array 2 is extracted according to equation (12).

After that, the target area sound extraction unit 7 extracts the target area sound by performing SS of the non-target area sound from each BF output data according to equations (13) and (14). In equations (13) and (14), γ_1(n) and γ_2(n) are coefficients for changing the strength of the SS.

The following experiment was conducted to demonstrate the effect of the third embodiment.

FIG. 11 shows the arrangement of the microphone arrays 1 and 2 and the sound sources. The sound collection area was a square with 2 m sides, divided into four sections. One target area sound source and two non-target area sound sources were placed in three of these sections. All sound sources were human voices, played back simultaneously at approximately the same volume and recorded with the microphone arrays. Two microphone arrays were used, each positioned so that the target area sound source and a non-target area sound source overlapped in its frontal direction.

In arrangement pattern 1 of FIG. 11(A), the target area sound source was placed in front of the non-target area sound sources with respect to each of the microphone arrays 1 and 2. In arrangement pattern 2 of FIG. 11(B), the target area sound source was placed behind the non-target area sound sources. The microphone arrays 1 and 2 each consisted of the same number of microphones, namely two microphones per array. All microphone spacings were 3 cm. Using the recorded data, the amount of non-target area sound suppression achieved by the method of the present invention was compared, by computer simulation, with that achieved by BF with a single microphone array. The existing subtractive BF (see Non-Patent Document 2) was used as the BF method.

FIG. 12 shows the amount of non-target area sound suppression for each arrangement pattern. In arrangement pattern 1 of FIG. 12(A), the method of the present invention suppresses the non-target area sound by about 4 dB more than BF with a single microphone array. In arrangement pattern 2 of FIG. 12(B) as well, the method of the present invention achieves about 5.5 dB more suppression than BF with a single microphone array. Thus, according to the third embodiment, the non-target area sound present in the target area direction can be suppressed.

(E-3) Effects of the Third Embodiment
According to the third embodiment, the target area sound is extracted by correcting the magnitude of the target area sound component contained in each BF output. Therefore, without adjusting the position of each microphone array, only the target area sound can be emphasized even when the target area is surrounded by non-target area sound sources. In other words, only the target area sound can be emphasized simply by placing a plurality of microphone arrays once, facing different directions.

Further, according to the third embodiment, the directivity formed by the directivity forming unit can be changed, so a change of target area can be handled easily without changing the positions of the microphone arrays.

Furthermore, according to the third embodiment, a subtractive BF can be used, so each microphone array can be built from a small number of microphones.

(F) Fourth Embodiment
In the third embodiment, as many sets of extracted target area sound data are output as there are microphone arrays. When the area sound collection device is used, it is expected that one of these sets will finally be selected and output. The fourth embodiment therefore includes an output data selection unit that uses the distance between the target area and each microphone array, or the SN ratio between the target area sound and the non-target area sound, as a feature quantity and selects the data in which the target area sound is most emphasized.

The fourth embodiment of the sound collection device and program according to the present invention will now be described with reference to the drawings.

(F-1) Configuration of the Fourth Embodiment
FIG. 13 is a block diagram showing the configuration of the sound collection device according to the fourth embodiment. In FIG. 13, the sound collection device 10D according to the fourth embodiment includes a microphone array 1, a microphone array 2, a data input unit 3, a directivity forming unit 4, an inter-microphone-array delay correction unit 5, a target area sound power correction coefficient calculation unit 9, a target area sound extraction unit 7, and an output data selection unit 8.

In addition to the components described in the third embodiment, the sound collection device 10D according to the fourth embodiment includes the output data selection unit 8 downstream of the target area sound extraction unit 7.

(F-2) Operation of the Fourth Embodiment
Next, the processing of the sound collection device 10D according to the fourth embodiment will be described. FIG. 14 is a flowchart showing the processing of the sound collection device 10D according to the fourth embodiment. In FIG. 14, the processing of S1, S2, S3, S14, S5, and S6 is the same as the processing of S1, S2, S3, S14, S5, and S6 in FIG. 9.

(F-3) Effects of the Fourth Embodiment
According to the fourth embodiment, in addition to the effects of the third embodiment, the data in which the target area sound is most emphasized can be selected from the multiple sets of extracted target area sound data and output.

(G) Modified Embodiments of the Third and Fourth Embodiments
The third and fourth embodiments described the case of two microphone arrays, but there may be three or more microphone arrays. In that case, equations (19) to (22) can be extended as follows, where M is the total number of microphone arrays.

(H) Other Embodiments
Each of the above embodiments processes the acoustic signals captured by the microphone arrays in real time. However, the acoustic signals captured by the microphone arrays may instead be stored in a storage medium and later read out from it and processed to obtain the emphasized target area sound signal. When a storage medium is used in this way, the place where the microphone arrays are installed and the place where the emphasis processing is performed may be apart from each other. Likewise, in real-time processing, the place where the microphone arrays are installed and the place where the emphasis processing is performed may be apart, with the signals supplied to the remote site by communication. Configurations that use a storage medium or communication in these ways are also included in the concept of the "sound collection device" of the present invention.

１０Ａ、１０Ｂ、１０Ｃ、１０Ｄ…収音装置、
１…マイクロホンアレイ、２…マイクロホンアレイ、３…データ入力部、
４…指向性形成部、５…マイクロホンアレイ間遅延補正部、
６及び９…目的エリア音パワー補正係数算出部、７…目的エリア音抽出部。 10A, 10B, 10C, 10D ... sound collecting device,
1 ... microphone array, 2 ... microphone array, 3 ... data input unit,
4 ... directivity forming unit, 5 ... inter-microphone-array delay correction unit,
6 and 9 ... target area sound power correction coefficient calculation unit, 7 ... target area sound extraction unit.

Claims

A sound collecting device comprising:
a plurality of microphone arrays;
a directivity forming unit that forms directivity toward a target area by applying a beamformer to the output of each microphone array;
an inter-microphone-array delay correction unit that corrects, in the beamformer output of each microphone array, the delay caused by differences in distance between the target area and each microphone array, so that the target area sound appears to arrive at all microphone arrays simultaneously;
a target area sound power correction coefficient calculation unit that calculates, as a correction coefficient, the mode or median of the ratios of the amplitude spectra between the beamformer outputs of the microphone arrays, in order to equalize the power of the target area sound contained in the beamformer output of each microphone array; and
a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts by spectral subtraction the non-target area sound present in the target area direction, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.

A sound collecting device comprising:
a plurality of microphone arrays;
a directivity forming unit that forms directivity toward a target area by applying a beamformer to the output of each microphone array;
an inter-microphone-array delay correction unit that corrects, in the beamformer output of each microphone array, the delay caused by differences in distance between the target area and each microphone array, so that the target area sound appears to arrive at all microphone arrays simultaneously;
a target area sound power correction coefficient calculation unit that calculates, as a correction coefficient, the coefficient that minimizes the squared difference between the powers of the beamformer outputs of the microphone arrays, in order to equalize the power of the target area sound contained in the beamformer output of each microphone array; and
a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts by spectral subtraction the non-target area sound present in the target area direction, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.

The sound collecting device according to claim 1, further comprising an output data selection unit that selects, from the outputs of the target area sound extraction unit, the data in which the target area sound is most enhanced, using as an index of target area sound enhancement the distance between the target area and each microphone array or the SN ratio.

A sound collection program that causes a computer receiving signals from a plurality of microphone arrays to function as:
a directivity forming unit that forms directivity toward a target area by applying a beamformer to the output of each microphone array;
an inter-microphone-array delay correction unit that corrects, in the beamformer output of each microphone array, the delay caused by differences in distance between the target area and each microphone array, so that the target area sound appears to arrive at all microphone arrays simultaneously;
a target area sound power correction coefficient calculation unit that calculates, as a correction coefficient, the mode or median of the ratios of the amplitude spectra between the beamformer outputs of the microphone arrays, in order to equalize the power of the target area sound contained in the beamformer output of each microphone array; and
a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts by spectral subtraction the non-target area sound present in the target area direction, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.

A sound collection program that causes a computer receiving signals from a plurality of microphone arrays to function as:
a directivity forming unit that forms directivity toward a target area by applying a beamformer to the output of each microphone array;
an inter-microphone-array delay correction unit that corrects, in the beamformer output of each microphone array, the delay caused by differences in distance between the target area and each microphone array, so that the target area sound appears to arrive at all microphone arrays simultaneously;
a target area sound power correction coefficient calculation unit that calculates, as a correction coefficient, the coefficient that minimizes the squared difference between the powers of the beamformer outputs of the microphone arrays, in order to equalize the power of the target area sound contained in the beamformer output of each microphone array; and
a target area sound extraction unit that corrects the beamformer output of each microphone array using the correction coefficient calculated by the target area sound power correction coefficient calculation unit, extracts by spectral subtraction the non-target area sound present in the target area direction, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from the beamformer output of each microphone array.
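
For reference, the least-squares coefficient recited in the claims has a simple closed form when a single scalar is fitted across frequency bins. The Python sketch below is an assumption about how such a coefficient might be computed, not the patent's own formula. For per-bin power values `p_ref[k]` and `p_other[k]`, it returns the `alpha` that minimizes the sum over k of `(p_ref[k] - alpha * p_other[k])**2`, which is `sum(p_ref * p_other) / sum(p_other**2)`. The function name is hypothetical.

```python
def least_squares_coefficient(p_ref, p_other):
    """Scalar alpha minimizing sum_k (p_ref[k] - alpha * p_other[k])**2."""
    den = sum(b * b for b in p_other)
    if den == 0.0:
        # No energy in the other output: leave it unscaled (assumed fallback).
        return 1.0
    num = sum(a * b for a, b in zip(p_ref, p_other))
    return num / den
```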