JP6885483B1

JP6885483B1 - Sound collecting device, sound collecting program and sound collecting method

Info

Publication number: JP6885483B1
Application number: JP2020020077A
Authority: JP
Inventors: 一浩片桐
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2020-02-07
Filing date: 2020-02-07
Publication date: 2021-06-16
Anticipated expiration: 2040-02-07
Also published as: JP2021125851A

Abstract

PROBLEM TO BE SOLVED: To provide a sound collecting device, a sound collecting program, and a sound collecting method that collect only the target area sound regardless of the arrival direction of non-target area sound. SOLUTION: The sound collecting device acquires a target direction signal with a beamformer for the input signals of a plurality of microphone arrays; sets three or more microphone array sets, each formed by a combination of two microphone arrays; extracts a target area sound by spectral subtraction based on the target direction signals of each microphone array set; and, based on the result of comparing the target area sounds extracted with the respective microphone array sets, generates and outputs one output signal from those target area sounds. [Selected drawing] FIG. 1

Description

The present invention relates to a sound collecting device, a sound collecting program, and a sound collecting method, and can be applied to, for example, a system that emphasizes the sound in a specific area and suppresses the sound in other areas.

In an environment where multiple sound sources exist, a beamformer (hereinafter also "BF") using a microphone array is a technique for separating and collecting only the sound from a specific direction. A BF forms directivity by using the time differences between the signals arriving at the individual microphones (see Non-Patent Document 1).

Conventionally, BFs are broadly classified into two types, the addition type and the subtraction type. The subtraction type BF in particular has the advantage that it can form directivity with fewer microphones than the addition type.

FIG. 11 is a block diagram showing the configuration of a subtraction type BF 200 when the number of microphones M is two.

FIG. 12 is an explanatory diagram showing an example of the directional filter formed by the subtraction type BF 200 using two microphones M1 and M2.

The subtraction type BF 200 first uses the delay device 210 to calculate the time difference with which sound arriving from the target direction (hereinafter, "target sound") reaches the microphones M1 and M2, and adds a delay so that the phases of the target sound are aligned. This time difference can be calculated by equation (1).

Here, d is the distance between the microphones M1 and M2, c is the speed of sound, and τ_i is the delay amount. θ_L is the angle from the direction perpendicular to the straight line connecting the microphones M (M1, M2) to the target direction.
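Equation (1) is not reproduced in this text. As a minimal sketch, assuming the usual far-field relation τ = d·sin(θ_L)/c, which matches the definitions of d, c and θ_L given here, the delay could be computed as follows. The function name, the 340 m/s sound speed, and the 3 cm spacing are illustrative assumptions.

```python
import math


def steering_delay(d: float, theta_l: float, c: float = 340.0) -> float:
    """Return the inter-microphone delay in seconds, assuming tau = d * sin(theta_L) / c."""
    return d * math.sin(theta_l) / c


# Example: 3 cm spacing, direction along the microphone axis (theta_L = pi/2).
tau_endfire = steering_delay(0.03, math.pi / 2)
# Direction perpendicular to the microphone axis (theta_L = 0): no delay is needed.
tau_broadside = steering_delay(0.03, 0.0)
```

With these assumed values the largest possible delay is d/c, roughly 88 microseconds.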

When the null (blind spot) lies in the direction of the microphone M1 relative to the midpoint between the microphones M1 and M2, the delay device 210 delays the input signal x_1(t) of the microphone M1. The subtraction type BF 200 then performs the subtraction according to equation (2).

The processing of the subtraction type BF 200 can be performed in the same way in the frequency domain, in which case equation (2) becomes equation (3).
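Equations (2) and (3) are not reproduced in this text. A plausible reconstruction is given below, assuming that m(t) denotes the subtractor output, that X_1, X_2 and M are the frequency-domain counterparts of x_1, x_2 and m, and that ω is the angular frequency. Every symbol other than x_1(t) and τ_i is an assumption.

```latex
\begin{align}
m(t) &= x_2(t) - x_1(t - \tau_i) \tag{2} \\
M(\omega) &= X_2(\omega) - e^{-j\omega\tau_i}\, X_1(\omega) \tag{3}
\end{align}
```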

Here, when θ_L = ±π/2, the directivity formed by the subtraction type BF 200 is a cardioid unidirectional pattern, as shown in FIG. 12(a). When θ_L = 0 or π, the directivity is a figure-eight bidirectional pattern, as shown in FIG. 12(b).
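The two patterns can be checked numerically. The sketch below assumes a unit plane wave, a frequency-domain subtraction of the form X_2 - exp(-jωτ)·X_1, and a sign convention in which a wave from angle θ reaches M2 later than M1 by d·sin(θ)/c. The function name, the 1 kHz test frequency, and these conventions are assumptions for illustration, not taken from the patent.

```python
import cmath
import math


def subtractive_bf_gain(theta: float, theta_l: float, freq_hz: float = 1000.0,
                        d: float = 0.03, c: float = 340.0) -> float:
    """Magnitude response of X2 - exp(-j*w*tau) * X1 to a unit plane wave from angle theta."""
    omega = 2.0 * math.pi * freq_hz
    tau = d * math.sin(theta_l) / c   # delay applied to microphone M1
    lag = d * math.sin(theta) / c     # assumed arrival lag of M2 relative to M1
    return abs(1.0 - cmath.exp(-1j * omega * (tau - lag)))


# theta_L = pi/2: a single null at theta = pi/2 and a maximum on the opposite side (cardioid-like).
cardioid_null = subtractive_bf_gain(math.pi / 2, math.pi / 2)
cardioid_peak = subtractive_bf_gain(-math.pi / 2, math.pi / 2)

# theta_L = 0: a null at broadside and equal lobes on both sides (figure-eight).
figure8_null = subtractive_bf_gain(0.0, 0.0)
figure8_left = subtractive_bf_gain(math.pi / 2, 0.0)
figure8_right = subtractive_bf_gain(-math.pi / 2, 0.0)
```

Under these conventions the null always falls at θ = θ_L, which is why changing the delay alone switches between the two patterns.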

Hereinafter, a filter that forms unidirectivity from the input signals is called a "unidirectional filter", and a filter that forms bidirectivity is called a "bidirectional filter".

The subtractor 220 can also form strong directivity toward the null of the bidirectional pattern by using spectral subtraction (hereinafter also simply "SS"). The directivity by SS is formed over all frequencies or over a designated frequency band according to equation (4).

Equation (4) uses the input signal X_1 of the microphone M1, but the same effect can be obtained with the input signal X_2 of the microphone M2. Here, β is a coefficient for adjusting the strength of the SS. When a value becomes negative in the subtraction, the subtractor 220 performs flooring, replacing it with 0 or with a reduced version of the original value. In this processing scheme of the subtraction type BF 200, sound arriving from outside the target direction (hereinafter, "non-target sound") is extracted using the bidirectional characteristic, and the amplitude spectrum of the extracted non-target sound is subtracted from the amplitude spectrum of the input signal, thereby emphasizing the target sound.
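Equation (4) is not reproduced in this text. The sketch below assumes it has the common form |Y| = |X_1| - β·|M|, where |M| is the amplitude spectrum of the bidirectional filter output, and it interprets "a reduced version of the original value" as a fixed fraction of |X_1|. The function name, the `floor_ratio` parameter, and the sample spectra are hypothetical.

```python
def spectral_subtract(x_mag, m_mag, beta=1.0, floor_ratio=0.0):
    """Subtract beta * |M| from |X1| per frequency bin, flooring negative results."""
    out = []
    for x, m in zip(x_mag, m_mag):
        y = x - beta * m
        if y < 0.0:
            # Flooring: 0 when floor_ratio is 0, otherwise a scaled-down copy of the input bin.
            y = floor_ratio * x
        out.append(y)
    return out


# Three-bin toy spectra; the second bin goes negative and is floored.
y_mag = spectral_subtract([1.0, 0.5, 0.2], [0.25, 0.75, 0.1], beta=1.0, floor_ratio=0.1)
```

A larger β suppresses the non-target sound more strongly, at the cost of more bins being floored.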

When only the sound present in a specific area (hereinafter, "target area sound") is to be collected, using a subtraction type BF alone may also pick up the sound of sources located around that area (hereinafter, "non-target area sound"). Patent Document 1 therefore proposes a method (hereinafter, "area sound collection") that uses a plurality of microphone arrays, directs their directivity toward the target area from different directions, and collects the target area sound by making the directivities intersect in the target area. In area sound collection, the ratio of the amplitude spectra of the target area sound contained in the BF outputs of the microphone arrays is first estimated and used as a correction coefficient.

For example, when two microphone arrays are used, the correction coefficients for the target area sound amplitude spectrum can be calculated by the combination of equations (5) and (6), or by the combination of equations (7) and (8). Here, Y_1k(n) is the amplitude spectrum of the BF output of the first microphone array, Y_2k(n) is the amplitude spectrum of the BF output of the second microphone array, N is the total number of frequency bins, and k is the frequency. α_1(n) and α_2(n) are the amplitude spectrum correction coefficients for the respective BF outputs. In addition, mode denotes the most frequent value and median denotes the median.
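Equations (5) to (8) are not reproduced in this text. The sketch below shows the median variant, assuming α_1(n) is the median over the bins k of Y_2k(n)/Y_1k(n) and α_2(n) is the median of Y_1k(n)/Y_2k(n). That assignment is inferred from the later description in which α_2 multiplies the second array's output. The function name and the `eps` guard against division by zero are hypothetical.

```python
from statistics import median


def correction_coefficients(y1_mag, y2_mag, eps=1e-12):
    """Median-based estimate for one frame: alpha1 ~ Y2k / Y1k, alpha2 ~ Y1k / Y2k."""
    alpha1 = median(y2 / max(y1, eps) for y1, y2 in zip(y1_mag, y2_mag))
    alpha2 = median(y1 / max(y2, eps) for y1, y2 in zip(y1_mag, y2_mag))
    return alpha1, alpha2


# Toy frame: three bins dominated by the target area sound (array 1 receives it twice as
# strongly) and one bin dominated by a non-target sound seen mainly by array 2.
alpha1, alpha2 = correction_coefficients([2.0, 4.0, 6.0, 1.0], [1.0, 2.0, 3.0, 5.0])
```

The median ignores the outlier bin here; a mode-based estimate would need the ratios binned into a histogram first, which this sketch does not attempt.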

Through the above processing, the subtractor 220 obtains the amplitude spectrum correction coefficients α_1(n) and α_2(n), corrects each BF output with them, and performs SS to extract the non-target area sound present in the target area direction. The subtractor 220 can then extract the target area sound by spectrally subtracting the extracted non-target area sound from each BF output.

When extracting the non-target area sound N_1(n) present in the target area direction as seen from the first microphone array, the subtraction type BF 200 spectrally subtracts, for example as shown in equation (9), the product of the BF output Y_2(n) of the second microphone array and the amplitude spectrum correction coefficient α_2 from the BF output Y_1(n) of the first microphone array. Similarly, the subtraction type BF 200 extracts the non-target area sound N_2(n) present in the target area direction as seen from the second microphone array according to equation (10).
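Equations (9) and (10) are not reproduced in this text. Equation (9) follows directly from the description above; equation (10) is written here by symmetry, which is an assumption.

```latex
\begin{align}
N_1(n) &= Y_1(n) - \alpha_2(n)\, Y_2(n) \tag{9} \\
N_2(n) &= Y_2(n) - \alpha_1(n)\, Y_1(n) \tag{10}
\end{align}
```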

After that, the subtraction type BF 200 extracts the target area sound by spectrally subtracting the non-target area sound from each BF output according to equation (11) or equation (12). Equation (11) shows the processing when the target area sound is extracted with the first microphone array as the reference, and equation (12) shows the processing when it is extracted with the second microphone array as the reference. Here, γ_1(n) and γ_2(n) are coefficients for changing the strength of the SS.
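The two-stage subtraction can be sketched as follows, assuming equations (11) and (12) have the form Z_1 = Y_1 - γ_1·N_1 and Z_2 = Y_2 - γ_2·N_2 and that negative results are floored to zero. The equations themselves are not reproduced in this text, and the function names and toy spectra are hypothetical.

```python
def ss(a, b, scale=1.0):
    """Per-bin spectral subtraction a - scale * b, with negative results floored to zero."""
    return [max(x - scale * y, 0.0) for x, y in zip(a, b)]


def extract_target_area(y1, y2, alpha1, alpha2, gamma1=1.0, gamma2=1.0):
    """Return (Z1, Z2): target area sound referenced to array 1 and to array 2."""
    n1 = ss(y1, y2, alpha2)   # non-target area sound in the target direction, seen from array 1
    n2 = ss(y2, y1, alpha1)   # the same, seen from array 2
    z1 = ss(y1, n1, gamma1)   # first array as the reference
    z2 = ss(y2, n2, gamma2)   # second array as the reference
    return z1, z2


# Toy frame: target spectrum [1.0, 0.5] reaches both arrays equally (alpha = 1);
# array 1 also picks up 0.5 in bin 0, array 2 also picks up 0.25 in bin 1.
z1, z2 = extract_target_area([1.5, 0.5], [1.0, 0.75], alpha1=1.0, alpha2=1.0)
```

In this idealized frame both references recover the target spectrum, because each non-target component appears in only one BF output.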

In conventional area sound collection, SS, which is nonlinear processing, is performed in equation (4) and in equations (11) and (12) to extract the target area sound, so an unpleasant artifact called musical noise may occur in a high-noise environment.

The technique described in Patent Document 2 therefore determines which sections of the input signal contain the target area sound and which do not, and suppresses artifacts such as musical noise by not outputting the area-collected sound in sections where the target area sound is absent.

To determine whether the target area sound is present, the technique described in Patent Document 2 first calculates, according to equation (13), the amplitude spectrum ratio R (= area sound output / input signal) between the input signal and the output from which the target area sound has been extracted (hereinafter, "area sound output").

When a sound source is present in the target area, the target area sound is contained in both the input signal X_1 and the area sound output Z_1, so the amplitude spectrum ratio of the target area sound component is close to 1. In contrast, the non-target area sound component is suppressed in the area sound output, so its amplitude spectrum ratio is small. Other background noise components are also suppressed to some extent, even without dedicated noise suppression beforehand, because area sound collection performs SS several times, and their amplitude spectrum ratio is likewise small. When the target area sound is absent, on the other hand, the area sound output contains only weak residual noise compared with the input signal, so the amplitude spectrum ratio is small over the entire band.

Because of this characteristic, in the technique described in Patent Document 2, taking the average value U of the amplitude spectrum ratios obtained at each frequency according to equation (14) yields a large difference between when the target area sound is present and when it is absent. Here, m and n are the upper and lower limits of the processing band (frequency band), respectively, set for example to cover 100 Hz to 6 kHz, where speech information is sufficiently contained.

The technique described in Patent Document 2 then compares the average power spectrum ratio with a preset threshold, and when it determines that the target area sound is absent, it does not output the area sound output data and instead outputs silence or the input signal with reduced gain.
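The decision described for Patent Document 2 can be sketched as follows, assuming equation (13) is the per-bin ratio R_k = Z_1k / X_1k and equation (14) is the mean of R_k over the bins from n to m. Neither equation is reproduced in this text. The function name, the `eps` guard, the bin range, and the 0.5 threshold are hypothetical.

```python
def target_area_sound_present(x_mag, z_mag, lo_bin, hi_bin, threshold, eps=1e-12):
    """Return (decision, U), where U is the mean of Z1k / X1k over bins lo_bin..hi_bin."""
    ratios = [z_mag[k] / max(x_mag[k], eps) for k in range(lo_bin, hi_bin + 1)]
    u = sum(ratios) / len(ratios)
    return u > threshold, u


x = [1.0, 1.0, 1.0, 1.0]
# Target area sound present: the area sound output keeps most of the input energy.
present, u_present = target_area_sound_present(x, [0.9, 0.8, 1.0, 0.9], 0, 3, threshold=0.5)
# Target area sound absent: only weak residual noise remains after the SS stages.
absent, u_absent = target_area_sound_present(x, [0.1, 0.05, 0.1, 0.15], 0, 3, threshold=0.5)
```

When the decision is negative, the caller would emit silence or a gain-reduced copy of the input instead of the area sound output.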

Patent Document 1: Japanese Unexamined Patent Publication No. 2014-072708
Patent Document 2: Japanese Unexamined Patent Publication No. 2016-127457

Non-Patent Document 1: Futoshi Asano, "Acoustic Technology Series 16: Array Signal Processing of Sound: Localization, Tracking and Separation of Sound Sources", edited by the Acoustical Society of Japan, Corona Publishing Co., Ltd., published February 25, 2011

In the sound collection method described in Patent Document 1, two microphone arrays are installed to the left and right in front of the area to be collected, and by adjusting the distance and angle between the microphone arrays, the sound present in an arbitrary area can be collected.

However, when sound is collected with the technique described in Patent Document 1, the directivity of the collection area spreads in the vertical direction, so if a noise source exists above or below the target area sound, that noise is collected as well. A noise source above or below the target area sound means, for example, a sound source such as a loudspeaker directly above the target area. Moreover, in the sound collection method described in Patent Document 1, the vertical directivity widens gradually with distance from the height at which the microphone arrays are installed, so the loudspeaker may be picked up even if it is not directly overhead.

On the other hand, when the presence of the target area sound is determined from the amplitude spectrum ratio between input and output, as in the technique described in Patent Document 2, sound reproduced from a loudspeaker above the target area is not collected as long as its volume is low. Even with the technique described in Patent Document 2, however, in an environment where loud announcements (non-target area sound) are played, such as inside a railway station, the announcement may still be collected.

In view of the above problems, a sound collecting device, a sound collecting program, and a sound collecting method that collect only the target area sound regardless of the arrival direction of non-target area sound are desired.

The sound collecting device of the first aspect of the present invention comprises: (1) directivity forming means that, for each of the input signals supplied from a plurality of microphone arrays, forms directivity with a beamformer toward the target area direction in which the target area lies, and acquires a target direction signal from the target area direction for each microphone array; (2) target area sound extraction means that sets three or more microphone array sets, each formed by a combination of two of the microphone arrays, and that, for each microphone array set, extracts the non-target area sound present in the target area direction by spectral subtraction between the target direction signals of its microphone arrays, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; and (3) output means that, based on the result of comparing the target area sounds extracted with the respective microphone array sets, generates and outputs one output signal from those target area sounds.

The sound collecting program of the second aspect of the present invention causes a computer to function as: (1) directivity forming means that, for each of the input signals supplied from a plurality of microphone arrays, forms directivity with a beamformer toward the target area direction in which the target area lies, and acquires a target direction signal from the target area direction for each microphone array; (2) target area sound extraction means that sets three or more microphone array sets, each formed by a combination of two of the microphone arrays, and that, for each microphone array set, extracts the non-target area sound present in the target area direction by spectral subtraction between the target direction signals of its microphone arrays, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; and (3) output means that, based on the result of comparing the target area sounds extracted with the respective microphone array sets, generates and outputs one output signal from those target area sounds.

The third aspect of the present invention is a sound collecting method performed by a sound collecting device, wherein: (1) the sound collecting device has directivity forming means, target area sound extraction means, and output means; (2) the directivity forming means, for each of the input signals supplied from a plurality of microphone arrays, forms directivity with a beamformer toward the target area direction in which the target area lies, and acquires a target direction signal from the target area direction for each microphone array; (3) the target area sound extraction means sets three or more microphone array sets, each formed by a combination of two of the microphone arrays, and, for each microphone array set, extracts the non-target area sound present in the target area direction by spectral subtraction between the target direction signals of its microphone arrays, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; and (4) the output means, based on the result of comparing the target area sounds extracted with the respective microphone array sets, generates and outputs one output signal from those target area sounds.

According to the present invention, only the target area sound can be collected regardless of the arrival direction of non-target area sound.

FIG. 1 is a block diagram showing the functional configuration of the sound collecting device according to the first embodiment.
FIG. 2 is a perspective view of the area around the target area (including the microphone array device) according to the first embodiment.
FIG. 3 is a diagram showing the configuration of the microphone array device (microphone arrays) according to the first embodiment.
FIG. 4 is a view of the area around the target area (including the microphone array device) according to the first embodiment, seen from above.
FIG. 5 is a view of the area around the target area (including the microphone array device) according to the first embodiment, seen from the left.
FIG. 6 is a block diagram showing an example of the hardware configuration of the sound collecting device according to the first embodiment.
FIG. 7 is a block diagram showing the configuration of the sound collecting device according to the second embodiment.
FIG. 8 is a diagram showing the configuration of the microphone array device (microphone arrays) according to the second embodiment.
FIG. 9 is a block diagram showing the configuration of the sound collecting device according to the third embodiment.
FIG. 10 is a diagram showing a configuration example of the microphone array device according to a modification of the second embodiment.
FIG. 11 is a block diagram showing the configuration of a conventional subtraction type BF.
FIG. 12 is an explanatory diagram showing an example of the directional filter formed by a conventional subtraction type BF.

(A) First Embodiment
Hereinafter, a first embodiment of the sound collecting device, the sound collecting program, and the sound collecting method according to the present invention will be described with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the functional configuration of the sound collecting device 10 according to the first embodiment.

The sound collecting device 10 uses a microphone array device ME comprising a plurality of microphone arrays to perform target area sound collection processing, which collects the target area sound from a sound source in the target area. In this embodiment, the microphone array device ME is assumed to have four microphone arrays MA (MAL, MAR, MAU, MAB).

Next, the configuration of the microphone array device ME will be described with reference to FIGS. 2 and 3.

FIG. 2 is a diagram showing the arrangement of the microphone arrays constituting the microphone array device ME and the setting (design) of the target area.

FIG. 2 is a perspective view of the area around the target area, where the microphone arrays constituting the microphone array device ME are arranged, seen obliquely from above.

As shown in FIG. 2, the microphone arrays MAL, MAR, MAU, and MAB are arranged at arbitrary positions in the space containing the target area. The microphone arrays may be placed anywhere relative to the target area, provided that their directivities overlap only in the target area.

In this embodiment, as shown in FIG. 2, the target area (sound collection area) to be collected by the sound collecting device 10 is assumed to be set on a plane DH in an arbitrary space. In FIG. 2, the target area set on the plane DH is labeled TA, and the center position of the target area is labeled PC.

In FIG. 2, a coordinate system is set for the space around the target area, with the origin P0 on the plane DH as its reference. Viewed from the direction of FIG. 2, the right is the +X direction, the left is the -X direction, the near side is the -Y direction, the far side is the +Y direction, up is the +Z direction, and down is the -Z direction. In the following, positions in the space around the target area are expressed in the form (X, Y, Z).

Here, as shown in FIG. 2, the center position PC of the target area is assumed to lie in the +X direction from the origin P0 (to the right as seen in FIG. 2). Also as shown in FIG. 2, the microphone array MAL is placed in the -Y direction from the origin P0 (the near side as seen in FIG. 2), the microphone array MAR in the +Y direction (the far side), the microphone array MAU in the +Z direction (the upper side), and the microphone array MAB in the -Z direction (the lower side).

FIG. 3 is a diagram showing the arrangement of the microphone arrays when looking from the center position of the target area TA toward the origin P0 (the -X direction; hereinafter also called the "front direction").

As shown in FIGS. 2 and 3, the microphone arrays MAL, MAR, MAU, and MAB are arranged so that their directivities overlap only in the target area TA. Also as shown in FIGS. 2 and 3, each microphone array MA consists of two or more microphones M, and each microphone M picks up an acoustic signal.

In this embodiment, as shown in FIG. 3, the sound collecting device 10 performs the sound collection processing for the target area TA separately for the pair of microphone arrays MAL and MAR, arranged in the left-right (horizontal) direction as seen from the target area TA (center position PC), and the pair of microphone arrays MAU and MAB, arranged in the up-down (vertical) direction as seen from the target area TA (center position PC). In the following, to simplify the description, two microphone arrays treated as a pair (set; combination) in the sound collection processing are called a "microphone array set". In this embodiment, the pair of microphone arrays MAL and MAR is called the microphone array set MSLR, and the pair of microphone arrays MAU and MAB is called the microphone array set MSUB.

In this embodiment, each microphone array is described as having two microphones M1 and M2 that pick up acoustic signals. That is, each microphone array MA is assumed to constitute a two-channel microphone array. The distance between the two microphones M1 and M2 is not limited, but in the example of this embodiment it is 3 cm.

この実施形態では、マイクロホンアレイＭＡＬ、ＭＡＲは、平面ＤＨ上で、指向性が目的エリアＴＡでのみ重なるように配置されていればどこでも良く、例えば目的エリアＴＡを挟んで対向に配置しても良い。 In this embodiment, the microphone arrays MAL and MAR may be placed anywhere on the plane DH as long as their directivities overlap only in the target area TA; for example, they may be placed facing each other across the target area TA.

図４は目的エリアＴＡ（中心位置ＰＣ）周辺を上方向（＋Ｚ方向）から見た場合の図である。 FIG. 4 is a view when the periphery of the target area TA (center position PC) is viewed from above (+ Z direction).

図５は、目的エリアＴＡ（中心位置ＰＣ）周辺を、中心位置ＰＣから見て左方向（−Ｙ方向）から見た場合の図である。図４、図５では、各マイクロホンアレイから正面方向（マイクロホンＭ１、Ｍ２を結ぶ線と直交する方向；指向性の方向）に直線（点線の矢印）を付している。 FIG. 5 is a view when the periphery of the target area TA (center position PC) is viewed from the left direction (−Y direction) when viewed from the center position PC. In FIGS. 4 and 5, a straight line (dotted arrow) is attached to the front direction (direction orthogonal to the line connecting the microphones M1 and M2; directivity direction) from each microphone array.

ここでは、図４、図５に示すように、各マイクロホンアレイから正面方向に延びる直線（点線の矢印）は、目的エリアＴＡの中心位置ＰＣで交差するように調整されているものとする。 Here, as shown in FIGS. 4 and 5, it is assumed that the straight lines (dotted arrows) extending in the front direction from each microphone array are adjusted so as to intersect at the center position PC of the target area TA.

次に、図１を用いて、収音装置１０の内部構成について説明する。 Next, the internal configuration of the sound collecting device 10 will be described with reference to FIG.

図１に示すように、収音装置１０は、信号入力部１１、入力レベル補正部１２、指向性形成部１３、空間座標データ記憶部１４、遅延補正部１５、補正係数算出部１６、目的エリア音抽出部１７、及び目的エリア音選択部１８を有している。 As shown in FIG. 1, the sound collecting device 10 includes a signal input unit 11, an input level correction unit 12, a directivity forming unit 13, a spatial coordinate data storage unit 14, a delay correction unit 15, a correction coefficient calculation unit 16, and a target area. It has a sound extraction unit 17 and a target area sound selection unit 18.

そして、信号入力部１１、指向性形成部１３、遅延補正部１５、補正係数算出部１６、目的エリア音抽出部１７は、それぞれ２つのマイクロホンアレイセットＭＳＬＲ、ＭＳＵＢに対応する処理を行う構成要素を有している。具体的には、信号入力部１１、指向性形成部１３、遅延補正部１５、補正係数算出部１６、目的エリア音抽出部１７は、それぞれ信号入力処理部１１１（１１１Ａ、１１１Ｂ）、指向性形成処理部１３１（１３１Ａ、１３１Ｂ）、遅延補正処理部１５１（１５１Ａ、１５１Ｂ）、補正係数算出処理部１６１（１６１Ａ、１６１Ｂ）、及び目的エリア音抽出処理部１７１（１７１Ａ、１７１Ｂ）を有している。 The signal input unit 11, the directivity forming unit 13, the delay correction unit 15, the correction coefficient calculation unit 16, and the target area sound extraction unit 17 each have components that perform the processing for the two microphone array sets MSLR and MSUB. Specifically, these units have signal input processing units 111 (111A, 111B), directivity forming processing units 131 (131A, 131B), delay correction processing units 151 (151A, 151B), correction coefficient calculation processing units 161 (161A, 161B), and target area sound extraction processing units 171 (171A, 171B), respectively.

ここでは、信号入力処理部１１１Ａ、指向性形成処理部１３１Ａ、遅延補正処理部１５１Ａ、補正係数算出処理部１６１Ａ、目的エリア音抽出処理部１７１Ａは、それぞれマイクロホンアレイセットＭＳＬＲに対応する処理を行う構成要素であるものとする。また、ここでは、信号入力処理部１１１Ｂ、指向性形成処理部１３１Ｂ、遅延補正処理部１５１Ｂ、補正係数算出処理部１６１Ｂ、目的エリア音抽出処理部１７１Ｂは、それぞれマイクロホンアレイセットＭＳＵＢに対応する処理を行う構成要素であるものとする。 Here, the signal input processing unit 111A, the directivity forming processing unit 131A, the delay correction processing unit 151A, the correction coefficient calculation processing unit 161A, and the target area sound extraction processing unit 171A each perform processing corresponding to the microphone array set MSLR. It shall be an element. Further, here, the signal input processing unit 111B, the directivity formation processing unit 131B, the delay correction processing unit 151B, the correction coefficient calculation processing unit 161B, and the target area sound extraction processing unit 171B each perform processing corresponding to the microphone array set MSUB. It shall be a component to be performed.

次に、図２を用いて、収音装置１０のハードウェア構成について説明する。 Next, the hardware configuration of the sound collecting device 10 will be described with reference to FIG.

図２は、収音装置１０のハードウェア構成の例について示したブロック図である。 FIG. 2 is a block diagram showing an example of a hardware configuration of the sound collecting device 10.

収音装置１０は、全てハードウェア（例えば、専用チップ等）により構成するようにしてもよいし一部又は全部についてソフトウェア（プログラム）として構成するようにしてもよい。収音装置１０は、例えば、プロセッサ及びメモリを有するコンピュータにプログラム（実施形態の収音プログラムを含む）をインストールすることにより構成するようにしてもよい。 The sound collecting device 10 may be configured entirely by hardware (for example, a dedicated chip or the like), or may be partially or entirely configured as software (program). The sound collecting device 10 may be configured by installing a program (including the sound collecting program of the embodiment) on a computer having a processor and a memory, for example.

図２では、収音装置１０を、ソフトウェア（コンピュータ）を用いて構成する際のハードウェア構成の例について示している。 FIG. 2 shows an example of a hardware configuration when the sound collecting device 10 is configured by using software (computer).

図２に示す収音装置１０は、ハードウェア的な構成要素として、プログラム（実施形態の収音プログラムを含む）がインストールされたコンピュータ２００を有している。また、コンピュータ２００は、収音プログラム専用のコンピュータとしてもよいし、他の機能のプログラムと共用される構成としてもよい。 The sound collecting device 10 shown in FIG. 2 has a computer 200 in which a program (including the sound collecting program of the embodiment) is installed as a hardware component. Further, the computer 200 may be a computer dedicated to a sound collecting program, or may be configured to be shared with a program having another function.

図２に示すコンピュータ２００は、プロセッサ２０１、一次記憶部２０２、及び二次記憶部２０３を有している。一次記憶部２０２は、プロセッサ２０１の作業用メモリ（ワークメモリ）として機能する記憶手段であり、例えば、ＤＲＡＭ（ＤｙｎａｍｉｃＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）等の高速動作するメモリを適用することができる。二次記憶部２０３は、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）やプログラムデータ（実施形態に係る収音プログラムのデータを含む）等の種々のデータを記録する記憶手段であり、例えば、ＦＬＡＳＨメモリやＨＤＤ等の不揮発性メモリを適用することができる。この実施形態のコンピュータ２００では、プロセッサ２０１が起動する際、二次記憶部２０３に記録されたＯＳやプログラム（実施形態に係る収音プログラムを含む）を読み込み、一次記憶部２０２上に展開して実行する。 The computer 200 shown in FIG. 2 has a processor 201, a primary storage unit 202, and a secondary storage unit 203. The primary storage unit 202 is a storage means that functions as the working memory of the processor 201; for example, a fast memory such as a DRAM (Dynamic Random Access Memory) can be used. The secondary storage unit 203 is a storage means that records various data such as an OS (Operating System) and program data (including the data of the sound collecting program according to the embodiment); for example, a non-volatile memory such as a FLASH memory or an HDD can be used. In the computer 200 of this embodiment, when the processor 201 starts up, it reads the OS and the programs (including the sound collecting program according to the embodiment) recorded in the secondary storage unit 203, loads them into the primary storage unit 202, and executes them.

なお、コンピュータ２００の具体的な構成は図２の構成に限定されないものであり、種々の構成を適用することができる。例えば、一次記憶部２０２が不揮発メモリ（例えば、ＦＬＡＳＨメモリ等）であれば、二次記憶部２０３については除外した構成としてもよい。 The specific configuration of the computer 200 is not limited to the configuration shown in FIG. 2, and various configurations can be applied. For example, if the primary storage unit 202 is a non-volatile memory (for example, FLASH memory or the like), the secondary storage unit 203 may be excluded.

（Ａ−２）第１の実施形態の動作 (A-2) Operation of the First Embodiment
次に、以上のような構成を有する第１の実施形態の収音装置１０の動作を説明する。 Next, the operation of the sound collecting device 10 of the first embodiment having the above configuration will be described.

信号入力部１１は、各マイクロホンアレイで収音した音響信号をアナログ信号からデジタル信号に変換し入力する処理を行う。信号入力部１１は、その後、例えば高速フーリエ変換を用いて入力信号（デジタル信号）を、時間領域から周波数領域へ変換する。 The signal input unit 11 performs a process of converting an acoustic signal picked up by each microphone array from an analog signal to a digital signal and inputting the sound signal. The signal input unit 11 then converts the input signal (digital signal) from the time domain to the frequency domain by using, for example, a fast Fourier transform.

上述の通り、信号入力処理部１１１ＡがマイクロホンアレイセットＭＳＬＲ（マイクロホンアレイＭＡＬ、ＭＡＲ）の信号処理を行い、信号入力処理部１１１ＢがマイクロホンアレイセットＭＳＵＢ（マイクロホンアレイＭＡＵ、ＭＡＢ）の信号処理を行う。以下では、各マイクロホンアレイにおいて、マイクロホンＭ１、Ｍ２の周波数領域の入力信号を、それぞれＸ_１、Ｘ_２として説明する。 As described above, the signal input processing unit 111A performs signal processing for the microphone array set MSLR (microphone arrays MAL, MAR), and the signal input processing unit 111B performs signal processing for the microphone array set MSUB (microphone arrays MAU, MAB). Hereinafter, for each microphone array, the frequency-domain input signals of the microphones M1 and M2 are denoted X_1 and X_2, respectively.

入力レベル補正部１２は、マイクロホンアレイＭＡＬ、ＭＡＲ、ＭＡＵ、ＭＡＢから供給された各入力信号のレベルが全て同じ大きさになるように補正する。入力レベル補正部１２は、例えば、空間座標データ記憶部１４から目的エリアの中心位置ＰＣと各マイクロホンアレイとの距離を取得し、各マイクロホンアレイの距離の差から、各マイクロホンアレイの距離減衰を算出し、最も目的エリアの中心位置ＰＣに近いマイクロホンアレイを目的エリア音を基準として設定し、他のマイクロホンアレイの入力信号を基準のマイクロホンアレイと合わせるように補正するようにしてもよい。もしくは、予め目的エリアＴＡの中心位置ＰＣに図示しないスピーカを設置し、その図示しないスピーカからホワイトノイズを再生して各マイクロホンアレイで収録し、各マイクロホンアレイで収録されるホワイトノイズのレベル差を算出（各マイクロホンアレイで収録されるホワイトノイズのレベルを比較）し、最も大きなホワイトノイズのレベルが収録されたマイクロホンアレイを基準として、他のマイクロホンアレイの入力信号のレベルを補正するようにしてもよい。なお、目的エリアＴＡの中心位置ＰＣから全てのマイクロホンアレイが等距離にある場合は、入力レベル補正部１２の処理を省略するようにしてもよい。 The input level correction unit 12 corrects the levels of the input signals supplied from the microphone arrays MAL, MAR, MAU, and MAB so that they all have the same magnitude. For example, the input level correction unit 12 may acquire the distance between the center position PC of the target area and each microphone array from the spatial coordinate data storage unit 14, calculate the distance attenuation of each microphone array from the differences in those distances, set the microphone array closest to the center position PC of the target area as the reference for the target area sound, and correct the input signals of the other microphone arrays so that they match the reference microphone array. Alternatively, a speaker (not shown) may be installed in advance at the center position PC of the target area TA, white noise may be reproduced from that speaker and recorded by each microphone array, the level differences of the recorded white noise may be calculated (by comparing the white noise levels recorded by the microphone arrays), and the input signal levels of the other microphone arrays may be corrected with the microphone array that recorded the highest white noise level as the reference. If all the microphone arrays are equidistant from the center position PC of the target area TA, the processing of the input level correction unit 12 may be omitted.
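The distance-based variant of the level correction can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: it assumes a simple inverse-distance (1/r) attenuation model, and the function name `level_correction_gains` and the coordinates are hypothetical.

```python
import math

def level_correction_gains(center, array_positions):
    """Return a gain per microphone array that matches its level to the nearest array."""
    # Distance from the target-area center PC to each microphone array.
    distances = {name: math.dist(center, pos) for name, pos in array_positions.items()}
    # The array closest to the center is used as the reference.
    d_ref = min(distances.values())
    # Assumed 1/r attenuation: an array farther away gets a proportionally larger gain.
    return {name: d / d_ref for name, d in distances.items()}

# Hypothetical coordinates in metres (center PC at the origin).
pc = (0.0, 0.0, 0.0)
positions = {
    "MAL": (3.0, -1.0, 0.0),
    "MAR": (3.0, 1.0, 0.0),
    "MAU": (4.0, 0.0, 1.0),
    "MAB": (3.0, 0.0, -1.0),
}
gains = level_correction_gains(pc, positions)
```

With these example positions, MAL, MAR, and MAB are equidistant and receive a gain of 1, while the more distant MAU receives a gain greater than 1.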

指向性形成部１３は、マイクロホンアレイ毎に入力信号に対し、（４）式に従いＢＦにより目的エリア方向に指向性を形成する。上述の通り、指向性形成処理部１３１ＡがマイクロホンアレイセットＭＳＬＲ（マイクロホンアレイＭＡＬ、ＭＡＲ）の信号処理を行い、指向性形成処理部１３１ＢがマイクロホンアレイセットＭＳＵＢ（マイクロホンアレイＭＡＵ、ＭＡＢ）の信号処理を行う。 For the input signals of each microphone array, the directivity forming unit 13 forms directivity toward the target area with the BF according to Equation (4). As described above, the directivity forming processing unit 131A performs signal processing for the microphone array set MSLR (microphone arrays MAL, MAR), and the directivity forming processing unit 131B performs signal processing for the microphone array set MSUB (microphone arrays MAU, MAB).

以下では、各マイクロホンアレイセットにおいて、任意の一方のマイクロホンアレイのＢＦ出力をＹ_１ｋ（ｎ）とし、他方のマイクロホンアレイのＢＦ出力をＹ_２ｋ（ｎ）として各式が適用されるものとして説明する。例えば、マイクロホンアレイＭＡＬ、ＭＡＲのＢＦ出力の振幅スペクトルを、それぞれＹ_１ｋ（ｎ）、Ｙ_２ｋ（ｎ）として各式に適用し、マイクロホンアレイＭＡＵ、ＭＡＢのＢＦ出力の振幅スペクトルを、それぞれＹ_１ｋ（ｎ）、Ｙ_２ｋ（ｎ）として各式に適用するようにしてもよい。 Hereinafter, the equations are applied on the assumption that, in each microphone array set, the BF output of one of the microphone arrays is Y_1k(n) and the BF output of the other microphone array is Y_2k(n). For example, the amplitude spectra of the BF outputs of the microphone arrays MAL and MAR may be applied to the equations as Y_1k(n) and Y_2k(n), respectively, and the amplitude spectra of the BF outputs of the microphone arrays MAU and MAB may likewise be applied to the equations as Y_1k(n) and Y_2k(n), respectively.

空間座標データ記憶部１４は、全ての目的エリアと各マイクロホンアレイを構成するマイクロホンの位置情報を保持する。 The spatial coordinate data storage unit 14 holds all the target areas and the position information of the microphones constituting each microphone array.

遅延補正部１５は、目的エリアＴＡ（中心位置ＰＣ）と各マイクロホンアレイの距離の違いにより発生する遅延を算出し、補正する。遅延補正部１５は、まず空間座標データ記憶部１４から目的エリアＴＡの位置（中心位置ＰＣ）と各マイクロホンアレイの位置を取得し、各マイクロホンアレイへの目的エリア音の到達時間の差を算出する。次に、遅延補正部１５は、最も目的エリアから遠い位置に配置されたマイクロホンアレイを基準として、全てのマイクロホンアレイに目的エリア音が同時に到達するように遅延を加える。上述の通り、遅延補正処理部１５１ＡがマイクロホンアレイセットＭＳＬＲ（マイクロホンアレイＭＡＬ、ＭＡＲ）の信号処理を行い、遅延補正処理部１５１ＢがマイクロホンアレイセットＭＳＵＢ（マイクロホンアレイＭＡＵ、ＭＡＢ）の信号処理を行う。なお、目的エリアＴＡの中心位置ＰＣから全てのマイクロホンアレイが等距離にある場合は、遅延補正部１５の処理を省略するようにしてもよい。 The delay correction unit 15 calculates and corrects the delay generated due to the difference in the distance between the target area TA (center position PC) and each microphone array. The delay correction unit 15 first acquires the position of the target area TA (center position PC) and the position of each microphone array from the spatial coordinate data storage unit 14, and calculates the difference in the arrival time of the target area sound to each microphone array. .. Next, the delay correction unit 15 adds a delay so that the target area sound reaches all the microphone arrays at the same time with reference to the microphone array arranged at the position farthest from the target area. As described above, the delay correction processing unit 151A performs signal processing of the microphone array set MSLR (microphone array MAL, MAR), and the delay correction processing unit 151B performs signal processing of the microphone array set MSUB (microphone array MAU, MAB). If all the microphone arrays are equidistant from the center position PC of the target area TA, the processing of the delay correction unit 15 may be omitted.
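The delay correction described above can be illustrated with the following minimal sketch. The speed of sound (340 m/s), the sampling rate, the coordinates, and the function name `delay_samples` are assumptions introduced for illustration; the embodiment does not specify them.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value for this example

def delay_samples(center, array_positions, sample_rate):
    """Return the delay (in samples) to add to each array's signal."""
    # Distance from the target-area center PC to each microphone array.
    distances = {name: math.dist(center, pos) for name, pos in array_positions.items()}
    # The array farthest from the target area is the reference (zero added delay).
    d_max = max(distances.values())
    # Nearer arrays are delayed by the difference in arrival time.
    return {
        name: round((d_max - d) / SPEED_OF_SOUND * sample_rate)
        for name, d in distances.items()
    }

# Hypothetical positions in metres and a hypothetical 16 kHz sampling rate.
delays = delay_samples(
    (0.0, 0.0, 0.0),
    {"MAL": (1.7, 0.0, 0.0), "MAR": (3.4, 0.0, 0.0)},
    16000,
)
```

In this example the farther array MAR is the reference and receives no delay, while the signal of MAL is delayed by the 5 ms difference in arrival time.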

補正係数算出部１６は、各マイクロホンアレイセットについて、各ＢＦ出力に含まれる目的エリア音成分の振幅スペクトルを同じにするための補正係数を算出する。補正係数算出部１６は、「（５）式、（６）式」または「（７）式、（８）式」に従い補正係数を算出する。なお、ここでは、マイクロホンアレイＭＡＬ、ＭＡＲのＢＦ出力に対する補正係数をα_１（ｎ）、α_２（ｎ）として説明する。また、ここでは、マイクロホンアレイＭＡＵ、ＭＡＢのＢＦ出力に対する補正係数を、α_１（ｎ）、α_２（ｎ）として説明する。例えば、マイクロホンアレイセットＭＳＬＲにおいて、主マイクロホンアレイ（目的エリア音の抽出処理の際に基準となるマイクロホンアレイ）がマイクロホンアレイＭＡＬに設定される場合は、「（６）式、（８）式」により振幅スペクトル補正係数α_１（ｎ）が算出され、必要に応じて「（５）式、（７）式」によりα_２（ｎ）が算出される。 For each microphone array set, the correction coefficient calculation unit 16 calculates correction coefficients that equalize the amplitude spectra of the target area sound components contained in the BF outputs. The correction coefficient calculation unit 16 calculates the correction coefficients according to either Equations (5) and (6) or Equations (7) and (8). Here, the correction coefficients for the BF outputs of the microphone arrays MAL and MAR are denoted α_1(n) and α_2(n). Likewise, the correction coefficients for the BF outputs of the microphone arrays MAU and MAB are denoted α_1(n) and α_2(n). For example, in the microphone array set MSLR, when the main microphone array (the microphone array that serves as the reference in the target area sound extraction process) is set to the microphone array MAL, the amplitude spectrum correction coefficient α_1(n) is calculated by Equations (6) and (8), and α_2(n) is calculated by Equations (5) and (7) as needed.

また、ここでは、マイクロホンアレイセットＭＳＬＲ、ＭＳＵＢのそれぞれについて、いずれかのマイクロホンアレイが主マイクロホンアレイとして設定されているものとする。なお、収音装置１０において、マイクロホンアレイセットから主マイクロホンアレイを選択する処理の具体的な手法については限定されないものである。さらに、上述の通り、補正係数算出部１６では、補正係数算出処理部１６１ＡがマイクロホンアレイセットＭＳＬＲ（マイクロホンアレイＭＡＬ、ＭＡＲ）に係る処理を行い、補正係数算出処理部１６１ＢがマイクロホンアレイセットＭＳＵＢ（マイクロホンアレイＭＡＵ、ＭＡＢ）に係る処理を行う。 Here, it is also assumed that one of the microphone arrays in each of the microphone array sets MSLR and MSUB is set as the main microphone array. The specific method by which the sound collecting device 10 selects the main microphone array from a microphone array set is not limited. Further, as described above, in the correction coefficient calculation unit 16, the correction coefficient calculation processing unit 161A performs the processing for the microphone array set MSLR (microphone arrays MAL, MAR), and the correction coefficient calculation processing unit 161B performs the processing for the microphone array set MSUB (microphone arrays MAU, MAB).

目的エリア音抽出部１７は、マイクロホンアレイセットＭＳＬＲ、ＭＳＵＢのそれぞれについて、主マイクロホンアレイ（目的エリア音の抽出処理の際に基準となるマイクロホンアレイ）を基準として目的エリア音を抽出する。 The target area sound extraction unit 17 extracts the target area sound for each of the microphone array sets MSLR and MSUB with reference to the main microphone array (the microphone array that serves as a reference during the extraction process of the target area sound).

例えば、マイクロホンアレイセットＭＳＬＲにおいて、主マイクロホンアレイがマイクロホンアレイＭＡＬに設定される場合、目的エリア音抽出部１７は、補正係数算出部１６で算出した補正係数α_２（ｎ）により各ＢＦ出力を（９）式に従いＳＳし、目的エリア方向に存在する非目的エリア音を抽出する。さらに、目的エリア音抽出部１７は、抽出した非目的エリア音を各ＢＦの出力から（１１）式に従いＳＳすることにより目的エリア音を抽出する。また、例えば、マイクロホンアレイセットＭＳＬＲにおいて、主マイクロホンアレイがマイクロホンアレイＭＡＲに設定される場合、目的エリア音抽出部１７は、補正係数α_１（ｎ）により各ＢＦ出力を（１０）式に従い目的エリア方向に存在する非目的エリア音を抽出する。さらに、目的エリア音抽出部１７は、抽出した非目的エリア音を各ＢＦの出力から（１２）式に従い目的エリア音を抽出する。 For example, in the microphone array set MSLR, when the main microphone array is set to the microphone array MAL, the target area sound extraction unit 17 applies spectral subtraction (SS) to the BF outputs according to Equation (9), using the correction coefficient α_2(n) calculated by the correction coefficient calculation unit 16, and thereby extracts the non-target area sound existing in the direction of the target area. The target area sound extraction unit 17 then extracts the target area sound by spectrally subtracting the extracted non-target area sound from the BF output according to Equation (11). Similarly, when the main microphone array in the microphone array set MSLR is set to the microphone array MAR, the target area sound extraction unit 17 extracts the non-target area sound existing in the direction of the target area from the BF outputs according to Equation (10), using the correction coefficient α_1(n), and then extracts the target area sound by subtracting the extracted non-target area sound from the BF output according to Equation (12).
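The two-stage spectral subtraction described above can be sketched as follows for the case where the main microphone array supplies Y_1k(n). This sketch rests on assumptions that are not confirmed by this section: it takes the first stage to be N = Y_main − α·Y_sub and the second stage to be Z = Y_main − γ·N, floors negative values at zero, and introduces a hypothetical strength parameter `gamma`. The exact forms are those of Equations (9) to (12).

```python
def extract_target_area(y_main, y_sub, alpha_sub, gamma=1.0):
    """Two-stage spectral subtraction on amplitude spectra (lists of floats).

    y_main:    BF output amplitude spectrum of the main microphone array
    y_sub:     BF output amplitude spectrum of the other microphone array
    alpha_sub: correction coefficient applied to y_sub
    gamma:     hypothetical subtraction strength for the second stage
    """
    # Stage 1: estimate the non-target area sound contained in the main BF output.
    non_target = [max(ym - alpha_sub * ys, 0.0) for ym, ys in zip(y_main, y_sub)]
    # Stage 2: subtract that estimate from the main BF output to get the target area sound.
    target = [max(ym - gamma * nk, 0.0) for ym, nk in zip(y_main, non_target)]
    return non_target, target

# Hypothetical amplitude spectra with three frequency bins.
y1 = [1.0, 0.5, 2.0]   # main array: the third bin contains extra non-target sound
y2 = [1.0, 0.5, 0.5]   # other array
n1, z1 = extract_target_area(y1, y2, alpha_sub=1.0)
```

In this example the excess energy in the third bin is attributed to non-target area sound and removed, leaving the component common to both BF outputs.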

なお、上述の通り、目的エリア音抽出部１７では、目的エリア音抽出処理部１７１ＡがマイクロホンアレイセットＭＳＬＲ（マイクロホンアレイＭＡＬ、ＭＡＲ）の信号処理を行い、目的エリア音抽出処理部１７１ＢがマイクロホンアレイセットＭＳＵＢ（マイクロホンアレイＭＡＵ、ＭＡＢ）の信号処理を行う。 As described above, in the target area sound extraction unit 17, the target area sound extraction processing unit 171A performs signal processing of the microphone array set MSLR (microphone array MAL, MAR), and the target area sound extraction processing unit 171B performs the microphone array set. Signal processing of MSUB (microphone array MAU, MAB) is performed.

目的エリア音選択部１８は、目的エリア音抽出処理部１７１Ａで抽出された目的エリア音と、目的エリア音抽出処理部１７１Ｂで抽出された目的エリア音を比較し、いずれかの目的エリア音を選択して最終的な目的エリア音（出力信号）として出力する出力手段である。目的エリア音選択部１８が、２つの目的エリア音からいずれかを選択する方法については限定されないものであるが、例えば、平均振幅スペクトルに基づいて選択するようにしてもよい。例えば、目的エリア音選択部１８は、２つの目的エリア音についてそれぞれ平均振幅スペクトルを算出して値を比較し、値が小さい方を最終的に選択して出力するようにしてもよい。 The target area sound selection unit 18 is an output means that compares the target area sound extracted by the target area sound extraction processing unit 171A with the target area sound extracted by the target area sound extraction processing unit 171B, selects one of them, and outputs it as the final target area sound (output signal). The method by which the target area sound selection unit 18 selects one of the two target area sounds is not limited; for example, the selection may be based on the average amplitude spectrum. For example, the target area sound selection unit 18 may calculate the average amplitude spectrum of each of the two target area sounds, compare the values, and select and output the one with the smaller value.
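The selection based on the average amplitude spectrum can be sketched as follows. The function name `select_target_area_sound` and the sample spectra are hypothetical, and the sketch is only one possible realization of the comparison described above.

```python
def select_target_area_sound(candidates):
    """Pick the candidate whose average amplitude spectrum is smallest.

    candidates: dict mapping a microphone array set name to the amplitude
                spectrum (list of floats) of its extracted target area sound.
    """
    def mean_amplitude(spectrum):
        return sum(spectrum) / len(spectrum)

    # The candidate with the lowest average level is assumed to contain
    # the least non-target area sound.
    name = min(candidates, key=lambda k: mean_amplitude(candidates[k]))
    return name, candidates[name]

# Hypothetical extracted spectra for the two microphone array sets.
candidates = {
    "MSLR": [0.9, 0.8, 1.1],  # e.g. contaminated by a ceiling loudspeaker
    "MSUB": [0.4, 0.5, 0.3],
}
chosen, output = select_target_area_sound(candidates)
```

Because `min` works on any number of candidates, the same function also covers the case of three or more microphone array sets.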

（Ａ−３）第１の実施形態の効果 (A-3) Effects of the First Embodiment
第１の実施形態によれば、以下のような効果を奏することができる。 According to the first embodiment, the following effects can be obtained.

第１の実施形態の収音装置１０では、目的エリア正面の左右方向に設置したマイクロホンアレイＭＡＬ、ＭＡＲ（マイクロホンアレイセットＭＳＬＲ）と、上下方向に設置したマイクロホンアレイＭＡＵ、ＭＡＢ（マイクロホンアレイセットＭＳＵＢ）のそれぞれで目的エリア音を抽出し、抽出した目的エリア音の内、音量レベルが小さい方（平均振幅スペクトルが小さい方）を選択して出力している。これにより、第１の実施形態の収音装置１０では、目的エリアの上下左右方向に非目的エリア音が存在しても収音せず、空中に浮かんだ目的エリア内の音を安定的に収音することができる。 In the sound collecting device 10 of the first embodiment, the target area sound is extracted with each of the microphone arrays MAL and MAR installed in the left-right direction in front of the target area (microphone array set MSLR) and the microphone arrays MAU and MAB installed in the vertical direction (microphone array set MSUB), and of the extracted target area sounds, the one with the lower volume level (the smaller average amplitude spectrum) is selected and output. As a result, the sound collecting device 10 of the first embodiment does not pick up non-target area sound even when it exists above, below, or to the left or right of the target area, and can stably collect the sound inside a target area floating in the air.

また、これにより、第１の実施形態の収音装置１０では、雑音混入による不快感の低減や、音声認識率の向上が期待できる。上下と左右それぞれのマイクロホンアレイの組合せで抽出した目的エリア音を選択することは、上下および左右のマイクロホンアレイによる２つの収音エリアが重なった部分を収音していることになる。つまり、第１の実施形態の収音装置１０では、マイクロホンアレイの設置位置と角度を変更することにより、任意の空中に浮かんだエリア内の音を収音することが可能となる。 Further, as a result, the sound collecting device 10 of the first embodiment can be expected to reduce discomfort due to noise mixing and improve the voice recognition rate. Selecting the target area sound extracted by the combination of the upper and lower microphone arrays and the left and right microphone arrays means that the sound is picked up at the overlapping portion of the two sound collecting areas by the upper and lower microphone arrays and the left and right microphone arrays. That is, in the sound collecting device 10 of the first embodiment, it is possible to collect sound in an arbitrary floating area by changing the installation position and angle of the microphone array.

例えば、目的エリアの上方向の天井に大音量でアナウンス等の音が再生されるスピーカが配置されている場合、左右のマイクロホンアレイＭＡＬ、ＭＡＲ（マイクロホンアレイセットＭＳＬＲ）で抽出した目的エリア音にスピーカ音が含まれるが、上下のマイクロホンアレイＭＡＵ、ＭＡＢ（マイクロホンアレイセットＭＳＵＢ）で抽出した目的エリア音には含まれない。この場合、スピーカ音が含まれる分、左右のマイクロホンアレイで抽出した目的エリア音の方が、上下のマイクロホンアレイで抽出した目的エリア音よりも音量レベルが大きい。つまり、第１の実施形態の収音装置１０では、音量レベルが小さい上下のマイクロホンアレイで抽出した目的エリア音を選択すれば、その目的エリア音にはスピーカ音は含まれないことになる。さらに、目的エリア外で人が話している場合、第１の実施形態の収音装置１０では左右のマイクロホンアレイで抽出した目的エリア音が選択され、目的エリア外の人の声は収音しない。 For example, when a speaker that reproduces announcements or similar sounds at high volume is installed on the ceiling above the target area, the speaker sound is contained in the target area sound extracted with the left and right microphone arrays MAL and MAR (microphone array set MSLR), but not in the target area sound extracted with the upper and lower microphone arrays MAU and MAB (microphone array set MSUB). In this case, the target area sound extracted with the left and right microphone arrays has a higher volume level than the one extracted with the upper and lower microphone arrays, by the amount of the speaker sound it contains. In other words, if the sound collecting device 10 of the first embodiment selects the target area sound extracted with the upper and lower microphone arrays, which has the lower volume level, the selected target area sound does not contain the speaker sound. Further, when a person is speaking outside the target area, the sound collecting device 10 of the first embodiment selects the target area sound extracted with the left and right microphone arrays, and the voice of the person outside the target area is not picked up.

（Ｂ）第２の実施形態 (B) Second Embodiment
以下、本発明による収音装置、収音プログラム及び収音方法の第２の実施形態について図面を参照して説明する。 Hereinafter, a second embodiment of the sound collecting device, the sound collecting program, and the sound collecting method according to the present invention will be described with reference to the drawings.

（Ｂ−１）第２の実施形態の構成 (B-1) Configuration of the Second Embodiment
図７は、第２の実施形態の収音装置１０Ａの全体構成を示すブロック図である。図７では、上述の図１と同一又は対応する部分に、同一又は対応する符号を付している。以下では、第２の実施形態について第１の実施形態との差異を中心に説明する。 FIG. 7 is a block diagram showing the overall configuration of the sound collecting device 10A of the second embodiment. In FIG. 7, parts that are the same as or correspond to those in FIG. 1 described above are given the same or corresponding reference numerals. Hereinafter, the second embodiment will be described focusing on the differences from the first embodiment.

第２の実施形態の収音装置１０Ａでは、信号入力部１１、入力レベル補正部１２、指向性形成部１３、遅延補正部１５、補正係数算出部１６、目的エリア音抽出部１７、及び目的エリア音選択部１８が、それぞれ信号入力部１１Ａ、入力レベル補正部１２Ａ、指向性形成部１３Ａ、遅延補正部１５Ａ、補正係数算出部１６Ａ、目的エリア音抽出部１７Ａ、及び目的エリア音選択部１８Ａに置き換わっている。また、第２の実施形態の収音装置１０Ａでは、組合せ選択部１９が加わっている。 In the sound collecting device 10A of the second embodiment, the signal input unit 11, the input level correction unit 12, the directivity forming unit 13, the delay correction unit 15, the correction coefficient calculation unit 16, the target area sound extraction unit 17, and the target area sound selection unit 18 are replaced by a signal input unit 11A, an input level correction unit 12A, a directivity forming unit 13A, a delay correction unit 15A, a correction coefficient calculation unit 16A, a target area sound extraction unit 17A, and a target area sound selection unit 18A, respectively. In addition, a combination selection unit 19 is added to the sound collecting device 10A of the second embodiment.

図８は、第２の実施形態において、目的エリアＴＡの中心位置から原点Ｐ０への方向を見た場合における各マイクロホンアレイの配置構成について示した図である。 FIG. 8 is a diagram showing an arrangement configuration of each microphone array when the direction from the center position of the target area TA to the origin P0 is viewed in the second embodiment.

第２の実施形態においてマイクロホンアレイ装置ＭＥを構成する各マイクロホンアレイの物理的な構成（数や配置位置等）は第１の実施形態と同様であるが、収音装置１０Ａにおいて処理されるマイクロホンアレイセットの構成（論理的構成）が異なる。 In the second embodiment, the physical configuration (number, arrangement position, etc.) of each microphone array constituting the microphone array device ME is the same as that in the first embodiment, but the microphone array processed by the sound collecting device 10A. The set structure (logical structure) is different.

第１の実施形態では、左右方向（水平方向）に配置されたマイクロホンアレイＭＡＬ、ＭＡＲの組合せのマイクロホンアレイセットＭＳＬＲと、上下方向（垂直方向）に配置されたマイクロホンアレイＭＡＵ、ＭＡＢの組合せのマイクロホンアレイセットＭＳＵＢを論理的に構成していたが、第２の実施形態では対角上（斜め方向）のマイクロホンアレイの組合せのマイクロホンアレイセットが加わっている。具体的には、第２の実施形態では、図８に示す通り、対角上のマイクロホンアレイの組合せのマイクロホンアレイセットとして、マイクロホンアレイＭＡＵ、ＭＡＬの組合せのマイクロホンアレイセットＭＳＵＬ、マイクロホンアレイＭＡＬ、ＭＡＢの組合せのマイクロホンアレイセットＭＳＬＢ、マイクロホンアレイＭＡＢ、ＭＡＲの組合せのマイクロホンアレイセットＭＳＢＲ、及びマイクロホンアレイＭＡＲ、ＭＡＵの組合せのマイクロホンアレイセットＭＳＲＵが加わっている。 In the first embodiment, the microphone array set MSLR, which combines the microphone arrays MAL and MAR arranged in the left-right (horizontal) direction, and the microphone array set MSUB, which combines the microphone arrays MAU and MAB arranged in the up-down (vertical) direction, were logically configured. In the second embodiment, microphone array sets that combine diagonally positioned microphone arrays are added. Specifically, as shown in FIG. 8, the following diagonal microphone array sets are added in the second embodiment: the microphone array set MSUL combining the microphone arrays MAU and MAL, the microphone array set MSLB combining the microphone arrays MAL and MAB, the microphone array set MSBR combining the microphone arrays MAB and MAR, and the microphone array set MSRU combining the microphone arrays MAR and MAU.

すなわち、第２の実施形態の収音装置１０Ａでは、６つのマイクロホンアレイセットＭＳＬＲ、ＭＳＵＢ、ＭＳＵＬ、ＭＳＬＢ、ＭＳＢＲ、ＭＳＲＵが設定可能であるものとする。 That is, in the sound collecting device 10A of the second embodiment, it is assumed that six microphone array sets MSLR, MSUB, MSUL, MSLB, MSBR, and MSRU can be set.
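The count of six follows from choosing two microphone arrays out of four. A short illustrative check, using the array names from the text:

```python
from itertools import combinations

# The four microphone arrays of the embodiment.
arrays = ["MAL", "MAR", "MAU", "MAB"]

# Every unordered pair of arrays is one candidate microphone array set.
array_sets = list(combinations(arrays, 2))
# [('MAL', 'MAR'), ('MAL', 'MAU'), ('MAL', 'MAB'),
#  ('MAR', 'MAU'), ('MAR', 'MAB'), ('MAU', 'MAB')]
```

The six pairs correspond to MSLR, MSUL, MSLB, MSRU, MSBR, and MSUB, respectively.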

ところで、非目的エリア音が目的エリアＴＡ（中心位置ＰＣ）の上下方向と左右方向どちらにも存在し、音量が大きい場合、上下・左右どちらかのマイクロホンアレイを使用して目的エリア音を抽出しても、どちらにも非目的エリア音が混入する恐れがある。そこで、第２の実施形態の収音装置１０Ａでは、上下・左右のマイクロホンアレイの組合せだけでなく、対角上にあるマイクロホンアレイの組合せ（上述の６つのマイクロホンアレイセット）も用いて目的エリア音を抽出して比較し、最も抽出される目的エリア音の音量レベルが小さいものを最終的に出力するものとして選択するものとする。 By the way, when the non-purpose area sound exists in both the vertical direction and the horizontal direction of the target area TA (center position PC) and the volume is high, the target area sound is extracted using either the vertical or horizontal microphone array. However, there is a risk that non-purpose area sounds will be mixed in both. Therefore, in the sound collecting device 10A of the second embodiment, not only the combination of the upper / lower / left / right microphone arrays but also the combination of the diagonal microphone arrays (the above-mentioned six microphone array sets) is used to produce the target area sound. Are extracted and compared, and the one with the lowest volume level of the extracted target area sound is selected as the final output.

なお、第１の実施形態ではマイクロホンアレイセットごとに処理する構成要素を分離して図示していたが、第２の実施形態の収音装置１０Ａでは省略して説明している。収音装置１０、１０Ａにおいて、マイクロホンアレイセットごとに処理する構成要素を分離する構成（ソフトウェア的又はハードウェア的に分離して構成）とするようにしてもよいし一体として構成するようにしてもよい。 In the first embodiment, the components that process each microphone array set were illustrated separately, but this separation is omitted in the description of the sound collecting device 10A of the second embodiment. In the sound collecting devices 10 and 10A, the components that process each microphone array set may be configured separately (separated in software or in hardware) or may be configured as a single unit.

（Ｂ−２）第２の実施形態の動作 (B-2) Operation of the Second Embodiment
次に、以上のような構成を有する第２の実施形態の収音装置１０Ａの動作を説明する。 Next, the operation of the sound collecting device 10A of the second embodiment having the above configuration will be described.

以下では、第２の実施形態の収音装置１０Ａの動作について、第１の実施形態との差異を中心に説明する。 Hereinafter, the operation of the sound collecting device 10A of the second embodiment will be described focusing on the difference from the first embodiment.

信号入力部１１Ａは、各マイクロホンアレイで収音した音響信号をアナログ信号からデジタル信号に変換し入力信号として出力する処理を行う。 The signal input unit 11A performs a process of converting an acoustic signal picked up by each microphone array from an analog signal to a digital signal and outputting it as an input signal.

組合せ選択部１９は、上述の６つのマイクロホンアレイセットＭＳＬＲ、ＭＳＵＢ、ＭＳＵＬ、ＭＳＬＢ、ＭＳＢＲ、ＭＳＲＵ（マイクロホンアレイの組合せ）の中から所定のルールに基づきＭ個（２以上でマイクロホンアレイセットの最大数以下の整数；ここでは２以上６以下の整数）選択する。なお、組合せ選択部１９において、選択されるＭ個のマイクロホンアレイセットの組合せは、予め設定された内容としてもよいし、時系列に応じて内容を変更するようにしてもよい。 The combination selection unit 19 selects M microphone array sets from the six microphone array sets MSLR, MSUB, MSUL, MSLB, MSBR, and MSRU (combinations of microphone arrays) according to a predetermined rule, where M is an integer of 2 or more and no more than the maximum number of microphone array sets (here, an integer from 2 to 6). The combination of the M microphone array sets selected by the combination selection unit 19 may be preset, or may be changed over time.

具体的には、例えば、組合せ選択部１９に、予めマイクロホンアレイの設置環境を考慮し、スピーカなどの非目的エリア音が混入し難いＭ個の組合せ（マイクロホンアレイセット）を設定（例えば、オペレータや設計者等により設定）しておくようにしてもよい。 Specifically, for example, in consideration of the installation environment of the microphone array in advance, M combinations (microphone array set) in which non-purpose area sounds such as speakers are unlikely to be mixed are set in the combination selection unit 19 (for example, an operator or an operator. It may be set by the designer or the like).

また、例えば、組合せ選択部１９は、最初は全てのマイクロホンアレイセットを選択し、目的エリア音選択部１８Ａで選択されたマイクロホンアレイセットの情報をもとに、各マイクロホンアレイセットを評価し、その後は評価の高い上位３つのマイクロホンアレイセットを選択するようにしてもよい。 Further, for example, the combination selection unit 19 first selects all the microphone array sets, evaluates each microphone array set based on the information of the microphone array sets selected by the target area sound selection unit 18A, and then evaluates each microphone array set. May select the top three highly rated microphone array sets.
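The adaptive selection described above can be sketched as follows. The text does not specify how each microphone array set is evaluated, so this sketch assumes, purely for illustration, that the evaluation is the number of times a set was chosen by the target area sound selection unit 18A. The function name `top_sets` and the selection history are hypothetical.

```python
from collections import Counter

def top_sets(selection_history, all_sets, keep=3):
    """Return the `keep` microphone array sets selected most often so far."""
    # Assumed evaluation: how many times each set was finally selected.
    counts = Counter(selection_history)
    # sorted() is stable, so sets with equal counts keep their original order.
    ranked = sorted(all_sets, key=lambda s: counts[s], reverse=True)
    return ranked[:keep]

all_sets = ["MSLR", "MSUB", "MSUL", "MSLB", "MSBR", "MSRU"]
# Hypothetical record of which set unit 18A selected in past frames.
history = ["MSUB", "MSUB", "MSUL", "MSLR", "MSUB", "MSUL"]
selected = top_sets(history, all_sets)
```

After an initial period in which all six sets are processed, only the sets in `selected` would be passed on to the later stages, which reduces the amount of processing.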

さらに、例えば、組合せ選択部１９は、目的エリア音選択部１８Ａで最終的に選択されるマイクロホンアレイセットにおいて偏りがある場合（例えば、２つのマイクロホンアレイセットが選択される確率が全体の５割以上を占めるなど）、その良く選択される（高頻度で選択される）マイクロホンアレイセットについては固定的に選択し、もう１つは残りのマイクロホンアレイセットからランダムに選択するようにしてもよい。 Further, for example, when the combination selection unit 19 is biased in the microphone array set finally selected by the target area sound selection unit 18A (for example, the probability that two microphone array sets are selected is 50% or more of the total). The well-selected (frequently selected) microphone array set may be fixedly selected, and the other may be randomly selected from the remaining microphone array sets.

Hereinafter, the microphone array sets most recently selected by the combination selection unit 19 are referred to as the "selected microphone array sets".

The input level correction unit 12A corrects the input signals supplied from the microphone arrays MAL, MAR, MAU, and MAB so that their levels all have the same magnitude.
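One way to picture this level correction is the sketch below. The source does not say how the level is measured or which level is used as the reference; the sketch assumes the root-mean-square (RMS) value of each signal and scales every signal to the mean RMS. The function names are hypothetical.

```python
import math

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def equalize_levels(signals):
    """Scale each array's signal so that all RMS levels match.

    signals: dict mapping an array name (e.g. "MAL") to its samples.
    The common reference level is the mean RMS (an assumption).
    """
    levels = {name: rms(x) for name, x in signals.items()}
    reference = sum(levels.values()) / len(levels)
    corrected = {}
    for name, x in signals.items():
        # Silent signals are left untouched to avoid dividing by zero.
        gain = reference / levels[name] if levels[name] > 0 else 1.0
        corrected[name] = [v * gain for v in x]
    return corrected

inputs = {"MAL": [1.0, -1.0, 1.0, -1.0], "MAR": [0.5, -0.5, 0.5, -0.5]}
outputs = equalize_levels(inputs)
print({name: rms(x) for name, x in outputs.items()})
```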

The directivity forming unit 13A forms directivity toward the target area by BF for each of the microphone arrays MAL, MAR, MAU, and MAB (that is, it generates a BF output). In the second embodiment, as in the first embodiment, the description assumes that, in each microphone array set, the BF output of one (arbitrarily chosen) microphone array is Y_1k(n) and the BF output of the other microphone array is Y_2k(n) when each equation is applied.

As in the first embodiment, the delay correction unit 15A calculates and corrects, for the BF output of each microphone array, the delay caused by the difference in distance between the target area TA (center position PC) and each microphone array.
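The delay calculation can be illustrated with a short sketch. It assumes a speed of sound of 343 m/s, integer-sample delays, and that the closer arrays are delayed so that they line up with the farthest one; none of these details is stated in this section. The positions and function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def delays_in_samples(array_positions, target_center, sample_rate):
    """Delay (in samples) to apply to each array's BF output.

    Each array is delayed by the travel-time difference to the farthest
    array, so sound from the target area centre is time-aligned.
    """
    distances = {name: math.dist(pos, target_center)
                 for name, pos in array_positions.items()}
    farthest = max(distances.values())
    return {name: round((farthest - d) / SPEED_OF_SOUND * sample_rate)
            for name, d in distances.items()}

def apply_delay(signal, delay):
    """Shift a signal later by `delay` samples, padding with zeros."""
    delay = min(delay, len(signal))
    return [0.0] * delay + list(signal[:len(signal) - delay])

positions = {"MAL": (1.0, 0.0, 0.0), "MAR": (1.343, 0.0, 0.0)}
print(delays_in_samples(positions, (0.0, 0.0, 0.0), 16000))
```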

For each of the selected microphone array sets, the correction coefficient calculation unit 16A calculates, as in the first embodiment, a correction coefficient for equalizing the amplitude spectra of the target area sound components contained in the respective BF outputs. In the sound collecting device 10A, as in the first embodiment, one of the microphone arrays in each microphone array set is assumed to be set as the main microphone array.

For each of the selected microphone array sets, the target area sound extraction unit 17A extracts the target area sound with the main microphone array as the reference, as in the first embodiment.
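The two processing steps above (correction coefficient, then extraction relative to the main microphone array) can be pictured with the following sketch. The equations of the first embodiment are not reproduced in this section, so the details here are assumptions: the correction coefficient is taken as the median of the per-frequency amplitude ratio, the non-target area sound is obtained by spectrally subtracting the corrected second BF output from the main BF output, and the target area sound by subtracting the non-target area sound from the main BF output, with negative results floored at zero. The names `alpha` and `gamma` are hypothetical.

```python
from statistics import median

def correction_coefficient(y_main, y_sub, eps=1e-12):
    """Assumed estimator: median of the per-frequency amplitude ratio."""
    return median(a / max(b, eps) for a, b in zip(y_main, y_sub))

def extract_target_area_sound(y_main, y_sub, alpha, gamma=1.0):
    """Two-step spectral subtraction on amplitude spectra.

    y_main, y_sub: amplitude spectra of the BF outputs of the main and
    the other microphone array in one microphone array set.
    """
    # Step 1: what remains after removing the (level-matched) common
    # component is treated as non-target area sound.
    non_target = [max(a - alpha * b, 0.0) for a, b in zip(y_main, y_sub)]
    # Step 2: subtract the non-target area sound from the main BF output.
    return [max(a - gamma * n, 0.0) for a, n in zip(y_main, non_target)]

y1 = [4.0, 2.0, 6.0]   # main array: bin 2 contains extra non-target sound
y2 = [2.0, 1.0, 1.0]
alpha = correction_coefficient(y1, y2)
print(alpha, extract_target_area_sound(y1, y2, alpha))
```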

The target area sound selection unit 18A compares the target area sounds extracted by the target area sound extraction unit 17A (that is, the target area sounds extracted using each of the selected microphone array sets), selects one of them, and outputs it as the final target area sound (output signal). The process by which the target area sound selection unit 18A selects one target area sound from the plurality of target area sounds may use the same method as in the first embodiment. The target area sound selection unit 18A also feeds back the selection result (information indicating which microphone array set's target area sound was selected) to the combination selection unit 19. If the processing of the combination selection unit 19 requires it, the target area sound selection unit 18A may also feed back the result of comparing the target area sounds extracted by the target area sound extraction unit 17A (for example, the ranking of the evaluation results) to the combination selection unit 19.
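As a rough illustration of the selection and feedback, the sketch below picks the candidate with the smallest average amplitude spectrum, which is the criterion the description gives for the second embodiment, and returns a ranking that could be fed back to the combination selection unit 19. The data layout (a dict of amplitude spectra keyed by set name) and the function name are assumptions.

```python
def select_target_area_sound(candidates):
    """Choose one extracted target area sound and build the feedback.

    candidates: dict mapping a microphone array set name to the
    amplitude spectrum extracted with that set.
    Returns (selected set name, its spectrum, ranking of all sets).
    """
    averages = {name: sum(spec) / len(spec)
                for name, spec in candidates.items()}
    # Smaller average amplitude is ranked higher: less non-target area
    # sound is assumed to have leaked into that candidate.
    ranking = sorted(averages, key=averages.get)
    best = ranking[0]
    return best, candidates[best], ranking

candidates = {
    "MSLR": [1.0, 2.0, 3.0],
    "MSUB": [0.5, 1.0, 1.5],
    "MSUL": [2.0, 2.0, 2.0],
}
best, spectrum, ranking = select_target_area_sound(candidates)
print(best, ranking)
```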

(B-3) Effects of the Second Embodiment
According to the second embodiment, the following effects can be obtained in addition to the effects of the first embodiment.

Compared with the first embodiment, the sound collecting device 10A of the second embodiment adds the target area sounds extracted with diagonal combinations of microphone arrays (microphone array sets) to the candidates, and selects and outputs the candidate with the smallest average amplitude spectrum. As a result, as described above, the sound collecting device 10A of the second embodiment can collect the target area sound more stably while suppressing the mixing of non-target area sounds. For example, the sound collecting device 10A of the second embodiment can suppress the mixing of non-target area sounds even in an environment in which a plurality of non-target area sound sources exist around the sound collecting area.

(C) Third Embodiment
A third embodiment of the sound collecting device, sound collecting program, and sound collecting method according to the present invention will now be described with reference to the drawings.

(C-1) Configuration of the Third Embodiment
FIG. 9 is a block diagram showing the overall configuration of the sound collecting device 10B of the third embodiment. In FIG. 9, parts that are the same as or correspond to those in FIG. 1 described above are given the same or corresponding reference numerals. The third embodiment is described below with a focus on its differences from the first embodiment.

The sound collecting device 10B of the third embodiment differs from the first embodiment in that the target area sound selection unit 18 is replaced with a frequency-specific target area sound selection unit 20.

(C-2) Operation of the Third Embodiment
Next, the operation of the sound collecting device 10B of the third embodiment having the above configuration will be described.

The operation of the sound collecting device 10B of the third embodiment is described below with a focus on its differences from the first embodiment.

As described above, the sound collecting device 10B of the third embodiment differs from the first embodiment only in the frequency-specific target area sound selection unit 20, so only the operation of the frequency-specific target area sound selection unit 20 is described here.

The target area sound selection unit 18 of the first embodiment selects and outputs either the target area sound extracted by the target area sound extraction unit 17A or the target area sound extracted by the target area sound extraction unit 17B, based on the average amplitude spectrum. In contrast, the frequency-specific target area sound selection unit 20 of the third embodiment compares, for each frequency, the component of the target area sound extracted by the target area sound extraction unit 17A with the component of the target area sound extracted by the target area sound extraction unit 17B, selects one of the two components for each frequency, and assembles the selected components of all frequencies into one target area sound (output signal), which it outputs.

The method by which the frequency-specific target area sound selection unit 20 selects one of the two target area sound components is not limited. For example, the selection may be based on the values (power) of the two components at the frequency being compared. For example, the frequency-specific target area sound selection unit 20 may select and output, as the final result, whichever of the two target area sound components has the smaller value.
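The per-frequency selection can be sketched as follows, using the "smaller value wins" rule given as an example above. The sketch assumes each target area sound is available as a list of per-frequency power values of equal length; the function name is hypothetical, and the function accepts any number of candidates although the third embodiment compares two.

```python
def select_per_frequency(*spectra):
    """Build one spectrum by taking, per frequency, the smallest component.

    spectra: two or more lists of per-frequency power values, one per
    extracted target area sound, all of the same length.
    """
    if len(spectra) < 2:
        raise ValueError("at least two target area sounds are required")
    if len({len(s) for s in spectra}) != 1:
        raise ValueError("all spectra must have the same number of bins")
    # zip(*spectra) groups the components that share a frequency bin.
    return [min(components) for components in zip(*spectra)]

from_17a = [1.0, 5.0, 2.0]
from_17b = [3.0, 4.0, 2.5]
print(select_per_frequency(from_17a, from_17b))
```

Because the choice is made bin by bin, the output can combine low-frequency components from one microphone array set with high-frequency components from another.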

(C-3) Effects of the Third Embodiment
According to the third embodiment, the following effects can be obtained.

The sound collecting device 10B of the third embodiment extracts a target area sound with the microphone arrays MAL and MAR installed in the left-right direction in front of the target area (microphone array set MSLR) and with the microphone arrays MAU and MAB installed in the vertical direction (microphone array set MSUB), and, for each frequency, selects and outputs whichever of the extracted target area sounds has the smaller amplitude spectrum. As a result, the sound collecting device 10B of the third embodiment can, for example, avoid picking up non-target area sounds located above, below, to the left of, or to the right of the target area, and can pick up the sound within an area floating in the air.

(D) Other Embodiments
The present invention is not limited to the embodiments described above, and modified embodiments such as those exemplified below are also possible.

(D-1) In the microphone array device ME of the second embodiment, as shown in FIGS. 2 and 3, area sound collection is performed using the microphone arrays MAL and MAR installed in the left-right direction in front of the target area and the microphone arrays MAU and MAB installed in the vertical direction. In other words, in the microphone array device ME of the second embodiment, the four microphone arrays are arranged in a cross shape centered on the origin P0 to form the microphone array sets; however, as shown in FIG. 10, they may instead be arranged in a grid pattern centered on the origin P0 to form the microphone array sets.

FIG. 10 is a diagram showing the arrangement of the microphone arrays in a modification of the second embodiment, as viewed from the center position of the target area TA toward the origin P0 (-X direction; front direction).

The microphone array device ME shown in FIG. 10 has, as viewed from the target area TA (center position PC), a microphone array MARU arranged diagonally above and to the right of the origin P0, a microphone array MARB arranged diagonally below and to the right of the origin P0, a microphone array MALB arranged diagonally below and to the left of the origin P0, and a microphone array MALU arranged diagonally above and to the left of the origin P0.

When the microphone array device ME shown in FIG. 10 is applied, the sound collecting device 10A of the second embodiment forms (sets) the following microphone array sets: MS_LU_RU, combining the microphone arrays MALU and MARU; MS_RU_RB, combining MARU and MARB; MS_RB_LB, combining MARB and MALB; MS_LU_LB, combining MALB and MALU; MS_LU_RB, combining MALU and MARB; and MS_RU_LB, combining MARU and MALB.
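The six sets listed above are exactly the pairwise combinations of the four grid-arranged microphone arrays. The short sketch below enumerates them; the function name and the position-suffix list are illustrative, and only the resulting set names come from the description.

```python
from itertools import combinations

def microphone_array_sets(position_suffixes):
    """Name every pair of microphone arrays as a microphone array set."""
    return [f"MS_{a}_{b}" for a, b in combinations(position_suffixes, 2)]

# Suffixes of MALU, MARU, MARB, MALB, ordered so that the generated
# names match the spelling used in the description.
grid_arrays = ["LU", "RU", "RB", "LB"]
sets = microphone_array_sets(grid_arrays)
print(len(sets), sets)
```

In general, N microphone arrays yield N(N-1)/2 microphone array sets.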

(D-2) Each of the above embodiments shows an example in which four microphone arrays are arranged in the microphone array device ME, but the microphone array device ME only needs to include three or more microphone arrays. If three or more microphone arrays are arranged in the microphone array device ME, two or more (with three arrays, three) microphone array sets can be formed by combining them. In other words, the microphone array device ME only needs to have enough microphones to form two or more microphone array sets. For example, in the microphone array device ME shown in FIGS. 2 and 3, the lower microphone array MAB may be omitted.

(D-3) In the sound collecting device 10A of the second embodiment, as in the third embodiment, microphone array sets that combine diagonally positioned microphone arrays may also be added to the processing. That is, a sound collecting device that combines the second embodiment and the third embodiment may be configured.

10 ... sound collecting device, 11 ... signal input unit, 12 ... input level correction unit, 13 ... directivity forming unit, 14 ... spatial coordinate data storage unit, 15 ... delay correction unit, 16 ... correction coefficient calculation unit, 17 ... target area sound extraction unit, 18 ... target area sound selection unit, 111, 111A, 111B ... signal input processing unit, 131, 131A, 131B ... directivity formation processing unit, 151, 151A, 151B ... delay correction processing unit, 161, 161A, 161B ... correction coefficient calculation processing unit, 171, 171A, 171B ... target area sound extraction processing unit, ME ... microphone array device, MA, MAL, MAR, MAU, MAB ... microphone array, MS, MSLR, MSUB ... microphone array set, M, M1, M2 ... microphone.

Claims

A sound collecting device comprising:
directivity forming means for forming, by a beamformer, directivity toward a target area direction in which a target area exists for each of input signals supplied from a plurality of microphone arrays, and acquiring a target direction signal from the target area direction for each of the microphone arrays;
target area sound extraction means for setting three or more microphone array sets, each formed by a combination of two of the microphone arrays, and, for each of the microphone array sets, extracting a non-target area sound existing in the target area direction by spectral subtraction of the target direction signals of the respective microphone arrays, and extracting a target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; and
output means for generating and outputting one output signal from the target area sounds extracted using the respective microphone array sets, based on a result of comparing the target area sounds extracted using the respective microphone array sets.

The sound collecting device according to claim 1, wherein the output means selects, from the target area sounds extracted using the respective microphone array sets, the target area sound having the lowest volume level and outputs it as the output signal.

The sound collecting device according to claim 1, wherein the output means compares, for each frequency, the component values of the target area sounds extracted using the respective microphone array sets, selects a component for each frequency, and generates the output signal based on the selected component of each frequency.

The sound collecting device according to claim 3, wherein the output means selects, for each frequency, the component having the lowest value among the target area sounds extracted using the respective microphone array sets.

A sound collecting program causing a computer to function as:
directivity forming means for forming, by a beamformer, directivity toward a target area direction in which a target area exists for each of input signals supplied from a plurality of microphone arrays, and acquiring a target direction signal from the target area direction for each of the microphone arrays;
target area sound extraction means for setting three or more microphone array sets, each formed by a combination of two of the microphone arrays, and, for each of the microphone array sets, extracting a non-target area sound existing in the target area direction by spectral subtraction of the target direction signals of the respective microphone arrays, and extracting a target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; and
output means for generating and outputting one output signal from the target area sounds extracted using the respective microphone array sets, based on a result of comparing the target area sounds extracted using the respective microphone array sets.

A sound collecting method performed by a sound collecting device, wherein:
the sound collecting device includes directivity forming means, target area sound extraction means, and output means;
the directivity forming means forms, by a beamformer, directivity toward a target area direction in which a target area exists for each of input signals supplied from a plurality of microphone arrays, and acquires a target direction signal from the target area direction for each of the microphone arrays;
the target area sound extraction means sets three or more microphone array sets, each formed by a combination of two of the microphone arrays, and, for each of the microphone array sets, extracts a non-target area sound existing in the target area direction by spectral subtraction of the target direction signals of the respective microphone arrays, and extracts a target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; and
the output means generates and outputs one output signal from the target area sounds extracted using the respective microphone array sets, based on a result of comparing the target area sounds extracted using the respective microphone array sets.