JP2020036139A

JP2020036139A - Sound pickup device, program and method

Info

Publication number: JP2020036139A
Application number: JP2018159969A
Authority: JP
Inventors: 一浩片桐; Kazuhiro Katagiri
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2020-03-05
Anticipated expiration: 2038-08-29
Also published as: JP6624256B1

Abstract

PROBLEM TO BE SOLVED: To provide a sound pickup device, a program, and a method capable of picking up a target area sound with less distortion. SOLUTION: The present invention relates to a sound pickup device having: target area sound extraction means for acquiring the beamformer output of each microphone array on the basis of input signals from multiple microphone arrays, and extracting a target area sound, whose sound source is a target area, by using the acquired beamformer outputs; selection means for selecting the input signal of one of the microphone arrays as a mixed signal on the basis of a comparison of the average amplitude spectra of the input signals of the respective microphone arrays; signal mixing means for mixing the input signal of the microphone array selected by the selection means, as the mixed signal, into the target area sound extracted by the target area sound extraction means; and output means for outputting the signal mixed by the selection means. SELECTED DRAWING: Figure 1

Description

The present invention relates to a sound pickup device, a program, and a method, and can be applied, for example, to a system that emphasizes sound in a specific area and suppresses sound from other areas.

In an environment with multiple sound sources, a beamformer (hereinafter also "BF") using a microphone array is one technique for separating and picking up only the sound arriving from a specific direction. A BF forms directivity by exploiting the time differences between the signals reaching each microphone (see Non-Patent Document 1).

Conventionally, BFs fall into two broad types: the addition type and the subtraction type. The subtraction-type BF in particular has the advantage that it can form directivity with fewer microphones than the addition-type BF.

FIG. 4 is a block diagram showing the configuration of a subtraction-type BF 200 with two microphones M.

FIG. 5 is an explanatory diagram showing examples of the directional filters formed by the subtraction-type BF 200 using two microphones M1 and M2.

The subtraction-type BF 200 first uses the delay unit 210 to calculate the time difference with which the sound present in the target direction (hereinafter "target sound") arrives at the microphones M1 and M2, and aligns the phase of the target sound by adding a delay. This time difference can be calculated by equation (1) below.

Here, d is the distance between the microphones M1 and M2, c is the speed of sound, and τ_i is the delay amount. θ_L is the angle from the direction perpendicular to the straight line connecting the microphones M (M1, M2) to the target direction. When the null (blind spot) lies on the microphone M1 side of the midpoint between the microphones M1 and M2, the delay unit 210 applies the delay to the input signal x_1(t) of the microphone M1.
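Equation (1) itself is not reproduced in this text. As a minimal sketch, assuming the conventional far-field relation tau = d * sin(theta_L) / c, which is consistent with the symbols defined above, the delay could be computed as follows. The function name and the default speed of sound are illustrative assumptions, not values from the source.

```python
import math

def steering_delay(d, theta_l, c=340.0):
    """Delay in seconds for microphone spacing d (m) and angle theta_l (rad).

    Assumed form (not taken from the source): tau = d * sin(theta_l) / c.
    """
    return d * math.sin(theta_l) / c

# Example: 3 cm spacing, target on the array axis (theta_l = pi/2).
tau = steering_delay(0.03, math.pi / 2)
```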

The subtraction-type BF 200 then performs the subtraction according to equation (2) below. The same processing can be carried out in the frequency domain, in which case equation (2) becomes equation (3) below.

When θ_L = ±π/2, the directivity formed by the subtraction-type BF 200 is a cardioid unidirectional pattern, as shown in FIG. 5(a). When θ_L = 0 or π, the directivity is a figure-eight bidirectional pattern, as shown in FIG. 5(b).
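Equations (2) and (3) are not reproduced in this text. The sketch below assumes the usual delay-and-subtract form, M(ω) = X_2(ω) − exp(−jωτ_L)·X_1(ω), for a unit-amplitude plane wave, and evaluates the resulting response magnitude. The sign convention, the function name, and the default spacing, speed of sound, and frequency are all assumptions made for illustration.

```python
import cmath
import math

def subtractive_bf_response(theta, theta_l, d=0.03, c=340.0, freq=1000.0):
    """Magnitude response of a two-microphone delay-and-subtract beamformer.

    theta   : arrival angle of a plane wave, measured from broadside (rad)
    theta_l : angle used to set the delay, i.e. the null direction (rad)
    """
    omega = 2.0 * math.pi * freq
    tau_src = d * math.sin(theta) / c    # inter-microphone delay of the wave
    tau_l = d * math.sin(theta_l) / c    # delay applied to microphone M1
    x1 = 1.0 + 0.0j                      # unit-amplitude signal at M1
    x2 = x1 * cmath.exp(-1j * omega * tau_src)
    m = x2 - cmath.exp(-1j * omega * tau_l) * x1
    return abs(m)
```

Under these assumptions, setting theta_l = π/2 gives a single null on the array axis (a cardioid-like pattern), while theta_l = 0 gives nulls at 0 and π with equal lobes at ±π/2 (a figure-eight pattern).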

Hereinafter, a filter that forms a unidirectional pattern from the input signals is called a "unidirectional filter," and a filter that forms a bidirectional pattern is called a "bidirectional filter."

The subtractor 220 can also form a sharp directivity in the direction of the null of the bidirectional pattern by using spectral subtraction (hereinafter simply "SS"). The directivity by SS is formed over all frequencies, or over a specified frequency band, according to equation (4) below.

Equation (4) below uses the input signal X_1 of the microphone M1, but the same effect can be obtained with the input signal X_2 of the microphone M2. Here, β is a coefficient for adjusting the strength of the SS. When a value becomes negative in the subtraction, the subtractor 220 performs flooring, replacing it with 0 or with a reduced version of the original value. In this processing scheme of the subtraction-type BF 200, the bidirectional characteristic is used to extract the sound present outside the target direction (hereinafter "non-target sound"), and the amplitude spectrum of the extracted non-target sound is subtracted from the amplitude spectrum of the input signal, thereby emphasizing the target sound.
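Equation (4) is not reproduced in this text. As a rough sketch of the spectral subtraction with flooring described above, assuming the per-bin form |X_1| − β·|M| where M is the bidirectional-filter output, one could write the following. The function name and the `floor_ratio` parameter are hypothetical.

```python
def spectral_subtract(x_amp, noise_amp, beta=1.0, floor_ratio=0.0):
    """Subtract beta * noise_amp from x_amp bin by bin, with flooring.

    x_amp, noise_amp : amplitude spectra (lists of non-negative floats)
    floor_ratio      : 0.0 replaces negative results with 0; a small positive
                       value replaces them with a reduced copy of the input.
    """
    out = []
    for x, m in zip(x_amp, noise_amp):
        y = x - beta * m
        if y < 0.0:
            y = floor_ratio * x   # flooring of negative values
        out.append(y)
    return out
```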

When only the sound present within a specific area (hereinafter "target area sound") is to be picked up, a subtraction-type BF alone may also pick up the sound of sources located around that area (hereinafter "non-target area sound"). Patent Document 1 therefore proposes a technique (hereinafter "area sound pickup") that uses multiple microphone arrays, directs their directivities toward the target area from different directions, and makes the directivities intersect at the target area in order to pick up the target area sound. In area sound pickup, the ratio of the amplitude spectra of the target area sound contained in the BF outputs of the microphone arrays is first estimated and used as a correction coefficient.

For example, when two microphone arrays are used, the correction coefficients for the target area sound amplitude spectrum can be calculated by the combination of equations (5) and (6) below, or by the combination of equations (7) and (8) below. Here, Y_1k(n) is the amplitude spectrum of the BF output of the first microphone array, Y_2k(n) is the amplitude spectrum of the BF output of the second microphone array, N is the total number of frequency bins, and k is the frequency. α_1(n) and α_2(n) are the amplitude spectrum correction coefficients for the respective BF outputs. "mode" denotes the mode and "median" denotes the median.
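Equations (5) to (8) are not reproduced in this text. The sketch below illustrates the median variant under an assumption: each coefficient is the median, over the frequency bins k, of the ratio between the two BF output amplitude spectra, with α_2 taken as Y_1k/Y_2k because α_2 is later applied to Y_2. The direction of each ratio, the function name, and the `eps` guard are assumptions.

```python
from statistics import median

def correction_coefficients(y1, y2, eps=1e-12):
    """Return (alpha1, alpha2) for one frame from two BF amplitude spectra.

    Assumed form: alpha1 = median_k(Y2k / Y1k), alpha2 = median_k(Y1k / Y2k).
    eps guards against division by zero and is not part of the source.
    """
    alpha1 = median(b / max(a, eps) for a, b in zip(y1, y2))
    alpha2 = median(a / max(b, eps) for a, b in zip(y1, y2))
    return alpha1, alpha2

# Three bins share a 2:1 ratio; the outlier bin does not move the median.
alpha1, alpha2 = correction_coefficients([2.0, 4.0, 6.0, 9.0], [1.0, 2.0, 3.0, 1.0])
```

The median (or mode) is robust to bins dominated by non-target sound, which is presumably why the text prefers it to a plain mean.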

Through the above processing, the subtractor 220 obtains the correction coefficients α_1(n) and α_2(n), corrects each BF output with them, and applies SS to extract the non-target area sound present in the direction of the target area. The subtractor 220 can then extract the target area sound by spectrally subtracting the extracted non-target area sound from each BF output.

When extracting the non-target area sound N_1(n) present in the target area direction as seen from the first microphone array, the subtraction-type BF 200 spectrally subtracts, from the BF output Y_1(n) of the first microphone array, the BF output Y_2(n) of the second microphone array multiplied by the amplitude spectrum correction coefficient α_2, as shown in equation (9), for example. Likewise, the subtraction-type BF 200 extracts the non-target area sound N_2(n) present in the target area direction as seen from the second microphone array according to equation (10) below.

The subtraction-type BF 200 then spectrally subtracts the non-target area sound from each BF output according to equation (11) or (12) below to extract the target area sound. Equation (11) shows the processing when the target area sound is extracted with the first microphone array as the reference, and equation (12) shows the processing with the second microphone array as the reference. Here, γ_1(n) and γ_2(n) are coefficients for changing the strength of the SS.
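Equations (9) to (12) are not reproduced in this text. Based on the verbal description, a sketch for the first microphone array as the reference might look like the following, assuming N_1 = Y_1 − α_2·Y_2 and a target area sound equal to Y_1 − γ_1·N_1, each floored at zero. The output name `z1` and the zero floor are assumptions.

```python
def extract_target_area(y1, y2, alpha2, gamma1=1.0):
    """Two-stage spectral subtraction with array 1 as the reference.

    Returns (n1, z1): the non-target area sound seen from array 1 and the
    extracted target area sound, both as amplitude spectra.
    """
    def ss(a, b):
        return max(a - b, 0.0)  # spectral subtraction floored at zero

    n1 = [ss(a, alpha2 * b) for a, b in zip(y1, y2)]
    z1 = [ss(a, gamma1 * n) for a, n in zip(y1, n1)]
    return n1, z1

# Bin 0 holds target (2.0) plus non-target (1.0) sound; bin 1 holds target only.
n1, z1 = extract_target_area([3.0, 2.0], [1.0, 1.0], alpha2=2.0)
```

The processing for the second array would be symmetric, swapping the roles of the two BF outputs and using α_1 and γ_2.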

When the level of the background noise or the non-target area sound is high, the SS performed in extracting the target area sound may distort the target area sound or produce an unpleasant artifact known as musical noise.

In the technique of Patent Document 3, the levels of the microphone input signal and of the estimated noise are therefore each adjusted according to the magnitudes of the background noise and the non-target area sound, and both are mixed into the extracted target area sound.

The musical noise produced by the target area sound extraction becomes stronger as the levels of the background noise and the non-target area sound increase. In the technique of Patent Document 3, the total level of the mixed input signal and estimated noise is therefore also raised in proportion to the levels of the background noise and the non-target area sound.

Specifically, in the technique of Patent Document 3, the level of the background noise is calculated from the estimated noise obtained in the course of suppressing the background noise. The level of the non-target area sound is calculated from the sum of the non-target area sound present in the target area direction, which is extracted in the course of emphasizing the target area sound, and the non-target area sound present outside the target area direction. The ratio between the mixed input signal and the estimated noise is determined from the levels of the estimated noise and the non-target area sound.

When a non-target area sound is present near the target area, mixing the input signal at too high a level causes the non-target area sound to leak into the target area sound, making it impossible to tell which is the target area sound. In the technique of Patent Document 3, the level of the mixed input signal is therefore lowered and the level of the estimated noise raised when the non-target area sound is loud. That is, when no non-target area sound is present or its level is low, the proportion of the input signal is increased; conversely, when the level of the non-target area sound is high, the proportion of the estimated noise is increased.

With the technique of Patent Document 3, mixing the input signal and the estimated noise into the target area sound in this way masks the musical noise so that it is heard naturally, like ordinary background noise. In addition, the target area sound component contained in the microphone input signal corrects the distortion of the target area sound and improves the sound quality.

Patent Document 1: JP 2014-72708 A
Patent Document 2: JP 2005-195955 A
Patent Document 3: JP 2017-183902 A

Non-Patent Document 1: Futoshi Asano, "Array Signal Processing for Acoustics: Localization, Tracking and Separation of Sound Sources," Acoustic Technology Series 16, edited by the Acoustical Society of Japan, Corona Publishing, published February 25, 2011

However, in the technique of Patent Document 3, when a non-target area sound is present near the target area, the level of the mixed input signal is lowered. This suppresses leakage of the non-target area sound, but it also weakens the effect of correcting the distortion of the target area sound.

A sound pickup device, a program, and a method that pick up a target area sound with less distortion are therefore desired.

The sound pickup device of the first aspect of the present invention comprises: (1) target area sound extraction means for acquiring a beamformer output for each of a plurality of microphone arrays on the basis of the input signals from the microphone arrays, and extracting, using the acquired beamformer outputs, a target area sound whose sound source is a target area; (2) selection means for selecting the input signal of one of the microphone arrays as a mixed signal on the basis of the result of comparing the average amplitude spectra of the input signals of the respective microphone arrays; (3) signal mixing means for mixing the input signal of the microphone array selected by the selection means, as the mixed signal, into the target area sound extracted by the target area sound extraction means; and (4) output means for outputting the post-mixing signal mixed by the selection means.

The sound pickup program of the second aspect of the present invention causes a computer to function as: (1) target area sound extraction means for acquiring a beamformer output for each of a plurality of microphone arrays on the basis of the input signals from the microphone arrays, and extracting, using the acquired beamformer outputs, a target area sound whose sound source is a target area; (2) selection means for selecting the input signal of one of the microphone arrays as a mixed signal on the basis of the result of comparing the average amplitude spectra of the input signals of the respective microphone arrays; (3) signal mixing means for mixing the input signal of the microphone array selected by the selection means, as the mixed signal, into the target area sound extracted by the target area sound extraction means; and (4) output means for outputting the post-mixing signal mixed by the selection means.

The third aspect of the present invention is a sound pickup method performed by a sound pickup device, wherein: (1) the device comprises target area sound extraction means, selection means, signal mixing means, and output means; (2) the target area sound extraction means acquires a beamformer output for each of a plurality of microphone arrays on the basis of the input signals from the microphone arrays, and extracts, using the acquired beamformer outputs, a target area sound whose sound source is a target area; (3) the selection means selects the input signal of one of the microphone arrays as a mixed signal on the basis of the result of comparing the average amplitude spectra of the input signals of the respective microphone arrays; (4) the signal mixing means mixes the input signal of the microphone array selected by the selection means, as the mixed signal, into the target area sound extracted by the target area sound extraction means; and (5) the output means outputs the post-mixing signal mixed by the selection means.

According to the present invention, it is possible to provide a sound pickup device, a program, and a method that pick up a target area sound with less distortion.

FIG. 1 is a block diagram showing the functional configuration of the sound pickup device according to the first embodiment.
FIG. 2 is an explanatory diagram showing the effect of the first embodiment.
FIG. 3 is a block diagram showing the functional configuration of the sound pickup device according to the second embodiment.
FIG. 4 is a block diagram showing the configuration of a conventional subtraction-type BF with two microphones.
FIG. 5 is a diagram showing the directional characteristics formed by a conventional subtraction-type BF using two microphones.

(A) First Embodiment
A first embodiment of the sound pickup device, program, and method according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the functional configuration of the sound pickup device 100 of this embodiment.

The sound pickup device 100 uses two microphone arrays MA (MA1, MA2) to perform target area sound pickup processing, which picks up the target area sound from a sound source in the target area.

The microphone arrays MA1 and MA2 are placed at arbitrary locations in the space containing the target area. They may be positioned anywhere relative to the target area as long as their directivities overlap only in the target area; for example, they may face each other across the target area. Each microphone array MA consists of two or more microphones M, each of which picks up an acoustic signal. In this embodiment, each microphone array MA is described as having two microphones M (M1, M2); that is, each microphone array MA is a 2-channel microphone array. The number of microphone arrays MA is not limited to two; when there are multiple target areas, enough microphone arrays MA must be placed to cover all of them.

The sound pickup device 100 includes a signal input unit 101, a noise suppression unit 102, a directivity forming unit 103, a delay correction unit 104, spatial coordinate data 105, a correction coefficient calculation unit 106, a target area sound extraction unit 107, a mixed signal selection unit 108, a signal mixing unit 109, and a signal output unit 110.

The detailed processing of each functional block of the sound pickup device 100 is described later.

The sound pickup device 100 may be implemented entirely in hardware (for example, a dedicated chip), or partly or wholly in software (a program). For example, the sound pickup device 100 may be implemented by installing a program (including the determination program and the sound pickup program of the embodiment) on a computer having a processor and memory.

(A-2) Operation of the First Embodiment
Next, the operation of the sound pickup device 100 of the first embodiment configured as above (the sound pickup method according to the embodiment) is described.

The signal input unit 101 converts the acoustic signals picked up by each microphone array from analog to digital and takes them in. It then transforms them from the time domain to the frequency domain, for example by fast Fourier transform.

The noise suppression unit 102 estimates and suppresses the background noise component contained in the signals acquired by the signal input unit 101. For this noise suppression, SS or Wiener filtering can be used, for example.

For each microphone array, the directivity forming unit 103 forms directivity toward the target area by BF according to equation (4), operating on the signals whose background noise has been suppressed by the noise suppression unit 102.

The delay correction unit 104 calculates and corrects the delay caused by the difference in distance between the target area and each microphone array. It first acquires the position of the target area and the position of each microphone array from the spatial coordinate data 105 and calculates the difference in the arrival time of the target area sound at each microphone array. Then, taking the microphone array located farthest from the target area as the reference, it adds delays so that the target area sound arrives at all microphone arrays simultaneously.
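The delay correction just described can be sketched as follows. This is an illustration only: it assumes positions are given as coordinate tuples in metres and that each array is represented by a single reference point; the function name and the speed-of-sound default are assumptions.

```python
import math

def array_delays(target, arrays, c=340.0):
    """Delay (s) to add to each array so the target area sound aligns.

    target : coordinates of the target area, e.g. (x, y)
    arrays : list of array coordinates in the same coordinate system
    The farthest array is the reference and receives zero added delay.
    """
    dists = [math.dist(target, a) for a in arrays]
    farthest = max(dists)
    return [(farthest - dist) / c for dist in dists]

# Array 1 is 5 m from the target, array 2 is 10 m away.
delays = array_delays((0.0, 0.0), [(3.0, 4.0), (0.0, 10.0)])
```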

The spatial coordinate data 105 holds the position information of all target areas, of each microphone array, and of the microphones constituting each microphone array. Neither the way the spatial coordinate data 105 holds the position information of each microphone nor the specific format of that information is limited; various data formats can be applied.

The correction coefficient calculation unit 106 calculates, according to equations (5) and (6) or equations (7) and (8), the correction coefficients that equalize the amplitude spectra of the target area sound components contained in the BF outputs.

The target area sound extraction unit 107 spectrally subtracts the BF output data corrected with the correction coefficients calculated by the correction coefficient calculation unit 106, according to equation (9) or (10), to extract the non-target area sound present in the target area direction. It then extracts the target area sound by spectrally subtracting the extracted noise from each BF output according to equation (11) or (12).

The mixed signal selection unit 108 selects, as the mixed signal, the signal with the smallest average amplitude spectrum among the input signals of the microphones constituting the microphone arrays. When area sound pickup is performed with the 2-channel microphone arrays MA1 and MA2, the mixed signal selection unit 108 can select the mixed signal X_MIX(n) by processing according to equations (13) and (14) below.

In the text outside the equations, owing to the characters (symbols) available, an "X" with an overline (bar) in equation (13) and the other equations is written as "^X". For example, equation (13) below is written as "^X_ij(n) = min(^X_11(n), ^X_12(n), ^X_21(n), ^X_22(n))".

Here, ^X_11(n) and ^X_12(n) are the average amplitude spectra of the input signals (X_11(n), X_12(n)) of the microphones M1 and M2 constituting the microphone array MA1. Likewise, ^X_21(n) and ^X_22(n) are the average amplitude spectra of the input signals (X_21(n), X_22(n)) of the microphones M1 and M2 constituting the microphone array MA2. Further, i is the number (identifier) of the microphone array, and j is the number (identifier) of the microphone within the array. Here, the microphone array MA1 is numbered 1 and the microphone array MA2 is numbered 2; likewise, the microphone M1 is numbered 1 and the microphone M2 is numbered 2.

When X_11(n) or X_12(n) is selected as the mixed signal by the mixed signal selection unit 108, the mixed signal may instead be the arithmetic mean of the input signals of the microphones constituting that microphone array, as in equation (15) below. The average amplitude spectrum need not be taken over the full band; it may be band-limited by setting upper and lower frequency limits.
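The selection rule can be sketched as follows. Equation (13) is quoted in the text above; equations (14) and (15) are not reproduced, so the averaging option below follows only the verbal description. The dictionary layout, function names, and the bin-index parameters `k_lo` and `k_hi` are assumptions made for illustration.

```python
def average_amplitude(spectrum, k_lo=0, k_hi=None):
    """Mean of an amplitude spectrum over bins k_lo..k_hi (band limiting)."""
    band = spectrum[k_lo:k_hi]
    return sum(band) / len(band)

def select_mixed_signal(inputs, k_lo=0, k_hi=None, average_array=False):
    """Pick the microphone input with the smallest average amplitude spectrum.

    inputs maps (i, j) = (array number, microphone number) to an amplitude
    spectrum. With average_array=True, the result is the arithmetic mean of
    all microphones of the selected array instead of the single input.
    """
    i, j = min(inputs, key=lambda ij: average_amplitude(inputs[ij], k_lo, k_hi))
    if not average_array:
        return (i, j), list(inputs[(i, j)])
    mics = [spec for (a, _), spec in inputs.items() if a == i]
    mixed = [sum(vals) / len(vals) for vals in zip(*mics)]
    return (i, j), mixed

frame = {
    (1, 1): [1.0, 1.0], (1, 2): [3.0, 3.0],
    (2, 1): [2.0, 2.0], (2, 2): [4.0, 4.0],
}
single = select_mixed_signal(frame)
averaged = select_mixed_signal(frame, average_array=True)
```

Choosing the quietest input presumably favours the microphone that receives the least non-target area sound, so the mixed-in signal carries the target area sound with the least contamination.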

The signal mixing unit 109 mixes the input signal selected by the mixed signal selection unit 108 into the target area sound extracted by the target area sound extraction unit 107. For example, when area sound pickup is performed with the microphone array MA1 as the reference according to equation (11), the final output W_1(n) is obtained by mixing according to equation (16) below. Here, μ is a parameter that adjusts the magnitude of the signal to be mixed.
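Equation (16) is not reproduced in this text. A plausible reading of the description is a per-bin weighted addition, W_1 = Z_1 + μ·X_MIX, where Z_1 stands for the target area sound extracted with array MA1 as the reference. The symbol Z_1, the function name, and the default value of μ are assumptions.

```python
def mix_output(z1, x_mix, mu=0.1):
    """Add the selected input signal, scaled by mu, to the target area sound.

    z1    : amplitude spectrum of the extracted target area sound (assumed name)
    x_mix : amplitude spectrum of the selected mixed signal
    """
    return [z + mu * x for z, x in zip(z1, x_mix)]

w1 = mix_output([1.0, 2.0], [2.0, 4.0], mu=0.5)
```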

信号混合部１０９では、混合を行った出力信号に位相を復元する際、位相情報としては、目的エリア音抽出部１０７において基準としたマイクロホンアレイを構成するマイクロホンの入力信号の加算平均、もしくはどれか１つのマイクロホンの入力信号を使用するようにしてもよい。また、信号混合部１０９では、混合信号として選択した入力信号の位相を使用しても良い。 When restoring the phase to the mixed output signal in the signal mixing unit 109, the phase information includes, as the phase information, the averaging of the input signals of the microphones constituting the microphone array with reference to the target area sound extraction unit 107, or any one of them. The input signal of one microphone may be used. The signal mixing section 109 may use the phase of the input signal selected as the mixed signal.

例えば、信号混合部１０９において、（１１）式により目的エリア音を抽出した場合、マイクロホンアレイＭＡ１を基準としている。したがって、信号混合部１０９は、入力信号Ｘ_１１（ｎ）とＸ_１２（ｎ）の加算平均、又は、Ｘ_１１（ｎ）若しくはＸ_１２（ｎ）どちらかの位相情報を用いて、出力信号に位相を復元するようにしてもよい。 For example, in the signal mixing section 109, when the target area sound is extracted by Expression (11), the microphone array MA1 is used as a reference. Therefore, signal mixing section 109 uses the average of input signals X ₁₁ (n) and X ₁₂ (n) or the phase information of either X ₁₁ (n) or X ₁₂ (n) to output signals. The phase may be restored.

Further, the signal mixing unit 109 may perform the mixing after restoring phase information to the amplitude spectrum of the target area sound and to that of the mixed signal separately. In this case, different information can be used for phase restoration of the target area sound and of the mixed signal. For example, the target area sound may be given the phase of the average of the input signals of the microphones constituting the reference microphone array used by the target area sound extraction unit 107, or of the input signal of any one of those microphones, while the mixed signal is given the phase of the input signal selected as the mixed signal.
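Mixing after separate phase restoration might look like the sketch below. The additive form with μ is again an assumption, and the function names and single-bin values are invented for the example.

```python
import cmath


def with_phase(amplitude, reference):
    """Attach the phase of each reference bin to the corresponding amplitude."""
    return [a * cmath.exp(1j * cmath.phase(r)) for a, r in zip(amplitude, reference)]


def mix_after_phase_restoration(target_amp, target_ref, mixed_amp, mixed_ref, mu):
    """Restore phase on each signal from its own reference, then mix as complex values."""
    target = with_phase(target_amp, target_ref)
    mixed = with_phase(mixed_amp, mixed_ref)
    return [t + mu * m for t, m in zip(target, mixed)]


# One bin: the target takes phase 0, the mixed signal takes phase pi/2.
out = mix_after_phase_restoration(
    target_amp=[1.0], target_ref=[1 + 0j],
    mixed_amp=[1.0], mixed_ref=[0 + 1j],
    mu=0.5,
)
```

Because the two components here are 90 degrees apart, the output magnitude is about 1.118 rather than the 1.5 that adding amplitudes first would give, so the order of phase restoration and mixing can affect the result.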

The signal output unit 110 converts the output signal processed by the signal mixing unit 109 from the frequency domain to the time domain and outputs it.
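The text does not say which transform is used, so the following sketch assumes a plain discrete Fourier transform and shows its inverse for a single frame. The function name and the frame values are illustrative. A practical implementation would normally use an FFT with windowing and overlap-add across frames.

```python
import cmath


def inverse_dft(spectrum):
    """Naive O(N^2) inverse DFT of one frame; returns real-valued samples."""
    size = len(spectrum)
    samples = []
    for n in range(size):
        acc = sum(
            spectrum[k] * cmath.exp(2j * cmath.pi * k * n / size)
            for k in range(size)
        )
        samples.append(acc.real / size)  # imaginary part vanishes for a symmetric spectrum
    return samples


# Hypothetical 4-bin spectrum of a real signal (bins 1 and 3 are conjugate-symmetric).
frame_spectrum = [0, 2, 0, 2]
frame_samples = inverse_dft(frame_spectrum)
```

For this spectrum the recovered samples are approximately 1, 0, -1, 0.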

As described above, the mixed signal selection unit 108 compares the average amplitude spectra of the input signals of all microphones used for area sound collection and selects, as the mixed signal, the signal with the smallest average amplitude spectrum. Furthermore, in the first embodiment, when sound from a specific area is collected, it is desirable to install the microphone arrays around the sound collection area.

(A-3) Effects of the First Embodiment
According to the first embodiment, the following effects can be obtained.

In the sound collection device 100 of the first embodiment, the input signal with the smallest average amplitude spectrum among the input signals of the microphone arrays is selected as the mixed signal. Even when a non-target area sound exists near the target area, this suppresses the non-target area sound carried into the output by the mixing and reduces distortion of the target area sound.

FIG. 2 is an explanatory diagram (conceptual illustration) showing the relationship between the positions of the microphone arrays and the positions of the sound sources (the sources of the target area sound and the non-target area sound).

In the first embodiment, assuming that each microphone array is equidistant from the center of the sound collection area, the target area sound reaches every microphone of every microphone array at the same volume (see FIG. 2(a)). The position of a non-target area sound, by contrast, is at a different distance from each microphone array, so the volume of the non-target area sound contained in each microphone array's signal differs because of distance attenuation. Even among the microphones of a single microphone array, when the non-target area sound is located anywhere other than directly in front of the array, its distance to each microphone differs and a difference in volume arises (see FIG. 2(b)). In other words, the input signal of the microphone farthest from the non-target area sound contains the least non-target area sound. Because the target area sound is contained in all microphones at the same volume, the input signal with the smallest average amplitude spectrum among all microphones has the highest SN ratio. Therefore, in the first embodiment, even when a non-target area sound exists near the target area, the non-target area sound carried into the output by the mixing is suppressed and distortion of the target area sound is reduced.
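The distance argument can be checked numerically with a toy layout. Everything in this sketch is an assumption made for illustration: the coordinates, the microphone names, the 1/r free-field attenuation law, and the use of a simple sum of amplitudes (ignoring phase) as a stand-in for the average amplitude spectrum.

```python
import math


def amplitude_at(mic, source, source_level=1.0):
    """Amplitude at a microphone under an assumed 1/r attenuation law."""
    return source_level / math.dist(mic, source)


# Two arrays of two microphones each, all at the same distance from the target.
mics = {
    "M11": (-1.0, 0.03), "M12": (-1.0, -0.03),  # array MA1
    "M21": (-0.03, -1.0), "M22": (0.03, -1.0),  # array MA2
}
target = (0.0, 0.0)   # center of the sound collection area
noise = (-0.5, 0.5)   # non-target area sound, closer to MA1

levels = {}
for name, pos in mics.items():
    t = amplitude_at(pos, target)
    n = amplitude_at(pos, noise)
    levels[name] = {"total": t + n, "snr": t / n}

quietest = min(levels, key=lambda m: levels[m]["total"])
best_snr = max(levels, key=lambda m: levels[m]["snr"])
```

In this layout M22 is farthest from the noise source, so it has both the smallest total level and the highest ratio of target to noise, which is the relationship the selection rule relies on.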

(B) Second Embodiment
Hereinafter, a second embodiment of the sound collection device, program, and method according to the present invention will be described in detail with reference to the drawings.

(B-1) Configuration of the Second Embodiment
FIG. 3 is a block diagram showing the functional configuration of the sound collection device 100A according to the second embodiment. Parts identical or corresponding to those in FIG. 1 described above are given the same or corresponding reference numerals.

The following describes the differences between the second embodiment and the first embodiment.

The sound collection device 100A of the second embodiment differs from the first embodiment in that a microphone array selection unit 111 is added, and in that the target area sound extraction unit 107 is replaced with a target area sound extraction unit 107A. The second embodiment further differs in that the processing result of the mixed signal selection unit 108 is supplied to the microphone array selection unit 111 and the target area sound extraction unit 107A.

The detailed processing of each functional block of the sound collection device 100A will be described later.

(B-2) Operation of the Second Embodiment
Next, the operation of the sound collection device 100A of the second embodiment configured as above (the sound collection method according to the embodiment) will be described, focusing on the differences from the first embodiment.

The microphone array selection unit 111 selects the microphone array (MA1 or MA2) to be used as the reference in the target area sound extraction unit 107A, according to the mixed signal selected by the mixed signal selection unit 108.

For example, when the input signal X₁₁(n) of a microphone constituting the microphone array MA1 has been selected as the mixed signal, the microphone array MA1 is selected as the reference microphone array. In this case, the target area sound extraction unit 107A extracts the target area sound according to equation (11).
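The rule applied by the microphone array selection unit 111 amounts to a lookup from the selected input signal to the array that supplied it. In the sketch below, the table, the function name, and the signal names X21 and X22 for the microphones of MA2 are assumptions for illustration.

```python
# Hypothetical mapping from each input signal to the array that supplies it.
ARRAY_OF_INPUT = {"X11": "MA1", "X12": "MA1", "X21": "MA2", "X22": "MA2"}


def select_reference_array(selected_input):
    """Return the microphone array that supplied the selected mixed signal."""
    try:
        return ARRAY_OF_INPUT[selected_input]
    except KeyError:
        raise ValueError(f"unknown input signal: {selected_input!r}") from None


reference = select_reference_array("X11")  # X11 was selected as the mixed signal
```

Here `reference` is "MA1", which corresponds to the case described above where extraction follows equation (11).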

(B-3) Effects of the Second Embodiment
According to the second embodiment, the following effects can be obtained in comparison with the first embodiment.

In the second embodiment, the target area sound extraction unit 107A extracts the target area sound with reference to the microphone array that supplied the mixed signal selected by the mixed signal selection unit 108 (that is, the microphone array selected by the microphone array selection unit 111). In the sound collection device 100A of the second embodiment, the mixed signal and the signals used in the target area sound extraction therefore come from the same microphone array, which further reduces distortion of the target area sound.

(C) Other Embodiments
The present invention is not limited to the above embodiments, and modified embodiments such as those exemplified below are also possible.

(C-1) In the sound collection devices of the above embodiments, each microphone array MA used for sound collection has two microphones, but the sound in the direction of the target area may be collected based on acoustic signals picked up with three or more microphones. In each of the above embodiments, various existing approaches can be applied for the number of microphones per microphone array MA and for the method of collecting sound in the target sound direction.

100: sound collection device, M1, M2: microphones, MA1, MA2: microphone arrays, 101: signal input unit, 102: noise suppression unit, 103: directivity forming unit, 104: delay correction unit, 105: spatial coordinate data, 106: correction coefficient calculation unit, 107: target area sound extraction unit, 108: mixed signal selection unit, 109: signal mixing unit, 110: signal output unit.

The sound collection device of the first aspect of the present invention comprises: (1) target area sound extraction means for acquiring the beamformer output of each of a plurality of microphone arrays based on input signals from the microphone arrays, and extracting, using the acquired beamformer outputs, a target area sound whose sound source is a target area; (2) selection means for selecting the input signal of one of the microphone arrays as a mixed signal based on a result of comparing the average amplitude spectra of the input signals of the respective microphone arrays; (3) signal mixing means for mixing, as the mixed signal, the input signal of the microphone array selected by the selection means into the target area sound extracted by the target area sound extraction means; and (4) output means for outputting the post-mixing signal mixed by the selection means, wherein (5) the selection means selects, as the mixed signal, the signal with the smallest average amplitude spectrum among the input signals of the plurality of microphone arrays.

The sound collection program of the second aspect of the present invention causes a computer to function as: (1) target area sound extraction means for acquiring the beamformer output of each of a plurality of microphone arrays based on input signals from the microphone arrays, and extracting, using the acquired beamformer outputs, a target area sound whose sound source is a target area; (2) selection means for selecting the input signal of one of the microphone arrays as a mixed signal based on a result of comparing the average amplitude spectra of the input signals of the respective microphone arrays; (3) signal mixing means for mixing, as the mixed signal, the input signal of the microphone array selected by the selection means into the target area sound extracted by the target area sound extraction means; and (4) output means for outputting the post-mixing signal mixed by the selection means, wherein (5) the selection means selects, as the mixed signal, the signal with the smallest average amplitude spectrum among the input signals of the plurality of microphone arrays.

The third aspect of the present invention is a sound collection method performed by a sound collection device, wherein: (1) the sound collection device comprises target area sound extraction means, selection means, signal mixing means, and output means; (2) the target area sound extraction means acquires the beamformer output of each of a plurality of microphone arrays based on input signals from the microphone arrays, and extracts, using the acquired beamformer outputs, a target area sound whose sound source is a target area; (3) the selection means selects the input signal of one of the microphone arrays as a mixed signal based on a result of comparing the average amplitude spectra of the input signals of the respective microphone arrays; (4) the signal mixing means mixes, as the mixed signal, the input signal of the microphone array selected by the selection means into the target area sound extracted by the target area sound extraction means; (5) the output means outputs the post-mixing signal mixed by the selection means; and (6) the selection means selects, as the mixed signal, the signal with the smallest average amplitude spectrum among the input signals of the plurality of microphone arrays.

Claims

1. A sound collection device comprising:
target area sound extraction means for acquiring the beamformer output of each of a plurality of microphone arrays based on input signals from the microphone arrays, and extracting, using the acquired beamformer outputs, a target area sound whose sound source is a target area;
selection means for selecting the input signal of one of the microphone arrays as a mixed signal based on a result of comparing the average amplitude spectra of the input signals of the respective microphone arrays;
signal mixing means for mixing, as the mixed signal, the input signal of the microphone array selected by the selection means into the target area sound extracted by the target area sound extraction means; and
output means for outputting the post-mixing signal mixed by the selection means.

2. The sound collection device according to claim 1, wherein the selection means selects, as the mixed signal, the signal with the smallest average amplitude spectrum among the input signals of the plurality of microphone arrays.

3. The sound collection device according to claim 1, wherein the target area sound extraction means performs the process of extracting the target area sound with reference to the beamformer output of any one of the microphone arrays.

4. The sound collection device according to claim 3, wherein the target area sound extraction means performs the process of extracting the target area sound with reference to the beamformer output of the microphone array associated with the input signal selected by the selection means.

5. A sound collection program causing a computer to function as:
target area sound extraction means for acquiring the beamformer output of each of a plurality of microphone arrays based on input signals from the microphone arrays, and extracting, using the acquired beamformer outputs, a target area sound whose sound source is a target area;
selection means for selecting the input signal of one of the microphone arrays as a mixed signal based on a result of comparing the average amplitude spectra of the input signals of the respective microphone arrays;
signal mixing means for mixing, as the mixed signal, the input signal of the microphone array selected by the selection means into the target area sound extracted by the target area sound extraction means; and
output means for outputting the post-mixing signal mixed by the selection means.

6. A sound collection method performed by a sound collection device comprising target area sound extraction means, selection means, signal mixing means, and output means, wherein:
the target area sound extraction means acquires the beamformer output of each of a plurality of microphone arrays based on input signals from the microphone arrays, and extracts, using the acquired beamformer outputs, a target area sound whose sound source is a target area;
the selection means selects the input signal of one of the microphone arrays as a mixed signal based on a result of comparing the average amplitude spectra of the input signals of the respective microphone arrays;
the signal mixing means mixes, as the mixed signal, the input signal of the microphone array selected by the selection means into the target area sound extracted by the target area sound extraction means; and
the output means outputs the post-mixing signal mixed by the selection means.