JP6725014B1

JP6725014B1 - Sound collecting device, sound collecting program, and sound collecting method

Info

Publication number: JP6725014B1
Application number: JP2019009620A
Authority: JP
Inventors: 一浩片桐
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2020-07-15
Anticipated expiration: 2039-01-23
Also published as: JP2020120263A

Abstract

PROBLEM TO BE SOLVED: To suppress sound quality degradation during area sound collection processing. SOLUTION: The present invention relates to a sound collecting device. The sound collecting device includes: means for acquiring target direction signals based on the beamformer outputs of a plurality of microphone arrays; means for extracting non-target area sound by performing spectral subtraction on the acquired target direction signals, and for extracting target area sound by spectrally subtracting the non-target area sound from a target direction signal; means for detecting a peak frequency in a non-target area sound component dominant signal (a signal based on the captured signals in which sound components originating from non-target areas are dominant) and forming a suppression filter that suppresses the component at a suppression frequency based on that peak frequency; means for applying the suppression filter to a mixing signal and mixing the result with the target area sound to obtain a mixed signal; and means for outputting the mixed signal as the area sound collection result for the target area. SELECTED DRAWING: FIG. 1

Description

The present invention relates to a sound collecting device, a sound collecting program, and a sound collecting method, and can be applied, for example, to area sound collection processing that emphasizes sound in a specific area and suppresses sound in other areas.

A beamformer (hereinafter, "BF") using a microphone array is a conventional technique for separating and collecting only the sound from a specific direction in an environment where multiple sound sources exist. A BF forms directivity by exploiting the time difference between the signals reaching each microphone (see Non-Patent Document 1). BFs fall into two broad types: addition type and subtraction type. The subtraction type BF in particular has the advantage that it can form directivity with fewer microphones than the addition type BF.

FIG. 6 is a block diagram showing the configuration of a subtraction type BF 300 with two microphones.

The subtraction type BF 300 shown in FIG. 6 has a delay device 310 and a subtractor 320.

In the subtraction type BF 300, the delay device 310 first calculates the time difference with which sound arriving from the target direction (hereinafter, "target sound") reaches each microphone, and adds a delay to align the phase of the target sound. The time difference is calculated by equation (1), where d is the distance between the microphones, c is the speed of sound, and τ_L is the delay amount. θ_L is the angle from the direction perpendicular to the straight line connecting the microphones (M1, M2) to the target direction.

$$\tau_L = \frac{d \sin\theta_L}{c} \tag{1}$$
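As a quick numerical check of equation (1), the sketch below computes τ_L for a given microphone spacing and angle. The function name `bf_delay` and the default speed of sound of 340 m/s are assumptions made for illustration; the 3 cm spacing is the value this document gives for the embodiment.

```python
import math


def bf_delay(d_m, theta_l_rad, c_mps=340.0):
    """Delay tau_L in seconds from equation (1): tau_L = d * sin(theta_L) / c."""
    return d_m * math.sin(theta_l_rad) / c_mps


# Hypothetical example: 3 cm spacing, theta_L = pi/2 (along the microphone axis).
tau = bf_delay(0.03, math.pi / 2)
print(f"{tau * 1e6:.1f} microseconds")
```

With these assumed values the delay is roughly 88 microseconds, which is on the order of one sample period at common audio sampling rates.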

When the null (blind spot) lies toward microphone M1 relative to the midpoint between microphones M1 and M2, the delay device 310 delays the input signal x_1(t) of microphone M1. The subtractor 320 of the subtraction type BF 300 then performs subtraction according to equation (2).

$$m(t) = x_2(t) - x_1(t - \tau_L) \tag{2}$$
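The delay-and-subtract operation of equation (2) can be sketched in discrete time as follows. This is a simplified illustration, not the patent's implementation: the delay is rounded to a whole number of samples (a real system would likely need a fractional delay, since τ_L is comparable to one sample period), and the function name `subtractive_bf` is hypothetical.

```python
import numpy as np


def subtractive_bf(x1, x2, delay_samples):
    """Equation (2) with an integer-sample delay D: m[t] = x2[t] - x1[t - D]."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    delayed_x1 = np.zeros_like(x1)
    if delay_samples > 0:
        delayed_x1[delay_samples:] = x1[:-delay_samples]
    else:
        delayed_x1[:] = x1
    return x2 - delayed_x1


# Synthetic check: a source in the null direction reaches M1 first and
# M2 exactly `delay` samples later, so the subtraction cancels it.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
delay = 3
x1 = source
x2 = np.concatenate([np.zeros(delay), source[:-delay]])
m = subtractive_bf(x1, x2, delay)
```

In this synthetic case `m` is zero everywhere, while a source that reaches both microphones simultaneously is not cancelled.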

The subtractor 320 can perform the same subtraction in the frequency domain, in which case equation (2) is rewritten as equation (3).
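The image for equation (3) is not present in this text. As a hedged reconstruction only, applying the Fourier time-shift property to equation (2) gives the form below, where M(ω), X_1(ω) and X_2(ω) denote the spectra of m(t), x_1(t) and x_2(t). The symbol ω and the exact notation are assumptions and may differ from the original equation.

```latex
M(\omega) = X_2(\omega) - e^{-j\omega\tau_L}\, X_1(\omega)
```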

FIG. 7 is a diagram showing the directional characteristics formed by the subtraction type BF 300 using the two microphones M1 and M2.

When θ_L = ±π/2, the directivity formed by the subtractor 320 is a cardioid unidirectional pattern, as shown in FIG. 7(a); when θ_L = 0 or π, it is a figure-eight bidirectional pattern, as shown in FIG. 7(b). Here, a filter that forms unidirectional directivity from the input signals is called a "unidirectional filter", and a filter that forms bidirectional directivity is called a "bidirectional filter".
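The two patterns can be checked numerically with a simple plane-wave model. The sketch below is an illustration under stated assumptions, not taken from the patent: a source at angle θ (measured like θ_L) is assumed to reach M2 later than M1 by d·sin(θ)/c, the spacing is 3 cm, the speed of sound is 340 m/s, and the function name `bf_gain` is hypothetical.

```python
import numpy as np


def bf_gain(theta, theta_l, freq_hz, d=0.03, c=340.0):
    """Magnitude response of the delay-and-subtract BF to a plane wave from angle theta."""
    omega = 2.0 * np.pi * freq_hz
    tau_theta = d * np.sin(theta) / c   # inter-microphone delay of the arriving wave
    tau_l = d * np.sin(theta_l) / c     # steering delay from equation (1)
    return np.abs(np.exp(-1j * omega * tau_theta) - np.exp(-1j * omega * tau_l))


# theta_L = pi/2: a single null at pi/2 and maximum gain on the opposite side.
null_gain = bf_gain(np.pi / 2, np.pi / 2, 1000.0)
front_gain = bf_gain(-np.pi / 2, np.pi / 2, 1000.0)

# theta_L = 0: nulls at 0 and pi, equal gain at +pi/2 and -pi/2.
side_a = bf_gain(np.pi / 2, 0.0, 1000.0)
side_b = bf_gain(-np.pi / 2, 0.0, 1000.0)
```

Under this model the first case has one null and one lobe (cardioid-like), and the second has two opposite nulls with symmetric lobes (figure-eight-like).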

The subtractor 320 can also form sharp directivity toward the nulls of the bidirectional pattern by using spectral subtraction (hereinafter also simply "SS"). Directivity by SS is formed over all frequencies, or over a specified frequency band, according to equation (4). Equation (4) uses the input signal X_1 of microphone M1, but the same effect can be obtained with the input signal X_2 of microphone M2. β is a coefficient that adjusts the strength of the SS.

When a value becomes negative during the subtraction, the subtractor 320 performs flooring: it replaces the value with 0 or with a reduced version of the original value. In this scheme, the subtractor 320 uses the bidirectional filter to extract the sound arriving from directions other than the target direction (hereinafter, "non-target sound"), and subtracts the amplitude spectrum of the extracted non-target sound from the amplitude spectrum of the input signal, thereby emphasizing the target sound.

$$Y(n) = X_1(n) - \beta M(n) \tag{4}$$
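Equation (4) together with the flooring step can be sketched as below. The arrays hold amplitude spectra for one frame. The function name, the `floor_ratio` parameter and the choice of scaling the original value by a constant are assumptions; the text only says that negative results are replaced with 0 or a reduced version of the original value.

```python
import numpy as np


def spectral_subtract(x_amp, m_amp, beta=1.0, floor_ratio=0.0):
    """Y = X - beta * M per bin; negative results are floored.

    floor_ratio = 0.0 replaces negative values with 0; a small positive value
    replaces them with floor_ratio * X (a reduced version of the original value).
    """
    x_amp = np.asarray(x_amp, dtype=float)
    m_amp = np.asarray(m_amp, dtype=float)
    y = x_amp - beta * m_amp
    return np.where(y < 0.0, floor_ratio * x_amp, y)


x = np.array([1.0, 2.0, 3.0])
m = np.array([0.5, 3.0, 1.0])
y_zero_floor = spectral_subtract(x, m)                      # [0.5, 0.0, 2.0]
y_scaled_floor = spectral_subtract(x, m, floor_ratio=0.1)   # [0.5, 0.2, 2.0]
```

A nonzero floor keeps a small residual in over-subtracted bins, which is one common way to make the artifacts of hard zeroing less audible.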

When only the sound within a specific area (hereinafter, "target area sound") is to be collected, using a subtraction type BF alone may also pick up sound from sources around that area (hereinafter, "non-target area sound"). Patent Document 1 therefore proposes a method (hereinafter, "area sound collection") that uses a plurality of microphone arrays, points their directivity at the target area from different directions, and collects the target area sound by making the directivities intersect in the target area.

In conventional area sound collection, the ratio of the amplitude spectra of the target area sound contained in the BF outputs of the microphone arrays is first estimated and used as a correction coefficient. For example, when two microphone arrays are used, the correction coefficients for the target area sound amplitude spectrum are calculated by equations (5) and (6), or by equations (7) and (8).

Here, Y_1k(n) and Y_2k(n) are the amplitude spectra of the BF outputs of the first and second microphone arrays, respectively. N is the total number of frequency bins, and k is the frequency. α_1(n) and α_2(n) are the amplitude spectrum correction coefficients for the BF outputs of the first and second microphone arrays, respectively. "mode" denotes the most frequent value and "median" denotes the median.
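Equations (5) to (8) are not present in this text, so the following is only one reading consistent with the symbol definitions above and with the way α_1 and α_2 are used in the subsequent subtraction: each coefficient is taken as a representative value (here the median) over the frequency bins k of the per-bin ratio between the two BF amplitude spectra. The function name, the direction of each ratio, and the `eps` guard against division by zero are assumptions. A mode-based variant would additionally need the ratios to be quantized into bins, which is omitted here.

```python
import numpy as np


def correction_coefficients(y1, y2, eps=1e-12):
    """Median-based amplitude correction coefficients (assumed form).

    alpha1 scales array 1's BF output to the level of array 2's;
    alpha2 scales array 2's BF output to the level of array 1's.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    alpha1 = float(np.median(y2 / (y1 + eps)))
    alpha2 = float(np.median(y1 / (y2 + eps)))
    return alpha1, alpha2


# Hypothetical frame: most bins carry target area sound at a 2:1 level ratio,
# and one bin of array 2 is dominated by a non-target source.
y1 = np.array([2.0, 4.0, 6.0, 8.0])
y2 = np.array([1.0, 2.0, 3.0, 40.0])
alpha1, alpha2 = correction_coefficients(y1, y2)
```

Using a median (or mode) rather than a mean keeps the estimate close to the ratio of the shared target area component even when a few bins are dominated by other sources.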

In conventional area sound collection processing, each BF output is then corrected with the correction coefficient and SS is applied to extract the non-target area sound present in the target area direction. The target area sound can then be extracted by applying SS to remove the extracted non-target area sound from each BF output.

In this case, to extract the non-target area sound N_1(n) present in the target area direction as seen from the first microphone array, the BF output Y_2(n) of the second microphone array multiplied by the amplitude spectrum correction coefficient α_2 is spectrally subtracted from the BF output Y_1(n) of the first microphone array, as shown in equation (9). Likewise, the non-target area sound N_2(n) present in the target area direction as seen from the second microphone array is extracted according to equation (10).

$$N_1(n) = Y_1(n) - \alpha_2(n) Y_2(n) \tag{9}$$
$$N_2(n) = Y_2(n) - \alpha_1(n) Y_1(n) \tag{10}$$

The conventional processing then extracts the target area sound by spectrally subtracting the non-target area sound from each BF output according to equations (11) and (12). Equation (11) extracts the target area sound with the first microphone array as the reference, and equation (12) extracts it with the second microphone array as the reference.

$$Z_1(n) = Y_1(n) - \gamma_1(n) N_1(n) \tag{11}$$
$$Z_2(n) = Y_2(n) - \gamma_2(n) N_2(n) \tag{12}$$

Here, γ_1(n) and γ_2(n) are coefficients for changing the strength of the SS.
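Equations (9) to (12) can be chained as in the sketch below, which works on amplitude spectra of a single frame. Flooring negative results to zero at every step is an assumption, as are the function names and the default γ values of 1.0.

```python
import numpy as np


def floor_negative(values):
    """Replace negative amplitudes with 0 (assumed flooring rule)."""
    return np.maximum(values, 0.0)


def extract_target_area(y1, y2, alpha1, alpha2, gamma1=1.0, gamma2=1.0):
    """Two-stage spectral subtraction following equations (9) to (12)."""
    n1 = floor_negative(y1 - alpha2 * y2)   # (9): non-target sound seen from array 1
    n2 = floor_negative(y2 - alpha1 * y1)   # (10): non-target sound seen from array 2
    z1 = floor_negative(y1 - gamma1 * n1)   # (11): target area sound, array 1 as reference
    z2 = floor_negative(y2 - gamma2 * n2)   # (12): target area sound, array 2 as reference
    return z1, z2


# Idealized example: both arrays see the same target spectrum, and each
# also sees one non-target source in a different frequency bin.
target = np.array([1.0, 0.0, 2.0, 0.0])
y1 = target + np.array([0.0, 3.0, 0.0, 0.0])
y2 = target + np.array([0.0, 0.0, 0.0, 4.0])
z1, z2 = extract_target_area(y1, y2, alpha1=1.0, alpha2=1.0)
```

In this idealized case both `z1` and `z2` recover the target spectrum exactly. With overlapping spectra and estimation errors the subtraction leaves residue, which relates to the distortion and musical noise discussed in the text.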

When the volume level of the background noise or the non-target area sound is high, the SS performed during target area sound extraction may distort the target area sound or generate unpleasant artifacts such as musical noise.

In the area sound collection method described in Patent Document 2, the volume levels of the microphone input signal and of the estimated noise are each adjusted according to the levels of the background noise and the non-target area sound, and the adjusted signals are mixed into the extracted target area sound. Because the musical noise generated by target area sound extraction grows stronger as the volume levels of the background noise and the non-target area sound increase, the total volume level of the mixed input signal and estimated noise is also raised in proportion to those levels. In that method, the volume level of the background noise is calculated from the estimated noise obtained in the course of suppressing the background noise. The volume level of the non-target area sound is calculated from the combination of the non-target area sound present in the target area direction, which is extracted in the course of emphasizing the target area sound, and the non-target area sound present in directions other than the target area direction.

Suppose now that, in conventional area sound collection processing, the ratio between the input signal and the estimated noise to be mixed is determined from the volume levels of the estimated noise and the non-target area sound. If a non-target area sound source is near the target area and the volume level of the mixed input signal is too high, non-target area sound leaks into the target area sound, and it becomes impossible to tell which is the target area sound.

For this reason, the method of Patent Document 2 lowers the volume level of the mixed input signal and raises the volume level of the estimated noise when the non-target area sound is loud. In other words, when there is no non-target area sound or its volume level is low, the proportion of the input signal is increased; conversely, when the volume level of the non-target area sound is high, the proportion of the estimated noise is increased.

With the method of Patent Document 2, mixing the input signal and the estimated noise into the target area sound masks the musical noise so that it is heard like ordinary background noise, without sounding unnatural. In addition, the target area sound component contained in the microphone input signal compensates for the distortion of the target area sound and improves the sound quality.

Patent Document 1: JP 2014-072708 A
Patent Document 2: JP 2017-183902 A

Non-Patent Document 1: Futoshi Asano, "Acoustic Technology Series 16: Array Signal Processing of Sound - Sound Source Localization/Tracking and Separation", edited by the Acoustical Society of Japan, Corona Publishing, February 25, 2011

However, in the method of Patent Document 2, when the surrounding non-target area sound level is high, the signal added to the target area sound contains less input signal and more estimated noise. Musical noise can still be masked, but the sound quality improvement is weakened.

A sound collecting device, a sound collecting program, and a sound collecting method that suppress sound quality degradation during area sound collection processing are therefore desired.

A first aspect of the present invention is a sound collecting device characterized by having: (1) directivity forming means that, for each of the captured signals output by a plurality of microphone arrays or signals based on those captured signals, forms directivity toward the target area with a beamformer and acquires a target direction signal for each microphone array; (2) target area sound extraction means that extracts the non-target area sound present in the target area direction by performing spectral subtraction on the respective target direction signals, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; (3) mixing filter forming means that detects a peak frequency in a non-target area sound component dominant signal, which is a signal based on the captured signals in which sound components originating from non-target areas are dominant, and forms a suppression filter that suppresses the component at a suppression frequency based on that peak frequency; (4) signal mixing means that applies the suppression filter formed by the mixing filter forming means to a mixing signal, which consists of the captured signal output by one of the microphone arrays or a signal based on that captured signal, to obtain a filtered mixing signal, and mixes the filtered mixing signal with the target area sound extracted by the target area sound extraction means to obtain a mixed signal; and (5) output means that outputs the mixed signal obtained by the signal mixing means as the area sound collection result for the target area.

A sound collecting program according to a second aspect of the present invention causes a computer to function as: (1) directivity forming means that, for each of the captured signals output by a plurality of microphone arrays or signals based on those captured signals, forms directivity toward the target area with a beamformer and acquires a target direction signal for each microphone array; (2) target area sound extraction means that extracts the non-target area sound present in the target area direction by performing spectral subtraction on the respective target direction signals, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; (3) mixing filter forming means that detects a peak frequency in a non-target area sound component dominant signal, which is a signal based on the captured signals in which sound components originating from non-target areas are dominant, and forms a suppression filter that suppresses the component at a suppression frequency based on that peak frequency; (4) signal mixing means that applies the suppression filter formed by the mixing filter forming means to a mixing signal, which consists of the captured signal output by one of the microphone arrays or a signal based on that captured signal, to obtain a filtered mixing signal, and mixes the filtered mixing signal with the target area sound extracted by the target area sound extraction means to obtain a mixed signal; and (5) output means that outputs the mixed signal obtained by the signal mixing means as the area sound collection result for the target area.

A third aspect of the present invention is a sound collecting method performed by a sound collecting device, characterized in that: (1) the sound collecting device has directivity forming means, target area sound extraction means, mixing filter forming means, signal mixing means, and output means; (2) the directivity forming means, for each of the captured signals output by a plurality of microphone arrays or signals based on those captured signals, forms directivity toward the target area with a beamformer and acquires a target direction signal for each microphone array; (3) the target area sound extraction means extracts the non-target area sound present in the target area direction by performing spectral subtraction on the respective target direction signals, and extracts the target area sound by spectrally subtracting the extracted non-target area sound from one of the target direction signals; (4) the mixing filter forming means detects a peak frequency in a non-target area sound component dominant signal, which is a signal based on the captured signals in which sound components originating from non-target areas are dominant, and forms a suppression filter that suppresses the component at a suppression frequency based on that peak frequency; (5) the signal mixing means applies the suppression filter formed by the mixing filter forming means to a mixing signal, which consists of the captured signal output by one of the microphone arrays or a signal based on that captured signal, to obtain a filtered mixing signal, and mixes the filtered mixing signal with the target area sound extracted by the target area sound extraction means to obtain a mixed signal; and (6) the output means outputs the mixed signal obtained by the signal mixing means as the area sound collection result for the target area.
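The filter forming and mixing steps described in these aspects can be illustrated with the rough sketch below. It is not the patent's implementation: the peak picking rule, the rectangular notch shape, and the names `num_peaks`, `half_width`, `depth` and `mix_gain` are all assumptions chosen to keep the example short. All arrays are per-frame amplitude spectra.

```python
import numpy as np


def find_peak_bins(amp, num_peaks=1):
    """Return the bins of the largest local maxima of an amplitude spectrum."""
    amp = np.asarray(amp, dtype=float)
    interior = np.arange(1, len(amp) - 1)
    is_peak = (amp[1:-1] > amp[:-2]) & (amp[1:-1] >= amp[2:])
    candidates = interior[is_peak]
    order = np.argsort(amp[candidates])[::-1]   # strongest peaks first
    return np.sort(candidates[order[:num_peaks]])


def make_suppression_filter(num_bins, peak_bins, half_width=1, depth=0.0):
    """Per-bin gains: 1.0 everywhere except `depth` around each peak bin."""
    gains = np.ones(num_bins)
    for k in peak_bins:
        lo = max(int(k) - half_width, 0)
        hi = min(int(k) + half_width + 1, num_bins)
        gains[lo:hi] = depth
    return gains


def mix(target_amp, mixing_amp, gains, mix_gain=0.5):
    """Add the filtered mixing signal to the extracted target area sound."""
    return np.asarray(target_amp) + mix_gain * gains * np.asarray(mixing_amp)


# Hypothetical frame: the non-target dominant signal peaks most strongly at bin 2.
non_target = np.array([0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 2.0, 0.0])
peaks = find_peak_bins(non_target, num_peaks=1)
gains = make_suppression_filter(len(non_target), peaks)
mixed = mix(np.zeros(8), np.ones(8), gains)
```

The point of the sketch is that the mixing signal is attenuated only around the frequencies where non-target area sound is strongest, so the rest of the mixing signal can still be added to the target area sound.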

According to the present invention, sound quality degradation during area sound collection processing can be suppressed.

FIG. 1 is a block diagram showing the functional configuration of the sound collecting device according to the first embodiment.
FIG. 2 is a block diagram showing an example of the hardware configuration of the sound collecting device according to the first and second embodiments.
FIG. 3 is a graph showing an example of the mixing signal and the notch filter processed by the sound collecting device according to the first embodiment.
FIG. 4 is a graph showing an example of the signal mixing performed by the sound collecting device according to the first embodiment.
FIG. 5 is a block diagram showing the functional configuration of the sound collecting device according to the second embodiment.
FIG. 6 is a block diagram showing the configuration of a conventional subtraction type BF.
FIG. 7 is an explanatory diagram showing examples of the directivity filters formed by a conventional subtraction type BF.

(A) First Embodiment
A first embodiment of the sound collecting device, sound collecting program, and sound collecting method according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the functional configuration of the sound collecting device 100 according to the first embodiment.

The sound collecting device 100 uses two microphone arrays MA (MA1, MA2) to perform target area sound collection processing, which collects the target area sound from a sound source in the target area.

The microphone arrays MA1 and MA2 are placed at arbitrary locations in the space containing the target area. They may be positioned anywhere relative to the target area as long as their directivities overlap only in the target area; for example, they may face each other across the target area. Each microphone array MA consists of two or more microphones M, and each microphone M picks up an acoustic signal. In this embodiment, each microphone array MA is described as having two microphones, M1 and M2, that pick up acoustic signals; that is, each microphone array MA is a 2-channel microphone array. The distance between the two microphones M1 and M2 is not limited, but in this embodiment it is 3 cm. The number of microphone arrays MA is not limited to two; when there are multiple target areas, enough microphone arrays MA must be placed to cover all of them.

Next, the internal configuration of the sound collecting device 100 is described with reference to FIG. 1.

As shown in FIG. 1, the sound collecting device 100 has a signal input unit 1, a noise suppression unit 2, a directivity forming unit 3, a delay correction unit 4, spatial coordinate data 5, a correction coefficient calculation unit 6, a target area sound extraction unit 7, a mixing filter forming unit 8, a signal mixing unit 9, and a signal output unit 10.

The sound collecting device 100 may be implemented entirely in hardware (for example, a dedicated chip), or partly or entirely in software (a program). For example, it may be implemented by installing a program (including the signal processing program of the embodiment) on a computer having a processor and memory.

Next, the hardware configuration of the sound collecting device 100 is described with reference to FIG. 2.

The sound collecting device 100 may be implemented entirely in hardware (for example, a dedicated chip), or partly or entirely in software (a program). For example, it may be implemented by installing a program (including the sound collecting program of the embodiment) on a computer having a processor and memory.

FIG. 2 is a block diagram showing an example of the hardware configuration of the sound collecting device 100.

FIG. 2 shows an example of the hardware configuration when the sound collecting device 100 is implemented in software (on a computer).

The sound collecting device 100 shown in FIG. 2 has, as its hardware component, a computer 200 on which a program (including the sound collecting program of the embodiment) is installed. If the computer 200 has no conversion means for converting analog signals (the signals supplied from the super-directional microphones M1 and M2) into digital signals, separate conversion means (not shown) may be added to the sound collecting device 100. The computer 200 may be dedicated to the sound collecting program or may be shared with programs for other functions.

The computer 200 shown in FIG. 2 has a processor 201, a primary storage unit 202, and a secondary storage unit 203. The primary storage unit 202 is storage that serves as the working memory of the processor 201; for example, fast memory such as DRAM (Dynamic Random Access Memory) can be used. The secondary storage unit 203 is storage that holds various data such as the OS (Operating System) and program data (including the data of the sound collecting program according to the embodiment); for example, non-volatile memory such as flash memory or an HDD can be used. In the computer 200 of this embodiment, when the processor 201 starts up, it reads the OS and programs (including the sound collecting program according to the embodiment) recorded in the secondary storage unit 203, loads them into the primary storage unit 202, and executes them.

The specific configuration of the computer 200 is not limited to that of FIG. 2, and various configurations can be used. For example, if the primary storage unit 202 is non-volatile memory (for example, flash memory), the secondary storage unit 203 may be omitted.

(A-2) Operation of the first embodiment
The signal input unit 1 receives the acoustic signals picked up by each microphone array MA (MA1, MA2) and converts them from analog to digital. It then transforms the digital acoustic signals from the time domain to the frequency domain by a predetermined method (for example, the fast Fourier transform). Below, the frequency-domain input signals of the microphones M1 and M2 in each microphone array MA are denoted X_1 and X_2, respectively.

The noise suppression unit 2 estimates and suppresses the background noise component contained in the input signals acquired by the signal input unit 1. The noise suppression method is not limited; for example, spectral subtraction (SS) or Wiener filtering can be used.

For each microphone array MA, the directivity forming unit 3 forms directivity toward the target area with a beamformer (BF) according to Equation (4), operating on the signals whose background noise was suppressed by the noise suppression unit 2. Below, the amplitude spectra of the BF outputs of the microphone arrays MA1 and MA2 are denoted Y_1k(n) and Y_2k(n), respectively.

The delay correction unit 4 calculates and corrects the delay caused by differences in distance between the target area and each microphone array MA. It first obtains the positions of the target area and of the microphone arrays from the spatial coordinate data 5 and calculates the differences in the arrival time of the target area sound at each microphone array. Then, taking the microphone array farthest from the target area as the reference, it adds delays so that the target area sound reaches all microphone arrays simultaneously.
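The delay calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `correction_delays`, the 2-D coordinates, the 16 kHz sample rate, and the speed of sound of 343 m/s are all hypothetical and do not come from the specification.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the specification


def correction_delays(target_pos, array_positions, sample_rate):
    """Return the delay (in samples) to add to each array's signal.

    The array farthest from the target area is the reference (zero delay);
    nearer arrays are delayed so the target area sound lines up in time.
    """
    distances = [math.dist(target_pos, p) for p in array_positions]
    farthest = max(distances)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate)
            for d in distances]


# Hypothetical layout: target area at the origin, two arrays at 3.43 m and 6.86 m.
delays = correction_delays((0.0, 0.0), [(3.43, 0.0), (0.0, 6.86)], 16000)
```

With this layout the nearer array is delayed by the 10 ms difference in travel time, and the farther array is left untouched.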

The spatial coordinate data 5 holds the position information of all target areas, of each microphone array MA, and of the microphones M that make up each microphone array MA.

The correction coefficient calculation unit 6 calculates correction coefficients that equalize the amplitude spectra of the target area sound components contained in the BF outputs. Below, the amplitude spectrum correction coefficients for the BF outputs of the microphone arrays MA1 and MA2 are denoted α_1(n) and α_2(n). The correction coefficient calculation unit 6 calculates α_1(n) and α_2(n) according to, for example, Equations (5) and (6) or Equations (7) and (8).

The target area sound extraction unit 7 extracts the non-target area sound present in the target area direction from the BF output data corrected with the coefficients calculated by the correction coefficient calculation unit 6. Specifically, it applies SS to the corrected BF output data according to, for example, Equation (9) or (10), and extracts the non-target area sound (N_1(n) or N_2(n)) present in the target area direction.

The target area sound extraction unit 7 then extracts the target area sound (Z_1(n) or Z_2(n)) by spectrally subtracting the extracted non-target area sound (N_1(n) or N_2(n)) from the BF outputs according to Equation (11) or (12).

The mixing filter forming unit 8 detects the frequency at which the power peaks (hereinafter, the "peak frequency") in the non-target area sound (N_1(n) or N_2(n)) extracted by the target area sound extraction unit 7 with Equation (9) or (10), and forms a suppression filter that suppresses only the component (power) at that peak frequency (hereinafter called a "notch filter", "band-stop filter", or "band elimination filter").

When detecting the peak frequency, the mixing filter forming unit 8 may use not only the extracted non-target area sound (N_1(n) or N_2(n)) but also any other signal, derived from the input signals (those of any microphone array MA), in which the component of sound originating in a non-target area is dominant (hereinafter, a "non-target area sound component dominant signal"). For example, it may use the sound (non-target sound) extracted by the directivity forming unit 3 with Equation (2) or (3). The peak frequency may be detected by, for example, smoothing the amplitude spectrum and finding its maximum. Alternatively, the mixing filter forming unit 8 may convert the extracted non-target area sound back to the time domain, perform linear predictive coding (LPC) analysis, and take the peak of the LPC spectral envelope as the peak frequency.
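The first detection option (smooth the amplitude spectrum, then take the maximum) might look like the sketch below. The moving-average smoother, its window width, the toy spectrum, and the sample rate and FFT size are assumptions chosen for illustration; the specification does not fix any of them.

```python
def smooth(spectrum, half_width=1):
    """Moving-average smoothing of an amplitude spectrum.

    Bins near the edges use a shorter window.
    """
    out = []
    for k in range(len(spectrum)):
        lo = max(0, k - half_width)
        hi = min(len(spectrum), k + half_width + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out


def peak_frequency(spectrum, sample_rate, fft_size, half_width=1):
    """Return the frequency (Hz) of the bin with the largest smoothed amplitude."""
    smoothed = smooth(spectrum, half_width)
    k_peak = max(range(len(smoothed)), key=smoothed.__getitem__)
    return k_peak * sample_rate / fft_size


# Toy non-target area amplitude spectrum with a broad peak around bin 5
# and an isolated spike at bin 8 that smoothing de-emphasizes.
n_spec = [0.1, 0.1, 0.2, 0.3, 2.0, 3.0, 2.0, 0.3, 0.9, 0.1]
f_peak = peak_frequency(n_spec, sample_rate=16000, fft_size=512)
```

Smoothing before the maximum search keeps a single noisy bin from being mistaken for the dominant peak.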

After detecting the peak frequency, the mixing filter forming unit 8 may form the notch filter with an existing notch filter design method or with its own method. When using its own method, the frequency band to be suppressed (hereinafter, the "suppression frequency") may be the peak frequency alone or a band that contains it (for example, a band of predetermined width centered on the peak frequency).

The mixing filter forming unit 8 may also set more than one suppression frequency. For example, it may set as suppression frequencies not only the peak frequency (first peak) of the non-target area sound but also the second and subsequent peaks (second, third, ..., Nth).

The method by which the mixing filter forming unit 8 determines the suppression amount (attenuation) applied at each suppression frequency is not limited. For example, the suppression amount for each suppression frequency may be a fixed value, or it may be set dynamically according to the amplitude spectrum at the peak frequency of the non-target area sound or the average amplitude spectrum over the whole band. For example, the larger the amplitude spectrum at the peak frequency or the whole-band average amplitude spectrum, the larger the suppression amount applied to the suppression frequency component may be. The mixing filter forming unit 8 may also change the suppression amount based on the sound output from the signal output unit 10.

When the suppression frequency is given a bandwidth, the mixing filter forming unit 8 may apply the same suppression amount uniformly across that band, or may weaken the suppression (reduce the suppression amount) as the frequency moves away from the peak frequency.
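As a rough illustration of the two options above (uniform or tapered suppression within a band), the sketch below builds a frequency-domain notch filter as a list of per-bin gains. The names `notch_gains`, `half_band_hz`, and `min_gain`, the linear taper, and the numeric values are assumptions for this example only. Gains are expressed from 0 to 1, with a smaller gain meaning stronger suppression.

```python
def notch_gains(num_bins, bin_hz, peak_hz_list, half_band_hz=200.0,
                min_gain=0.2, taper=True):
    """Per-bin gains in [0, 1]; a smaller gain means stronger suppression."""
    gains = [1.0] * num_bins
    for k in range(num_bins):
        f = k * bin_hz
        for f_peak in peak_hz_list:
            offset = abs(f - f_peak)
            if offset > half_band_hz:
                continue  # outside this suppression band
            if taper:
                # Linear ramp (assumed shape): min_gain at the peak,
                # rising to 1.0 at the band edge.
                g = min_gain + (1.0 - min_gain) * offset / half_band_hz
            else:
                g = min_gain  # uniform suppression across the band
            gains[k] = min(gains[k], g)
    return gains


# Two suppression bands, centered on hypothetical peaks at 500 Hz and 1500 Hz.
gains = notch_gains(num_bins=21, bin_hz=100.0, peak_hz_list=[500.0, 1500.0])
flat = notch_gains(num_bins=21, bin_hz=100.0, peak_hz_list=[500.0], taper=False)
```

Passing several peak frequencies yields one filter with several suppression bands, which matches the case where the second and later peaks are also suppressed.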

When the same non-target area sound is present steadily (for example, a constantly loud siren or engine sound), the mixing filter forming unit 8 need not form the filter dynamically: it may detect the peak frequency of that non-target area sound in advance and form the notch filter beforehand, or it may prepare several notch filters in advance and switch among them. The notch filter can be applied in either the frequency domain or the time domain.

The signal mixing unit 9 forms a signal by applying the notch filter formed by the mixing filter forming unit 8 to the mixing signal (hereinafter, the "filtered mixing signal X_MIX(n)"). It then forms a signal that mixes the filtered mixing signal X_MIX(n) with the target area sound (Z_1(n) or Z_2(n)) extracted by the target area sound extraction unit 7 (hereinafter, the "mixed signal W(n)").

The mixing signal may be an input signal acquired by the signal input unit 1 (the input signal of any microphone in any microphone array MA) or a signal whose background noise has been suppressed by the noise suppression unit 2 (the noise-suppressed input signal of any microphone in any microphone array MA).

As an example, assume that the target area sound extraction unit 7 performs area sound collection with the microphone array MA1 as the reference according to Equation (11) and obtains the target area sound Z_1(n). In this case, the signal mixing unit 9 may mix the target area sound Z_1(n) and the filtered mixing signal X_MIX(n) according to, for example, Equation (13) below. Here, μ is a parameter (coefficient) that adjusts the level of the filtered mixing signal X_MIX(n) to be mixed, and W_1(n) is the mixed signal calculated from the target area sound Z_1(n) extracted with the microphone array MA1 as the reference. When the target area sound extraction unit 7 instead performs area sound collection with the microphone array MA2 as the reference according to Equation (12), the mixed signal W_2(n), obtained by mixing the filtered mixing signal X_MIX(n) into the target area sound Z_2(n), is given by Equation (14).

In the target area sound extraction unit 7, μ may be a predetermined constant or may be changed adaptively (dynamically) according to the level of the non-target area sound. The value of μ may also be changed according to the signal output from the signal output unit 10 (for example, the power of W_1(n) or W_2(n)).

Further, as shown in Equation (15) or (16) below, the target area sound extraction unit 7 may add ρ as a parameter (coefficient) that adjusts the level of the target area sound (Z_1(n) or Z_2(n)) when calculating the mixed signal. In this case, setting ρ to 0 makes it possible to output only the mixing signal.
W_1(n) = Z_1(n) + μ X_MIX(n) ... (13)
W_2(n) = Z_2(n) + μ X_MIX(n) ... (14)
W_1(n) = ρ Z_1(n) + μ X_MIX(n) ... (15)
W_2(n) = ρ Z_2(n) + μ X_MIX(n) ... (16)
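Equations (13) to (16) amount to a weighted sum per frequency bin. The sketch below applies an assumed per-bin notch gain to a mixing signal and then mixes the result with a target area sound; the function names and the toy amplitude values are hypothetical, and real amplitude spectra are represented here simply as lists of floats.

```python
def apply_notch(x, gains):
    """Filtered mixing signal X_MIX(n): per-bin gain times the mixing signal."""
    return [g * xk for g, xk in zip(gains, x)]


def mix(z, x_mix, mu, rho=1.0):
    """W(n) = rho * Z(n) + mu * X_MIX(n); rho = 1 gives Equations (13)/(14)."""
    return [rho * zk + mu * xk for zk, xk in zip(z, x_mix)]


z = [1.0, 0.0, 2.0]          # toy target area sound Z_1(n)
x = [2.0, 4.0, 2.0]          # toy mixing signal (e.g. an input signal)
gains = [1.0, 0.25, 1.0]     # assumed notch: bin 1 is suppressed

x_mix = apply_notch(x, gains)
w_13 = mix(z, x_mix, mu=0.5)              # Equation (13)
w_rho0 = mix(z, x_mix, mu=0.5, rho=0.0)   # Equation (15) with rho = 0
```

With ρ = 0 the output is only the scaled, filtered mixing signal, which corresponds to the case noted in the text where just the mixing signal is output.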

The signal output unit 10 outputs, as the area sound collection result, the mixed signal (W_1(n) or W_2(n)) calculated by the signal mixing unit 9 or a signal based on it. The output format and output means (output medium) are not limited. For example, the signal output unit 10 may output the mixed signal in the frequency domain or in the time domain.

Next, a specific example of the area sound collection processing in the sound collecting device 100 is described with reference to FIGS. 3 and 4.

Here, the frequency of the first peak in the non-target area sound is called "fPNT1", that of the second peak "fPNT2", ..., and that of the Nth peak "fPNTN".

FIG. 3(a) shows the spectrum of a siren sound extracted in the signal input unit 1, the noise suppression unit 2, the directivity forming unit 3, or the target area sound extraction unit 7 when the sound (noise) originating in a non-target area and contained in the input signal X_1(n) used as the mixing signal is a siren. FIG. 3(b) shows an example of a notch filter designed to suppress the first peak (maximum peak) and the second peak of that sound (noise), plotted as the suppression amount at each frequency. FIG. 4 shows the spectrum of the input signal X_1(n) used as the mixing signal when it contains both the target area sound and the sound (noise) originating in a non-target area (the siren sound of FIG. 3(a)).

In the example of FIGS. 3 and 4, the mixing filter forming unit 8 sets as suppression frequencies the band extending 200 Hz on either side of the frequency fPNT1 of the first peak (maximum peak) of the sound (noise) originating in a non-target area, and the band extending 200 Hz on either side of the frequency fPNT2 of its second peak. Below, the suppression frequency corresponding to fPNT1 is called the first suppression frequency, and that corresponding to fPNT2 the second suppression frequency.

In the example of FIGS. 3 and 4, it is assumed that, at the first and second suppression frequencies of the mixing signal (input signal X_1(n)), the component of the siren sound, which is sound (noise) originating in a non-target area, is dominant over the target area sound (for example, the voice of a person speaking on a telephone device).

In the notch filter of FIG. 3(b), the suppression amount is set to a value of 1 or less in the first suppression frequency band (the band extending 200 Hz on either side of fPNT1) and the second suppression frequency band (the band extending 200 Hz on either side of fPNT2). In the notch filter, the suppression amount at each frequency can be expressed as a value between 0 and 1, and a smaller value indicates stronger suppression. In both bands of FIG. 3(b), suppression is strongest at the peak frequencies (fPNT1, fPNT2) and weakens with distance from them.

As a result, the power of the sound (noise) originating in a non-target area shown in FIG. 3(a) (the siren sound) is suppressed in the first and second suppression frequency bands of the notch filter shown in FIG. 3(b).

FIG. 4 plots, as a solid line, the spectrum of the mixing signal when the input signal X_1(n), containing the target area sound and the sound (noise) originating in a non-target area (the siren sound of FIG. 3(a)), is used as the mixing signal, and, as a dotted line, the spectrum of the filtered mixing signal X_MIX(n) obtained after the mixing signal (input signal X_1(n)) is filtered by the notch filter of FIG. 3(b). In the example of FIG. 4, applying the filtered mixing signal X_MIX(n) improves the SN ratio of the mixed signal W_1(n) in the bands where the siren component is dominant (the first and second suppression frequencies).

(A-3) Effects of the first embodiment
The first embodiment provides the following effects.

The sound collecting device 100 of the first embodiment detects the peak frequency of the non-target area sound extracted during target area sound extraction and forms a notch filter that suppresses the suppression frequency component based on that peak frequency. By applying the notch filter to the mixing signal, the sound collecting device 100 obtains the filtered mixing signal X_MIX(n), in which the main component of the non-target area sound contained in the mixing signal is suppressed. It then generates the mixed signal (W_1(n) or W_2(n)) by mixing the filtered mixing signal X_MIX(n) into the target area sound, and outputs a signal Z(n) based on the mixed signal as the area sound collection result. The sound collecting device 100 can thereby improve the sound quality of the area sound collection result while limiting the admixture of non-target area sound.

(B) Second embodiment
A second embodiment of the sound collecting device, sound collecting program, and sound collecting method according to the present invention is described in detail below with reference to the drawings.

(B-1) Configuration of the second embodiment
FIG. 5 is a block diagram showing the functional configuration of the sound collecting device 100A of the second embodiment; parts identical or corresponding to those in FIG. 1 bear the same or corresponding reference signs.

The following describes how the second embodiment differs from the first.

The hardware configuration of the sound collecting device 100A of the second embodiment can likewise be represented by FIG. 2.

The sound collecting device 100A of the second embodiment differs from the first embodiment in that a mixing filter adjusting unit 11 is added.

(B-2) Operation of the second embodiment
Next, the operation of the sound collecting device 100A of the second embodiment configured as above (the sound collecting method according to the embodiment) is described, focusing on the differences from the first embodiment.

As noted above, the second embodiment differs from the first only in the mixing filter adjusting unit 11, so the description below centers on its operation.

The mixing filter adjusting unit 11 first obtains the target area sound from the target area sound extraction unit 7 and detects its peak frequency.

Next, the mixing filter adjusting unit 11 compares the peak frequency of the target area sound with the suppression frequency (the non-target area sound peak) of the notch filter formed by the mixing filter forming unit 8, and adjusts the suppression amount of the notch filter according to the distance between those frequency bands (for example, the difference between the peak frequency of the target area sound and that of the non-target area sound; hereinafter, the "frequency difference").

For example, when the frequency difference is within 100 Hz, the mixing filter adjusting unit 11 may reduce the suppression amount of the notch filter formed by the mixing filter forming unit 8 to one half.

The mixing filter adjusting unit 11 may change the suppression amount uniformly whenever the difference is at or below a certain value, or may set the suppression amount to increase gradually as the difference grows.
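The adjustment rule can be sketched as below. The mapping between a filter gain and the "suppression amount" (taken here as 1 minus the gain), the function name `adjust_gain`, and the linear shape of the gradual variant are assumptions made for illustration; only the 100 Hz threshold and the halving come from the example in the text.

```python
def adjust_gain(gain, f_target_peak, f_nontarget_peak,
                threshold_hz=100.0, gradual=False):
    """Weaken a notch gain when the target peak lies near the suppressed peak.

    `gain` is in [0, 1]; the suppression amount is assumed to be 1 - gain.
    """
    diff = abs(f_target_peak - f_nontarget_peak)
    if diff > threshold_hz:
        return gain  # peaks far apart: keep the full suppression
    suppression = 1.0 - gain
    if gradual:
        # Assumed ramp: half suppression at diff = 0, full at the threshold.
        scale = 0.5 + 0.5 * diff / threshold_hz
    else:
        scale = 0.5  # uniform halving inside the threshold
    return 1.0 - suppression * scale


near = adjust_gain(0.2, 1000.0, 1050.0)                  # within 100 Hz
far = adjust_gain(0.2, 1000.0, 1300.0)                   # beyond 100 Hz
ramped = adjust_gain(0.2, 1000.0, 1050.0, gradual=True)
```

Weakening the notch when the two peaks nearly coincide avoids removing target area sound that shares the band with the non-target area sound.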

When the mixing filter forming unit 8 sets as suppression frequencies not only the peak frequency (first peak) of the non-target area sound but also the second and subsequent peaks (second, third, ..., Nth), the mixing filter adjusting unit 11 may also detect the first to Nth peaks of the target area sound, compare the peaks of the non-target area sound (first to Nth) with those of the target area sound (first to Nth), and adjust the suppression amount of the suppression frequency corresponding to each peak of the non-target area sound.

Here, the frequency of the first peak in the target area sound is called "PT1", that of the second peak "PT2", ..., and that of the Nth peak "PTN". Likewise, the frequency of the first peak in the non-target area sound is called "PNT1", that of the second peak "PNT2", ..., and that of the Nth peak "PNTN". In this case, the mixing filter adjusting unit 11 adjusts the suppression amount of the suppression frequency based on PT1 according to the frequency difference between PT1 and PNT1, adjusts the suppression amount of the suppression frequency based on PT2 according to the frequency difference between PT2 and PNT2, ..., and adjusts the suppression amount of the suppression frequency based on PTN according to the frequency difference between PTN and PNTN. Each suppression amount may be adjusted in the same way as in the example above.

(B-3) Effects of the second embodiment
The second embodiment provides the following effects.

In the sound collecting device 100A of the second embodiment, the mixing filter adjusting unit 11 compares the peak frequency of the target area sound with the suppression frequency of the notch filter (the peak frequency of the non-target area sound) and adjusts the suppression amount, so sound quality can be improved stably regardless of the type of non-target area sound.

(C) Other embodiments
The present invention is not limited to the embodiments above; modified embodiments such as the following are also possible.

(C-1) In each of the above embodiments, the noise suppression unit 2 is not essential and may be omitted. For example, in a quiet environment with almost no background noise, the processing of the noise suppression unit 2 may be omitted.

(C-2) In each of the above embodiments, the delay correction unit 4 is not essential and may be omitted. For example, if the arrangement of the microphone arrays MA and the target area sound is such that no delay arises in the first place, or the delay is negligible, the processing of the delay correction unit 4 may be omitted.

(C-3) In each of the above embodiments, the correction coefficient calculation unit 6 is not essential and may be omitted. For example, if it is evident from the arrangement of the microphone arrays MA and the target area sound that the amplitude spectra of the target area sound captured by the microphones M (the microphones M making up each microphone array MA) differ only slightly, the processing of the correction coefficient calculation unit 6 may be omitted.

100, 100A ... sound collecting device, 1 ... signal input unit, 2 ... noise suppression unit, 3 ... directivity forming unit, 4 ... delay correction unit, 5 ... spatial coordinate data, 6 ... correction coefficient calculation unit, 7 ... target area sound extraction unit, 8 ... mixing filter forming unit, 9 ... signal mixing unit, 10 ... signal output unit, 11 ... mixing filter adjusting unit, M, M1, M2 ... microphone, MA, MA1, MA2 ... microphone array.

Claims

1. A sound collecting device comprising:
directivity forming means for forming, with a beamformer, a directivity in the target area direction for each of the captured signals output by a plurality of microphone arrays, or for signals based on the captured signals, to acquire a target direction signal for each microphone array;
target area sound extracting means for extracting a non-target area sound present in the target area direction by spectrally subtracting the target direction signals from one another, and for extracting a target area sound by spectrally subtracting the extracted non-target area sound from any one of the target direction signals;
mixing filter forming means for detecting a peak frequency from a non-target area sound component dominant signal, which is based on the captured signals and in which sound components whose source lies in a non-target area are dominant, and for forming a suppression filter that suppresses the component at a suppression frequency based on the peak frequency of the non-target area sound component dominant signal;
signal mixing means for applying the suppression filter formed by the mixing filter forming means to a mixing signal, which is the captured signal output from any one of the microphone arrays or a signal based on that captured signal, to acquire a filtered mixing signal, and for mixing the filtered mixing signal with the target area sound extracted by the target area sound extracting means to acquire a mixed signal; and
output means for outputting the mixed signal acquired by the signal mixing means as the area sound collection result for the target area.
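The processing chain recited in claim 1 can be illustrated on toy magnitude spectra. The following Python sketch is not taken from the patent: the function names, the scale factors `alpha` and `gamma`, the mixing level `level`, and the single-bin notch are assumptions made for illustration, and each beamformer output is represented simply as a list of per-bin magnitudes.

```python
# Illustrative sketch only; names and parameter values are assumptions,
# not the patent's implementation. Spectra are lists of per-bin magnitudes.

def spectral_subtract(a, b, scale=1.0):
    """Per-bin magnitude subtraction a - scale*b, floored at zero."""
    return [max(x - scale * y, 0.0) for x, y in zip(a, b)]

def extract_area_sounds(x1, x2, alpha=1.0, gamma=1.0):
    """Return (target_area_sound, non_target_area_sound) as seen from array 1."""
    # What remains of x1 after removing what array 2 also sees is sound that
    # lies in the target direction of array 1 but outside the target area.
    non_target = spectral_subtract(x1, x2, alpha)
    target = spectral_subtract(x1, non_target, gamma)
    return target, non_target

def peak_bin(spectrum):
    """Index of the largest-magnitude bin."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])

def suppression_filter(n_bins, suppressed_bins, gain=0.0):
    """Per-bin gains: `gain` on suppressed bins, 1.0 elsewhere."""
    return [gain if k in suppressed_bins else 1.0 for k in range(n_bins)]

def mix(target, mixing_signal, filt, level=0.1):
    """Add the filtered, attenuated mixing signal to the target area sound."""
    return [t + level * g * m for t, g, m in zip(target, filt, mixing_signal)]

# Toy data: bin 1 is common to both beams (inside the target area);
# bin 2 is seen only by array 1, bin 3 only by array 2.
x1 = [0.0, 4.0, 6.0, 0.0]
x2 = [0.0, 4.0, 0.0, 5.0]
captured = [1.0, 4.0, 6.0, 1.0]  # hypothetical unprocessed input spectrum

target, non_target = extract_area_sounds(x1, x2)
notch = suppression_filter(len(x1), {peak_bin(non_target)})
mixed = mix(target, captured, notch)
```

In this toy case the mixing signal restores a little of the captured background in bins 0, 1 and 3, while the bin dominated by non-target area sound (bin 2) stays suppressed in the mixed output.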

2. The sound collecting device according to claim 1, wherein the mixing filter forming means sets, as the suppression frequency of the suppression filter, the frequencies in a predetermined range that includes the peak frequency of the non-target area sound.
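One way to read the "predetermined range" of claim 2 is as a fixed band centred on the detected peak. The sketch below assumes this reading; the helper name `suppression_bins`, the symmetric half-width, and the uniform bin spacing are hypothetical choices, not details stated in the claim.

```python
# Illustrative sketch; the symmetric band and its width are assumptions.

def suppression_bins(peak_hz, half_width_hz, bin_hz, n_bins):
    """Bins whose centre frequency lies within peak_hz +/- half_width_hz."""
    return {k for k in range(n_bins)
            if abs(k * bin_hz - peak_hz) <= half_width_hz}

# A 1000 Hz peak with a +/-150 Hz band on a 100 Hz bin grid.
bins = suppression_bins(peak_hz=1000.0, half_width_hz=150.0,
                        bin_hz=100.0, n_bins=32)
```

Suppressing a band rather than a single bin tolerates small errors in the detected peak, at the cost of removing slightly more of the mixing signal.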

3. The sound collecting device according to claim 1 or 2, wherein the mixing filter forming means calculates a frequency difference between the suppression frequency and a peak frequency of the target area sound, and sets, in the suppression filter, a suppression amount for the suppression frequency component according to the calculated frequency difference.
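Claim 3 states only that the suppression amount depends on the frequency difference; it does not fix the mapping. The sketch below assumes, purely for illustration, a linear mapping in which suppression is weakest when the suppression frequency coincides with the target area sound's peak and strongest once the difference reaches `max_diff_hz`. The function name and both parameters are hypothetical.

```python
# Illustrative sketch; the linear mapping and its direction are assumptions,
# since the claim does not specify how the difference sets the amount.

def suppression_gain(suppress_hz, target_peak_hz,
                     max_diff_hz=1000.0, min_gain=0.0):
    """Gain applied at the suppression frequency (1.0 means no suppression)."""
    diff = abs(suppress_hz - target_peak_hz)
    ratio = min(diff / max_diff_hz, 1.0)   # clamp to [0, 1]
    return 1.0 - (1.0 - min_gain) * ratio

near = suppression_gain(1000.0, 1000.0)   # same frequency as the target peak
mid = suppression_gain(1500.0, 1000.0)    # 500 Hz away
far = suppression_gain(3000.0, 1000.0)    # beyond max_diff_hz
```

The opposite mapping, or a non-linear one, would satisfy the claim wording equally well; only the dependence on the difference is required.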

4. The sound collecting device according to claim 1, wherein the non-target area sound extracted by the target area sound extracting means is used as the non-target area sound component dominant signal.

5. The sound collecting device according to any one of 1, wherein a non-target sound, obtained from the captured signal of any one of the microphone arrays with its directivity pointed in a direction other than the target area, is used as the non-target area sound component dominant signal.
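A signal whose directivity points away from the target area can be obtained in several ways. As one hedged example, the sketch below uses a two-microphone delay-and-subtract beam that places a null on the target direction; this particular beamformer, the function name `non_target_beam`, and the integer-sample delay are assumptions and are not stated in the claim.

```python
# Illustrative sketch; a delay-and-subtract null beam is an assumed technique.

def non_target_beam(mic1, mic2, target_delay=0):
    """Cancel sound that reaches mic2 `target_delay` samples after mic1,
    keeping sound that arrives from other directions."""
    out = []
    for n in range(len(mic1)):
        m = n + target_delay
        aligned = mic2[m] if 0 <= m < len(mic2) else 0.0
        out.append(mic1[n] - aligned)
    return out

# Target arrives at both microphones simultaneously (delay 0);
# the interferer reaches mic2 one sample later than mic1.
target = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
interferer_m1 = [0.0, 2.0, 0.0, -2.0, 0.0, 2.0]
interferer_m2 = [0.0] + interferer_m1[:-1]
mic1 = [t + i for t, i in zip(target, interferer_m1)]
mic2 = [t + i for t, i in zip(target, interferer_m2)]

dominant = non_target_beam(mic1, mic2)
```

The output contains no target component, so its spectral peak reflects the interferer and could serve as the input to peak detection.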

6. A sound collecting program that causes a computer to function as:
directivity forming means for forming, with a beamformer, a directivity in the target area direction for each of the captured signals output by a plurality of microphone arrays, or for signals based on the captured signals, to acquire a target direction signal for each microphone array;
target area sound extracting means for extracting a non-target area sound present in the target area direction by spectrally subtracting the target direction signals from one another, and for extracting a target area sound by spectrally subtracting the extracted non-target area sound from any one of the target direction signals;
mixing filter forming means for detecting a peak frequency from a non-target area sound component dominant signal, which is based on the captured signals and in which sound components whose source lies in a non-target area are dominant, and for forming a suppression filter that suppresses the component at a suppression frequency based on the peak frequency of the non-target area sound component dominant signal;
signal mixing means for applying the suppression filter formed by the mixing filter forming means to a mixing signal, which is the captured signal output from any one of the microphone arrays or a signal based on that captured signal, to acquire a filtered mixing signal, and for mixing the filtered mixing signal with the target area sound extracted by the target area sound extracting means to acquire a mixed signal; and
output means for outputting the mixed signal acquired by the signal mixing means as the area sound collection result for the target area.

7. A sound collecting method performed by a sound collecting device,
the sound collecting device comprising directivity forming means, target area sound extracting means, mixing filter forming means, signal mixing means, and output means, wherein:
the directivity forming means forms, with a beamformer, a directivity in the target area direction for each of the captured signals output by a plurality of microphone arrays, or for signals based on the captured signals, and acquires a target direction signal for each microphone array;
the target area sound extracting means extracts a non-target area sound present in the target area direction by spectrally subtracting the target direction signals from one another, and extracts a target area sound by spectrally subtracting the extracted non-target area sound from any one of the target direction signals;
the mixing filter forming means detects a peak frequency from a non-target area sound component dominant signal, which is based on the captured signals and in which sound components whose source lies in a non-target area are dominant, and forms a suppression filter that suppresses the component at a suppression frequency based on the peak frequency of the non-target area sound component dominant signal;
the signal mixing means applies the suppression filter formed by the mixing filter forming means to a mixing signal, which is the captured signal output from any one of the microphone arrays or a signal based on that captured signal, to acquire a filtered mixing signal, and mixes the filtered mixing signal with the target area sound extracted by the target area sound extracting means to acquire a mixed signal; and
the output means outputs the mixed signal acquired by the signal mixing means as the area sound collection result for the target area.