JP6436180B2

JP6436180B2 - Sound collecting apparatus, program and method

Info

Publication number: JP6436180B2
Application number: JP2017059400A
Authority: JP
Inventors: 一浩片桐
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-03-24
Filing date: 2017-03-24
Publication date: 2018-12-12
Anticipated expiration: 2037-03-24
Also published as: JP2018164156A

Description

The present invention relates to a sound collection device, program, and method, and can be applied, for example, to processing that emphasizes sound in a target area and suppresses sound from other areas.

One technique for separating and collecting only the sound arriving from a specific direction in an environment with multiple sound sources is the beamformer (hereinafter “BF”) using a microphone array. A BF forms directivity by exploiting the time differences between the signals arriving at the individual microphones (see Non-Patent Document 1).

BFs fall into two broad types: additive and subtractive. The subtractive BF in particular has the advantage of forming directivity with fewer microphones than the additive BF.

FIG. 5 is a block diagram showing the configuration of a conventional subtractive BF.

The conventional subtractive BF shown in FIG. 5 uses two microphones.

In the conventional subtractive BF, a delay unit first calculates the difference in arrival time at the microphones of the sound coming from the target direction (hereinafter also “target sound”) and adds a delay so that the phases of the target sound are aligned. The delay unit calculates this time difference by equation (1) below.

In equation (1) below, $d$ is the distance between the microphones, $c$ is the speed of sound, and $\tau_L$ is the delay amount. $\theta_L$ is the angle from the direction perpendicular to the straight line connecting the microphones to the target direction.
$$\tau_L = \frac{d \sin\theta_L}{c} \tag{1}$$
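As a rough illustration of equation (1), the delay can be computed directly from the microphone spacing and the steering angle. This is a minimal sketch; the function name `steering_delay`, the default speed of sound of 343 m/s, and the example values are assumptions made for illustration and do not come from the patent.

```python
import math

def steering_delay(d, theta_l, c=343.0):
    """Delay tau_L in seconds per equation (1): d * sin(theta_L) / c.

    d       : microphone spacing in metres
    theta_l : angle in radians, measured from the direction perpendicular
              to the line connecting the microphones
    c       : speed of sound in m/s (assumed default)
    """
    return d * math.sin(theta_l) / c

# Hypothetical example: 3 cm spacing, direction 30 degrees off perpendicular.
tau_l = steering_delay(0.03, math.radians(30.0))
```

With these illustrative numbers the delay is a few tens of microseconds, which is why such delays are typically applied as phase shifts in the frequency domain rather than as whole-sample shifts.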

When the null (blind spot) lies on the first-microphone side of the midpoint between the first and second microphones, the delay unit of the conventional subtractive BF delays the input signal $x_1(t)$ of the first microphone. The delayed signal is then subtracted according to equation (2).
$$A(t) = x_2(t) - x_1(t - \tau_L) \tag{2}$$

The subtraction can be performed equally well in the frequency domain, in which case equation (2) becomes the following.
$$A(\omega) = X_2(\omega) - e^{-j\omega\tau_L} X_1(\omega) \tag{3}$$

When $\theta_L = \pm\pi/2$, the resulting directivity is a cardioid unidirectional pattern, as shown in FIG. 6(A); when $\theta_L = 0$ or $\pi$, it is a figure-eight bidirectional pattern, as shown in FIG. 6(B). Hereinafter, a filter that forms unidirectivity from the input signal is called a unidirectional filter, and a filter that forms bidirectivity is called a bidirectional filter.
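The two patterns can be checked numerically with a small sketch. It assumes a far-field plane wave that reaches the second microphone $d\sin\theta/c$ later than the first, and evaluates the magnitude response of the delay-and-subtract operation at a single frequency. The function name, the 3 cm spacing, the 1 kHz test frequency, and the plane-wave model are all assumptions for illustration.

```python
import cmath
import math

def subtractive_bf_gain(theta, theta_null, d=0.03, freq=1000.0, c=343.0):
    """Magnitude response of a two-microphone delay-and-subtract beamformer.

    theta      : arrival angle of a plane wave (radians from perpendicular)
    theta_null : steering angle theta_L that sets the delay tau_L
    Assumed model: mic 2 receives the wave d*sin(theta)/c after mic 1.
    """
    omega = 2.0 * math.pi * freq
    tau = d * math.sin(theta) / c          # inter-microphone delay of the wave
    tau_l = d * math.sin(theta_null) / c   # delay applied to microphone 1
    # X2 - exp(-j*omega*tau_L) * X1, with X1 = 1 and X2 = exp(-j*omega*tau)
    return abs(cmath.exp(-1j * omega * tau) - cmath.exp(-1j * omega * tau_l))

# theta_L = pi/2: one null at +pi/2 and maximum gain on the opposite side.
cardioid = [subtractive_bf_gain(math.radians(a), math.pi / 2) for a in (90, 0, -90)]
# theta_L = 0: nulls at 0 and pi, i.e. a figure-eight pattern.
figure8 = [subtractive_bf_gain(math.radians(a), 0.0) for a in (0, 90, 180)]
```

Under this model the response reduces to $2\,|\sin(\omega(\tau - \tau_L)/2)|$, which is zero exactly where the wave's inter-microphone delay equals the applied delay.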

Using spectral subtraction (hereinafter also “SS”), strong directivity can also be formed toward the null of the bidirectional pattern. Directivity formation by SS follows equation (4). Equation (4) uses the input signal $X_1$ of the first microphone M1, but the same effect is obtained with the input signal $X_2$ of the second microphone M2. In equation (4), $\beta$ is a coefficient that adjusts the strength of the SS. If the subtraction yields a negative value, flooring is applied: the value is replaced with 0 or with a scaled-down version of the original value. In this scheme, the bidirectional filter extracts the sound coming from outside the target direction (hereinafter also “non-target sound”), and the power spectrum of the extracted non-target sound is subtracted from the power spectrum of the input signal, which emphasizes the target sound.
$$Y_1 = X_1 - \beta A_1 \tag{4}$$
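The subtraction with flooring described for equation (4) might look like the following sketch, which operates on per-bin power values. The function name and the `floor` parameter are hypothetical; interpreting “the original value” as the input power before subtraction is also an assumption.

```python
def spectral_subtract(x_power, a_power, beta=1.0, floor=0.0):
    """Per-bin spectral subtraction Y = X - beta * A with flooring.

    x_power : input-signal power per frequency bin
    a_power : non-target power per bin (output of the bidirectional filter)
    beta    : subtraction strength
    floor   : fraction of the input power used when the result is negative
              (0.0 replaces negative results with zero)
    """
    y = []
    for x, a in zip(x_power, a_power):
        diff = x - beta * a
        y.append(diff if diff >= 0.0 else floor * x)
    return y

# Second bin would go negative (1 - 2), so it is floored.
print(spectral_subtract([4.0, 1.0], [1.0, 2.0]))             # [3.0, 0.0]
print(spectral_subtract([4.0, 1.0], [1.0, 2.0], floor=0.1))  # [3.0, 0.1]
```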

When only the sound present within a particular area (hereinafter “target area sound”) is to be collected, a subtractive BF alone may also pick up sound from sources around that area (hereinafter also “non-target area sound”). Patent Document 1 therefore proposes collecting the target area sound by using multiple microphone arrays, steering their directivities toward the target area from different directions, and making the directivities intersect in the target area.

FIG. 7 is an explanatory diagram showing an example arrangement of the microphone arrays when two microphone arrays MA1 and MA2 are used to collect the target area sound from a sound source in the target area.

FIG. 8 is an explanatory diagram (graphs) showing the BF outputs of the microphone arrays MA1 and MA2 of FIG. 7 in the frequency domain. FIGS. 8(a) and 8(b) are conceptual graphs of the BF outputs of MA1 and MA2, respectively.

In the method of Patent Document 1, the ratio between the powers of the target area sound contained in the BF outputs of the microphone arrays MA1 and MA2 is first estimated and used as a correction coefficient. When two microphone arrays MA1 and MA2 are used, for example, the correction coefficient for the target area sound power is calculated by equation (5) or equation (6).

Here, $|Y_{1k}|$ and $|Y_{2k}|$ are the powers at frequency $k$ of the BF outputs of the microphone arrays MA1 and MA2, $N$ is the total number of frequency bins, and $\alpha$ is the power correction coefficient applied to the BF output; mode denotes the most frequent value and median denotes the median. Each BF output is then corrected with the correction coefficient and SS is applied, which extracts the non-target area sound present in the direction of the target area. Applying SS again to subtract the extracted non-target area sound from each BF output yields the target area sound.
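Equations (5) and (6) are not reproduced in this text, so the following is only a sketch of one plausible reading: the correction coefficient is taken as the median, over all $N$ bins, of the per-bin ratio $|Y_{1k}| / |Y_{2k}|$, so that $\alpha Y_2$ matches $Y_1$ in the target area components. The ratio direction, the function name, and the `eps` guard are assumptions. A mode-based variant would additionally need the ratios to be quantized into bins, which is omitted here.

```python
import statistics

def power_correction_median(y1_power, y2_power, eps=1e-12):
    """Median of per-bin power ratios |Y1k| / |Y2k| (assumed form).

    eps guards against division by zero in empty bins.
    """
    ratios = [y1 / max(y2, eps) for y1, y2 in zip(y1_power, y2_power)]
    return statistics.median(ratios)

# Two bins agree on a ratio of 2; the third bin is treated as an outlier
# (for example, a bin dominated by non-target area sound).
alpha = power_correction_median([2.0, 4.0, 9.0], [1.0, 2.0, 3.0])
```

A median is robust to the minority of bins in which non-target sound dominates one array's output, which is presumably why an order statistic is used here rather than a mean.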

FIG. 9 is an explanatory (conceptual) diagram showing how the power spectrum of each frequency component changes when area sound collection is performed on the BF outputs obtained with the microphone arrays MA1 and MA2 of FIG. 7.

First, the BF output $Y_1$, in which the non-target area sound $N_2$ is suppressed, is obtained from the input signal $X_1$ of the microphone array MA1 (see FIG. 9(a)).

To extract the non-target area sound $N_1(n)$ present in the direction of the target area as seen from the microphone array MA1, the BF output $Y_2(n)$ of the microphone array MA2 multiplied by the power correction coefficient $\alpha$ is spectrally subtracted from the BF output $Y_1(n)$ of the microphone array MA1, as in equation (7) (see FIG. 9(b)). The non-target area sound is then spectrally subtracted from each BF output according to equation (8) to extract the target area sound (see FIG. 9(c)). $\gamma(n)$ is a coefficient for changing the strength of the SS.
$$N_1 = Y_1 - \alpha Y_2 \tag{7}$$
$$Z_1 = Y_1 - \gamma N_1 \tag{8}$$
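Equations (7) and (8) can be chained as in the sketch below, which works on per-bin power values. Flooring negative results to zero is an assumption carried over from the description of equation (4); the function name and the toy numbers are hypothetical.

```python
def extract_target_area(y1_power, y2_power, alpha, gamma=1.0):
    """Two-stage spectral subtraction following equations (7) and (8).

    Returns (n1, z1): the non-target area sound seen from MA1 and the
    extracted target area sound, both as per-bin power lists.
    """
    # Equation (7): N1 = Y1 - alpha * Y2, floored at zero (assumption)
    n1 = [max(a - alpha * b, 0.0) for a, b in zip(y1_power, y2_power)]
    # Equation (8): Z1 = Y1 - gamma * N1, floored at zero (assumption)
    z1 = [max(a - gamma * n, 0.0) for a, n in zip(y1_power, n1)]
    return n1, z1

# Bin 0: MA1 hears target power 2 plus non-target power 3; MA2 hears only
# the target. Bin 1: both arrays hear only the target.
n1, z1 = extract_target_area([5.0, 3.0], [2.0, 3.0], alpha=1.0)
print(n1, z1)  # [3.0, 0.0] [2.0, 3.0]
```

In this toy case the component common to both BF outputs survives, while the component seen only by MA1 is removed.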

As described above, the method of Patent Document 1 applies SS, a nonlinear process, in equations (4) and (8) to extract the target area sound, so in a high-noise environment it may generate unpleasant artifacts known as musical noise.

Patent Document 2 therefore distinguishes intervals in which the target area sound is present from intervals in which it is absent, and suppresses artifacts such as musical noise by not outputting the area-collected sound in the latter. To decide whether the target area sound is present, the power spectrum ratio (area sound output data / input signal) between the input signal and the output data from which the target area sound has been extracted (hereinafter also “area sound output data”) is first calculated according to equation (9). When a sound source is present in the target area, the target area sound is contained in both the input signal $X_1$ and the area sound output data $Z_1$, so the power spectrum ratio of the target area sound components is close to 1. The non-target area sound components, by contrast, are suppressed in the area sound output data, so their power spectrum ratio is small. Other background noise components are also suppressed to some extent, even without dedicated noise suppression beforehand, because the area sound collection processing applies SS several times; their power spectrum ratio is likewise small. When no target area sound is present, the area sound output data contains only weak residual noise compared with the input signal, so the power spectrum ratio is small across the whole band. Consequently, the average of the per-frequency power spectrum ratios, taken according to equation (10), differs greatly depending on whether the target area sound is present.

Here, $m$ and $n$ are the upper and lower limits of the processing band, set for example to cover 100 Hz to 6 kHz, which contains sufficient speech information. The device of Patent Document 2 compares the average power spectrum ratio with a preset threshold and, when it determines that no target area sound is present, outputs silence or the input sound at reduced gain instead of the area sound output data.
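The frame-level decision attributed to Patent Document 2 can be sketched as follows. Equations (9) and (10) are not reproduced in this text, so the code assumes the ratio is output power divided by input power per bin and that the average is a plain arithmetic mean over the processing band. The function name, the inclusive index convention, and the threshold value are hypothetical.

```python
def target_area_present(x_power, z_power, lo, hi, threshold, eps=1e-12):
    """Frame-level decision from the band-averaged power spectrum ratio.

    x_power, z_power : per-bin power of the input signal and of the
                       area sound output data
    lo, hi           : first and last bin index of the band (inclusive)
    threshold        : preset decision threshold (hypothetical value)
    """
    ratios = [z_power[k] / max(x_power[k], eps) for k in range(lo, hi + 1)]
    average_ratio = sum(ratios) / len(ratios)
    return average_ratio > threshold

x = [1.0, 1.0, 1.0, 1.0]
print(target_area_present(x, [0.9, 0.8, 0.9, 0.8], 0, 3, 0.5))  # True
print(target_area_present(x, [0.1, 0.1, 0.2, 0.1], 0, 3, 0.5))  # False
```

Because the decision is a single average per frame, a few weak target bins (for example, an unvoiced consonant) can be outvoted by many noise-dominated bins, which is the weakness the following paragraphs discuss.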

Patent Document 1: JP 2014-072708 A
Patent Document 2: JP 2016-127457 A

Non-Patent Document 1: Futoshi Asano, "Acoustic Technology Series 16: Array Signal Processing of Sound - Localization, Tracking and Separation of Sound Sources -", edited by the Acoustical Society of Japan, Corona Publishing, published February 25, 2011

With the method of Patent Document 1, the target area sound can be collected even when non-target area sounds are present around the target area; that is, only the sound present in the area can be extracted even under high noise. However, when the SN ratio is 0 dB or lower, some target area sound components may overlap non-target area sound components. If the target area sound is extracted by SS in this state, the overlapping non-target area sound components cause target area sound components to be removed as well, so the extracted target area sound is distorted and the musical noise may also become stronger.

With the method of Patent Document 2, the effect of the musical noise generated by the area sound collection processing can be suppressed. However, in high-noise environments such as crowded event venues or places where music is playing nearby, the SN ratio deteriorates and the power spectrum of the sound output by area sound collection may become small. The average power spectrum ratio between the area sound collection output and the input signal then also becomes small. For components whose power is inherently low, such as unvoiced consonants, the accuracy of the target area sound determination degrades and part of the target area sound may be dropped. Furthermore, when the input signal is mixed in to improve sound quality, a degraded determination can cause output even when no target area sound is present, so that only the non-target area sound contained in the input signal is heard.

In view of these problems, a sound collection device, program, and method, and a determination device, program, and method, that can improve the accuracy of target area sound determination in environments with strong background noise are desired.

The first aspect of the present invention comprises:
(1) directivity forming means for forming directivity toward a target area from an input signal by a beamformer;
(2) non-target area sound extraction means for extracting non-target area sound present in the direction of the target area given by the directivity formed by the directivity forming means;
(3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer using the non-target area sound present in the direction of the target area extracted by the non-target area sound extraction means;
(4) power spectrum ratio calculation means for calculating, for each frequency component, the power spectrum ratio between the input signal and the extracted sound;
(5) determination means for determining, for each frequency component, whether target area sound is present, using the power spectrum ratio calculated by the power spectrum ratio calculation means; and
(6) output means for outputting, for each frequency component for which the determination means has determined that target area sound is present, that frequency component of the extracted sound,
(7) wherein the output means does not output the extracted sound in any frequency component when the proportion, among all frequency components, of frequency components for which the determination means has determined that no target area sound is present in the input signal exceeds a second threshold.

The sound collection program of the second aspect of the present invention causes a computer to function as:
(1) directivity forming means for forming directivity toward a target area from an input signal by a beamformer;
(2) non-target area sound extraction means for extracting non-target area sound present in the direction of the target area given by the directivity formed by the directivity forming means;
(3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer using the non-target area sound present in the direction of the target area extracted by the non-target area sound extraction means;
(4) power spectrum ratio calculation means for calculating, for each frequency component, the power spectrum ratio between the input signal and the extracted sound;
(5) determination means for determining, for each frequency component, whether target area sound is present, using the power spectrum ratio calculated by the power spectrum ratio calculation means; and
(6) output means for outputting, for each frequency component for which the determination means has determined that target area sound is present, that frequency component of the extracted sound,
(7) wherein the output means does not output the extracted sound in any frequency component when the proportion, among all frequency components, of frequency components for which the determination means has determined that no target area sound is present in the input signal exceeds a second threshold.

The third aspect of the present invention is a sound collection method performed by a sound collection device, wherein:
(1) the sound collection device comprises directivity forming means, non-target area sound extraction means, target area sound extraction means, power spectrum ratio calculation means, determination means, and output means;
(2) the directivity forming means forms directivity toward a target area from an input signal by a beamformer;
(3) the non-target area sound extraction means extracts non-target area sound present in the direction of the target area given by the directivity formed by the directivity forming means;
(4) the target area sound extraction means outputs an extracted sound obtained by extracting target area sound from the output of the beamformer using the non-target area sound present in the direction of the target area extracted by the non-target area sound extraction means;
(5) the power spectrum ratio calculation means calculates, for each frequency component, the power spectrum ratio between the input signal and the extracted sound;
(6) the determination means determines, for each frequency component, whether target area sound is present, using the power spectrum ratio calculated by the power spectrum ratio calculation means;
(7) the output means outputs, for each frequency component for which the determination means has determined that target area sound is present, that frequency component of the extracted sound; and
(8) the output means does not output the extracted sound in any frequency component when the proportion, among all frequency components, of frequency components for which the determination means has determined that no target area sound is present in the input signal exceeds a second threshold.
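The output rule shared by the three aspects above can be sketched as follows. The claims do not say what is emitted for a frequency component judged to contain no target area sound, so this sketch assumes zero (silence) for those components; the function name and the example threshold are hypothetical.

```python
def select_output(z, present, t2):
    """Apply the per-component output rule with the second threshold.

    z       : extracted sound, one value per frequency component
    present : per-component decisions (True = target area sound present)
    t2      : second threshold on the fraction of 'absent' components
    """
    absent_fraction = present.count(False) / len(present)
    if absent_fraction > t2:
        # Too many components judged absent: output no extracted sound at all.
        return [0.0] * len(z)
    # Otherwise pass only the components judged to contain target area sound.
    return [zk if p else 0.0 for zk, p in zip(z, present)]

z = [1.0, 2.0, 3.0, 4.0]
print(select_output(z, [True, False, True, True], 0.5))    # [1.0, 0.0, 3.0, 4.0]
print(select_output(z, [True, False, False, False], 0.5))  # [0.0, 0.0, 0.0, 0.0]
```

The second threshold acts as a frame-level safeguard: isolated components that pass the per-frequency test in an otherwise empty frame are more likely residual noise than target area sound.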

According to the present invention, the accuracy of target area sound determination can be improved in environments with strong background noise.

FIG. 1 is a block diagram showing the functional configuration of the sound collection device according to the first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the area sound collection processing unit according to the first embodiment.
FIG. 3 is a block diagram showing the functional configuration of the sound collection device according to the second embodiment.
FIG. 4 is a block diagram showing the functional configuration of the sound collection device according to the third embodiment.
FIG. 5 is a block diagram showing the configuration of a conventional subtractive BF with two microphones.
FIG. 6 is a diagram showing the directivity patterns formed by a conventional subtractive BF using two microphones.
FIG. 7 is an explanatory diagram showing an example arrangement of the microphone arrays when two conventional microphone arrays are used to collect the target area sound from a sound source in the target area.
FIG. 8 is an explanatory diagram showing the BF outputs of the two conventional microphone arrays in the frequency domain.
FIG. 9 is an explanatory diagram showing how the power spectrum of each component changes when area sound collection is performed on the BF outputs obtained with the two conventional microphone arrays.

(A) First Embodiment
A first embodiment of the sound collection device, program, and method according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the functional configuration of the sound collection device 100 of this embodiment.

The sound collection device 100 uses two microphone arrays MA (MA1, MA2) to perform target area sound collection processing, which collects the target area sound from a sound source in the target area.

The microphone arrays MA1 and MA2 are placed at arbitrary locations in the space in which the target area exists. As shown in FIG. 7, for example, they may be positioned anywhere relative to the target area as long as their directivities overlap only in the target area; they may, for instance, face each other across the target area. Each microphone array MA consists of two or more microphones M, each of which picks up an acoustic signal. In this embodiment, each microphone array MA is described as having two microphones M (M1, M2); that is, each microphone array MA is a 2-channel microphone array. The number of microphone arrays MA is not limited to two: when there are multiple target areas, enough microphone arrays MA must be placed to cover all of them.

The sound collection device 100 includes a data input unit 1, an area sound collection processing unit 2, a frequency-specific power ratio calculation unit 3, and a frequency-specific area sound determination unit 4.

FIG. 2 is a block diagram showing an example of the functional configuration of the area sound collection processing unit 2.

In this example, the area sound collection processing unit 2 is described as having a directivity forming unit 2-1, a delay correction unit 2-2, spatial coordinate data 2-3, a target area sound power correction coefficient calculation unit 2-4, and a target area sound extraction unit 2-5.

The detailed processing of each functional block of the sound collection device 100 is described later.

The sound collection device 100 may be implemented entirely in hardware (for example, a dedicated chip), or partly or wholly as software (a program). For example, it may be built by installing a program (including the determination program and sound collection program of the embodiment) on a computer having a processor and memory.

(A-2) Operation of the First Embodiment
Next, the operation of the sound collection device 100 of the first embodiment configured as above (the sound collection method according to the embodiment) is described.

The data input unit 1 converts the acoustic signals picked up by the microphone arrays MA1 and MA2 from analog to digital. It then applies a transform to the digital signals (for example, conversion from the time domain to the frequency domain using a fast Fourier transform).

Based on the microphone array input signals obtained from the data input unit 1, the area sound collection processing unit 2 forms directivity for each microphone array and extracts the components contained simultaneously in all of those directivities as the target area sound.

In this embodiment, the area sound collection processing by the area sound collection processing unit 2 is described as being realized by, for example, the configuration shown in FIG. 2, but a configuration that extracts the target area sound by another method may be used instead.

The operation of each component of the area sound collection processing unit 2 shown in FIG. 2 is described below.

For each microphone array MA, the directivity forming unit 2-1 extracts the non-target area sound coming from outside the target direction (for example, with a bidirectional filter) and subtracts the power spectrum of the extracted non-target area sound from the power spectrum of the input signal, thereby obtaining a sound with directivity formed toward the target area (the BF output). Specifically, for each microphone array MA, the directivity forming unit 2-1 obtains as the BF output the sound whose directivity has been formed toward the target area by the BF according to equation (4).

The delay correction unit 2-2 calculates and corrects the delay caused by the difference in distance between the target area and each microphone array. It obtains the positions of the target area and of the microphone arrays from the spatial coordinate data 2-3 and calculates the difference in arrival time of the target area sound at each microphone array MA (MA1, MA2). Taking the microphone array MA located farthest from the target area as the reference, it then adds delay so that the target area sound arrives at all microphone arrays MA (MA1, MA2) simultaneously.
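The alignment performed by the delay correction unit can be sketched as below: each array is delayed by the extra travel time to the farthest array, so the farthest array itself gets zero delay. The function name, the 2-D coordinates, and the speed of sound are assumptions for illustration; in the device the positions would come from the spatial coordinate data 2-3.

```python
import math

def alignment_delays(target_pos, array_positions, c=343.0):
    """Delay in seconds to add to each array's signal so that the target
    area sound lines up with its arrival at the farthest array."""
    distances = [math.dist(target_pos, p) for p in array_positions]
    farthest = max(distances)
    return [(farthest - dist) / c for dist in distances]

# Hypothetical layout: target at the origin, MA1 at 5 m, MA2 at 10 m.
delays = alignment_delays((0.0, 0.0), [(3.0, 4.0), (0.0, 10.0)], c=340.0)
```

Here MA1 is 5 m closer than MA2, so its signal is delayed by 5/340 s while MA2, the reference, is left unchanged.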

The spatial coordinate data 2-3 holds the position information of all target areas, of each microphone array MA (MA1, MA2), and of the microphones M (M1, M2) constituting each microphone array MA.

The target area sound power correction coefficient calculation unit 2-4 calculates, according to equation (5) or equation (6), a correction coefficient that equalizes the power of the target area sound components contained in the BF outputs.

The target area sound extraction unit 2-5 applies SS according to equation (7) to the BF output data corrected with the coefficient calculated by the target area sound power correction coefficient calculation unit 2-4, extracting the noise present in the direction of the target area. It then extracts the target area sound by spectrally subtracting the extracted noise from each BF output according to equation (8).

For each frequency, the frequency-specific power ratio calculation unit 3 calculates the power ratio |R_k| using the input signal X_1 supplied from the data input unit 1 and the area sound output data Z_1 supplied from the area sound collection processing unit 2. Specifically, it calculates the power ratio for each frequency based on equation (11). Here, |X_1k| is the power at frequency k of the input signal X_1 of the microphone array MA1 (the input signal of the first microphone M1), and |Z_1k| is the power at frequency k of the area sound output data. Further, m is the lower limit and n is the upper limit of the frequencies to be processed.
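As an illustration of the per-frequency power ratio, the sketch below assumes the ratio is taken as area sound output power over input power, |Z_1k| / |X_1k|, for k from m to n. That orientation is an assumption chosen so that a larger ratio indicates target area sound, and equation (11) itself is not reproduced here. The function name `power_ratio` and the `eps` guard against division by zero are also hypothetical.

```python
def power_ratio(x_power, z_power, m, n, eps=1e-12):
    """Per-frequency power ratio for bins m..n (inclusive).

    x_power[k] stands for |X_1k| and z_power[k] for |Z_1k|.
    Assumed form: |R_k| = |Z_1k| / |X_1k| (orientation is an assumption).
    """
    return {k: z_power[k] / max(x_power[k], eps) for k in range(m, n + 1)}


# Hypothetical power spectra with four bins; only bins 1 and 2 are processed.
ratios = power_ratio([4.0, 2.0, 1.0, 8.0], [1.0, 1.0, 1.0, 2.0], m=1, n=2)
```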

The frequency-specific area sound determination unit 4 compares, for each frequency, the power ratio |R| calculated by the frequency-specific power ratio calculation unit 3 with a preset threshold T1 and determines the area sound components. Specifically, it compares the power ratio |R| with the threshold T1 at each frequency and extracts the components whose power ratio |R| exceeds T1.

In the frequency-specific area sound determination unit 4, the threshold T1 may be the same value for all frequencies, or a different value may be applied to each frequency. For example, T1 may take values that decrease from the low band toward the high band. As another example, T1 may be set to a larger value for the low band (for example, 100 Hz or below) than for the other bands (for example, frequencies above 100 Hz).

In this embodiment, the frequency-specific area sound determination unit 4 is described as determining that an area sound component exists (that is, a component of the target area sound exists in the input signal X_1 and the area sound output data Z_1) at any frequency (frequency component) whose power ratio |R| exceeds the threshold T1 (|R| > T1).

For frequencies (frequency components) at which an area sound component was determined to exist, the frequency-specific area sound determination unit 4 outputs the area sound output data Z_1 supplied from the area sound collection processing unit 2 as is. For frequencies at which no area sound component was determined to exist, it does not output the area sound output data Z_1 and instead outputs predetermined sound data (for example, preset silent data).

Note that, for frequencies determined to contain no area sound component, the frequency-specific area sound determination unit 4 may output the area sound output data Z_1 or the input signal X_1 with reduced gain instead of silence.
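The per-frequency determination and output selection can be sketched as below. The function name `gate_by_frequency` and the `residual_gain` parameter are hypothetical: a `residual_gain` of 0 corresponds to outputting silence, and a small positive value corresponds to the reduced-gain variant applied to Z_1. T1 may be passed as a single value or as one value per frequency.

```python
def gate_by_frequency(z, ratios, t1, residual_gain=0.0):
    """Keep Z_1 at bins whose power ratio exceeds T1; attenuate the rest.

    z      -- area sound output data Z_1, one value per frequency bin
    ratios -- power ratio |R| per frequency bin
    t1     -- scalar threshold, or a list with one threshold per bin
    """
    thresholds = t1 if isinstance(t1, (list, tuple)) else [t1] * len(z)
    # True where the bin is judged to contain a target area sound component.
    flags = [r > t for r, t in zip(ratios, thresholds)]
    out = [zk if f else residual_gain * zk for zk, f in zip(z, flags)]
    return out, flags


out, flags = gate_by_frequency([1.0, 2.0, 3.0], [0.9, 0.1, 0.5], t1=0.4)
```

Returning the flags alongside the output keeps the per-frequency decision available to any later stage that needs it.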

(A-3) Effects of the first embodiment
According to this embodiment, the following effects can be obtained.

The sound collection device 100 of this embodiment obtains the power ratio between the area sound output data and the input signal for each frequency (each frequency component) and determines whether that component is a target area sound component. The device compares the power ratio at each frequency with a preset threshold T1; a frequency whose power ratio exceeds T1 is determined to be a target area sound component, and the area sound output data at that frequency is output. A frequency whose power ratio falls below T1 is determined not to be a target area sound component, and at that frequency either nothing is output or the area sound output data is output with reduced gain. Because the power ratio is large at the main components of the target area sound, components in which the target area sound exists are output as is. Components whose value is small and which are determined not to be target area sound components are not output, but since they do not contribute to the target area sound, this has no effect. Even an unvoiced consonant, whose average power over the full band is small, has peaks in its power spectrum. Because the sound collection device 100 obtains the power ratio for each frequency, the main components of an unvoiced consonant take large values and are determined to be target area sound components.

As described above, the sound collection device 100 of this embodiment obtains the power ratio between the area sound output data and the input signal for each frequency component, determines whether a target area sound component is present, and outputs only the frequency components determined to be target sound components. This prevents dropouts of the target area sound even in a high-noise environment.

(B) Second embodiment
A second embodiment of the sound collection device, program, and method according to the present invention is described in detail below with reference to the drawings.

(B-1) Configuration of the second embodiment
FIG. 3 is a block diagram showing the functional configuration of the sound collection device 100A of this embodiment. In FIG. 3, parts identical or corresponding to those in FIG. 1 described above are given identical or corresponding reference signs.

The following describes how the sound collection device 100A of the second embodiment differs from the first embodiment.

The sound collection device 100A differs from the first embodiment in that an area sound determination unit 5 is added downstream of the frequency-specific area sound determination unit 4.

(B-2) Operation of the second embodiment
Next, the operation of the sound collection device 100A of the second embodiment configured as above (the sound collection method according to the embodiment) is described in terms of its differences from the first embodiment.

As described above, the sound collection device 100A differs from the first embodiment in that the area sound determination unit 5 is added. The following describes the target area sound determination process, focusing on the area sound determination unit 5.

Based on the determination results of the frequency-specific area sound determination unit 4 for all frequencies, the area sound determination unit 5 determines whether the current section contains the target area sound (that is, whether the target area sound is present in the input signal X_1 and the area sound output data Z_1). If it determines that the target area sound is present, it outputs the area sound output data Z_1 at all frequencies; if it determines that the target area sound is absent, it outputs predetermined data (for example, silent data) at all frequencies.

A specific example of the operation of the area sound determination unit 5 is described below.

The area sound determination unit 5 first calculates, from the determination results of the frequency-specific area sound determination unit 4, the proportion of frequencies determined to be target area sound components and the proportion of frequencies determined not to be target area sound components.

For example, let C1 be the number of frequencies determined to be target area sound components (frequencies whose power ratio |R| exceeds the threshold T1) and C2 be the number of frequencies determined not to be target area sound components (frequencies whose power ratio |R| is at or below T1). Then the proportion P1 of frequencies determined to be target area sound components is P1 = C1 / (C1 + C2), and the proportion P2 of frequencies determined not to be target area sound components is P2 = C2 / (C1 + C2). The sum of C1 and C2 equals the total number of frequency components (for example, the number of frequency bins).

If the proportion P2 of frequencies determined not to be target area sound components exceeds a threshold T2 [%] (P2 > T2, that is, P1 < (100 [%] - T2 [%])), the area sound determination unit 5 updates the determination to "not a target area sound component" for all frequencies (all components) and outputs silent data for all frequencies.

If the proportion P2 of frequencies determined not to be target area sound components falls below the threshold T2 (P2 < T2, that is, P1 > (100 [%] - T2 [%])), the area sound determination unit 5 compares P2 with a threshold T3. T3 is smaller than T2 (T2 > T3).

If the proportion P2 of frequencies determined not to be target area sound components falls below T3 (P2 < T3, that is, P1 > (100 [%] - T3 [%])), the area sound determination unit 5 updates the determination to "target area sound component" for all frequencies and outputs the area sound output data Z_1 for all frequencies.

If P2 does not fall below T3 (T3 ≤ P2 ≤ T2, that is, (100 [%] - T3 [%]) ≥ P1 ≥ (100 [%] - T2 [%])), the area sound determination unit 5 produces the output for each frequency in accordance with the determination of the frequency-specific area sound determination unit 4. In other words, in this case the area sound determination unit 5 outputs the content supplied from the preceding stage (the frequency-specific area sound determination unit 4) unchanged.

The values of T2 and T3 are not limited; for example, T2 = 80% and T3 = 20% may be used.
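The three-way decision of the area sound determination unit 5 can be sketched as follows. The function name `area_sound_decision` and the representation of proportions as fractions (0.8 for 80%) are assumptions for illustration; the default values follow the example T2 = 80% and T3 = 20%.

```python
def area_sound_decision(flags, t2=0.8, t3=0.2):
    """Refine per-frequency flags using the proportion P2 of 'non-target' bins.

    flags -- per-frequency results from the preceding stage
             (True = target area sound component present)
    Returns the updated per-frequency flags.
    """
    total = len(flags)
    if total == 0:
        return []
    c2 = sum(1 for f in flags if not f)  # bins judged as non-target
    p2 = c2 / total                      # P2 = C2 / (C1 + C2)
    if p2 > t2:
        return [False] * total           # silence at all frequencies
    if p2 < t3:
        return [True] * total            # output Z_1 at all frequencies
    return list(flags)                   # T3 <= P2 <= T2: pass through


mostly_noise = area_sound_decision([True] + [False] * 9)
mostly_target = area_sound_decision([True] * 9 + [False])
mixed = area_sound_decision([True] * 5 + [False] * 5)
```

The boundary cases P2 = T2 and P2 = T3 fall into the pass-through branch, matching the condition T3 ≤ P2 ≤ T2 stated above.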

(B-3) Effects of the second embodiment
According to the second embodiment, the following effects can be obtained in addition to those of the first embodiment.

In the sound collection device 100A of the second embodiment, after the frequency-specific area sound determination unit 4 determines the presence or absence of a target area sound component at each frequency, the area sound determination unit 5 decides the final output from the proportion of target area sound components across all frequencies. When the frequencies determined to contain no target area sound component reach or exceed a certain proportion of all frequencies, the area sound determination unit 5 re-determines that no target area sound component exists at any frequency and outputs silent data. As a result, even if some frequencies are erroneously determined to contain the target area sound when no target area sound is present, the sound collection device 100A can suppress their influence.

(C) Third embodiment
A third embodiment of the sound collection device, program, and method according to the present invention is described in detail below with reference to the drawings.

(C-1) Configuration of the third embodiment
FIG. 4 is a block diagram showing the functional configuration of the sound collection device 100B of this embodiment. In FIG. 4, parts identical or corresponding to those in FIG. 1 described above are given identical or corresponding reference signs.

The following describes how the sound collection device 100B of the third embodiment differs from the first embodiment.

The sound collection device 100B differs from the first embodiment in that a signal mixing unit 6 and a mixing level calculation unit 7 are added. In the sound collection device 100B, the signal mixing unit 6 is inserted downstream of the frequency-specific area sound determination unit 4.

(C-2) Operation of the third embodiment
Next, the operation of the sound collection device 100B of the third embodiment configured as above (the sound collection method according to the embodiment) is described in terms of its differences from the first embodiment.

As described above, the sound collection device 100B differs from the first embodiment in that the signal mixing unit 6 and the mixing level calculation unit 7 are added. The following describes the target area sound determination process, focusing on the signal mixing unit 6 and the mixing level calculation unit 7.

The mixing level calculation unit 7 determines, from the ratio between the area sound output data Z_1 and the non-target area sound N_1 (hereinafter the "SN ratio"), the volume level of the input signal X_1 to be mixed into the target area sound (output data) that is output. The power spectrum O_1 of the non-target area sound N_1 may be extracted, for example, by applying SS according to equation (3) to subtract the area sound output data Z_1 from the input signal X_1. That is, O_1 can be expressed as in equation (12). The mixing level coefficient δ_1, which adjusts the mixing volume level of the input signal X_1, is a variable proportional to the SN ratio Z_1 / O_1 between the area sound output data Z_1 and the non-target area sound N_1; for example, it takes a value that brings X_1 to -20 dB at an SN ratio of 0 dB. With δ_1, the mixing volume level becomes δ_1 X_1. Instead of being constant over all frequencies, δ_1 may be weighted for each frequency as δ_1 Φ_1, where Φ_1 may, for example, take values that decrease from the low band toward the high band. In that case, the mixing volume level becomes δ_1 Φ_1 X_1.
O_1 = X_1 - Z_1 … (12)

At the frequencies determined by the frequency-specific area sound determination unit 4 to be target area sound components, the signal mixing unit 6 mixes the input signal acquired by the data input unit 1 into the area sound output data extracted by the area sound collection processing unit 2, at the level calculated by the mixing level calculation unit 7. The final output |W_1k| is mixed according to equation (13) below, where k is a frequency determined by the frequency-specific area sound determination unit 4 to be a target area sound component.
|W_1k| = |Z_1k| + δ_1 |X_1k| … (13)
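Equations (12) and (13) can be illustrated with the sketch below. Several points are assumptions made only for this example: the SN ratio is computed from power summed over all bins, the subtraction in (12) is floored at zero, and "-20 dB at an SN ratio of 0 dB" is interpreted as an amplitude-style gain of 0.1 when the ratio equals 1. The function names `mixing_coefficient` and `mix_input` are hypothetical.

```python
def mixing_coefficient(x_power, z_power, gain_at_0db=0.1, eps=1e-12):
    """Mixing level coefficient delta_1, proportional to the SN ratio Z_1 / O_1."""
    # Equation (12): O_1 = X_1 - Z_1, floored at zero (the floor is an assumption).
    o_power = [max(xk - zk, 0.0) for xk, zk in zip(x_power, z_power)]
    snr = sum(z_power) / max(sum(o_power), eps)
    return gain_at_0db * snr


def mix_input(gated, x_power, flags, delta):
    """Equation (13) at flagged bins: |W_1k| = |Z_1k| + delta_1 * |X_1k|.

    gated -- output of the per-frequency determination stage
             (equal to |Z_1k| wherever flags[k] is True)
    Bins not flagged as target area sound are passed through unchanged.
    """
    return [yk + delta * xk if f else yk
            for yk, xk, f in zip(gated, x_power, flags)]


delta = mixing_coefficient([2.0, 2.0], [1.0, 1.0])           # SN ratio of 1
w = mix_input([1.0, 0.0], [2.0, 2.0], [True, False], delta)  # mix at bin 0 only
```

A per-frequency weight Φ_1 could be applied by multiplying `delta` by a weight for each bin inside `mix_input`.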

(C-3) Effects of the third embodiment
According to the third embodiment, the following effects can be obtained in addition to those of the first embodiment.

In the sound collection device 100B of the third embodiment, after the frequency-specific area sound determination unit 4 determines the presence or absence of a target area sound component at each frequency, the signal mixing unit 6 and the mixing level calculation unit 7 adjust the gain of the input signal and add it to the output only at the frequencies determined to be target area sound components. Because the input signal is added only to target area sound components, the sound collection device 100B can prevent non-target area sound from being mixed in while correcting distortion of the target area sound.

In other words, when the sound collection device 100B of the third embodiment mixes in the input signal to correct distortion of the target area sound, it adds the input signal only at the frequencies determined to be target area sound components. Even if non-target area sound is present in the input signal, the probability that a target area sound component and a non-target area sound component overlap at any given frequency is low. Therefore, in the sound collection device 100B of the third embodiment, non-target area sound components are not added to the output (the sound collection result), and ultimately only target area sound components are output.

Furthermore, even if non-target area sound is present when the input signal is mixed in to correct distortion of the target area sound, the sound collection device 100B of the third embodiment outputs only the target area sound components, so sound quality can be improved while the area sound collection performance is maintained.

(D) Other embodiments
The present invention is not limited to the embodiments described above; modified embodiments such as those exemplified below are also possible.

(D-1) In the area sound determination unit 5 of the first embodiment, a function (hangover function) may be added whereby, when a component has a power ratio that exceeds the threshold T1 by at least a certain margin, the target area sound component is determined to exist for that component for the following several seconds regardless of the value of the power ratio.
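A hangover function of this kind can be sketched as below. The hold time is counted in frames rather than seconds, and the class name `Hangover` and the parameters `margin` and `hold_frames` are hypothetical choices for illustration.

```python
class Hangover:
    """Hold a per-frequency 'target present' decision for a number of frames."""

    def __init__(self, n_bins, t1, margin, hold_frames):
        self.t1 = t1
        self.margin = margin            # how far above T1 triggers the hold
        self.hold_frames = hold_frames  # frames to hold after a trigger
        self.remaining = [0] * n_bins

    def update(self, ratios):
        """Return per-frequency flags for one frame of power ratios."""
        flags = []
        for k, r in enumerate(ratios):
            if r >= self.t1 + self.margin:
                # Strong component: flag it and (re)start the hold period.
                self.remaining[k] = self.hold_frames
                flags.append(True)
            elif self.remaining[k] > 0:
                # Within the hold period: flag regardless of the ratio.
                self.remaining[k] -= 1
                flags.append(True)
            else:
                flags.append(r > self.t1)
        return flags


h = Hangover(n_bins=1, t1=0.5, margin=0.3, hold_frames=2)
history = [h.update([r]) for r in (0.9, 0.1, 0.1, 0.1)]
```

In this run the strong first frame keeps the bin flagged for the next two frames, after which the ordinary threshold comparison applies again.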

(D-2) In the sound collection device 100A (area sound determination unit 5) of the second embodiment, instead of treating all frequencies (the full band) as a whole, the full band may be divided into a plurality of bands (hereinafter also called "divided bands"). The power ratio is then calculated for each component of each divided band, the presence or absence of the target area sound is determined for each divided band, and whether to output (whether to add to the sound collection result) is decided for each divided band.

Specifically, for example, when the proportion of frequencies in a given divided band that are determined not to be target area sound components (frequencies whose power ratio does not exceed the threshold T1) exceeds the threshold T2, the area sound determination unit 5 may update the determination to "not a target area sound component" for the entire divided band and output silent data.

Also, for example, when the proportion of frequencies in a given divided band that are determined not to be target area sound components falls below the threshold T2, the area sound determination unit 5 compares that proportion with the threshold T3 (T2 > T3). If the proportion for the divided band falls below T3, the area sound determination unit 5 may update the determination to "target area sound component present" for the entire divided band and output the area sound output data for the entire divided band.

If the proportion for the divided band does not fall below T3, the area sound determination unit 5 may produce its output in accordance with the determination results (per-frequency results) of the frequency-specific area sound determination unit 4; that is, for that divided band it outputs the result of the frequency-specific area sound determination unit 4 unchanged.

Furthermore, if even one frequency in a divided band is determined to be a target area sound component, the area sound determination unit 5 may update the determination to "target area sound component present" for the entire divided band and output the area sound output data for all frequencies of that divided band.
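The divided-band variant with the thresholds T2 and T3 can be sketched as follows. The function name `decide_per_band` and the representation of divided bands as half-open index ranges are assumptions for illustration. The alternative rule in which a single flagged frequency marks the whole divided band is not covered by this sketch.

```python
def decide_per_band(flags, band_edges, t2=0.8, t3=0.2):
    """Apply the T2/T3 decision independently within each divided band.

    flags      -- per-frequency results (True = target component present)
    band_edges -- list of (start, stop) bin index pairs, stop exclusive
    """
    out = list(flags)
    for start, stop in band_edges:
        band = flags[start:stop]
        if not band:
            continue
        p2 = sum(1 for f in band if not f) / len(band)
        if p2 > t2:
            out[start:stop] = [False] * len(band)  # silence the whole band
        elif p2 < t3:
            out[start:stop] = [True] * len(band)   # output the whole band
        # otherwise keep the per-frequency results for this band
    return out


per_bin = [True] * 9 + [False] + [False] * 10
result = decide_per_band(per_bin, [(0, 10), (10, 20)])
```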

(D-3) The second and third embodiments may be combined. Specifically, the signal mixing unit 6 and the mixing level calculation unit 7 may be added to the sound collection device 100A. In this case, the signal mixing unit 6 may be inserted downstream of the area sound determination unit 5.

Description of reference signs: 100 ... sound collection device, 1 ... data input unit, 2 ... area sound collection processing unit, 3 ... frequency-specific power ratio calculation unit, 4 ... frequency-specific area sound determination unit, 5 ... area sound determination unit, 2-1 ... directivity forming unit, 2-2 ... delay correction unit, 2-3 ... spatial coordinate data, 2-4 ... target area sound power correction coefficient calculation unit, 2-5 ... target area sound extraction unit.

Claims

A sound collection device comprising:
directivity forming means for forming, from an input signal, directivity in the direction of a target area by a beamformer;
non-target area sound extraction means for extracting non-target area sound existing in the target area direction by means of the directivity formed by the directivity forming means;
target area sound extraction means for extracting a target area sound from the output of the beamformer using the non-target area sound existing in the target area direction extracted by the non-target area sound extraction means, and outputting the result as an extracted sound;
power spectrum ratio calculation means for calculating, for each frequency component, a power spectrum ratio between the input signal and the extracted sound;
determination means for determining, for each frequency component, whether the target area sound exists, using the power spectrum ratio calculated by the power spectrum ratio calculation means; and
output means for outputting the frequency component of the extracted sound for each frequency component at which the determination means has determined that the target area sound exists,
wherein the output means does not output the extracted sound at any frequency component when the proportion, relative to all frequency components, of the frequency components at which the determination means has determined that the target area sound does not exist in the input signal exceeds a second threshold.

The sound collection device according to claim 1, wherein the determination means determines, for each frequency component, whether the target area sound exists based on a result of comparing the power spectrum ratio calculated by the power spectrum ratio calculation means with a first threshold.

The sound collection device according to claim 1, wherein the output means outputs the extracted sound at all frequency components when the proportion, relative to all frequency components, of the frequency components at which the determination means has determined that the target area sound does not exist in the input signal is below a third threshold that is smaller than the second threshold.

The sound collection device according to any one of claims 1 to 3, further comprising mixing level calculation means for calculating a volume level of the input signal to be mixed into the output sound, based on a ratio between the extracted sound and a non-target area sound extracted based on the input signal and the extracted sound,
wherein the output means mixes in and outputs the input signal, gain-adjusted based on the volume level calculated by the mixing level calculation means, for the frequency components at which the determination means has determined that the target area sound exists.

A sound collection program causing a computer to function as:
directivity forming means for forming, from an input signal, directivity in the direction of a target area by a beamformer;
non-target area sound extraction means for extracting non-target area sound existing in the target area direction by means of the directivity formed by the directivity forming means;
target area sound extraction means for extracting a target area sound from the output of the beamformer using the non-target area sound existing in the target area direction extracted by the non-target area sound extraction means, and outputting the result as an extracted sound;
power spectrum ratio calculation means for calculating, for each frequency component, a power spectrum ratio between the input signal and the extracted sound;
determination means for determining, for each frequency component, whether the target area sound exists, using the power spectrum ratio calculated by the power spectrum ratio calculation means; and
output means for outputting the frequency component of the extracted sound for each frequency component at which the determination means has determined that the target area sound exists,
wherein the output means does not output the extracted sound at any frequency component when the proportion, relative to all frequency components, of the frequency components at which the determination means has determined that the target area sound does not exist in the input signal exceeds a second threshold.

A sound collection method performed by a sound collection device that comprises directivity forming means, non-target area sound extraction means, target area sound extraction means, power spectrum ratio calculation means, determination means, and output means, wherein:
the directivity forming means forms, from an input signal, directivity in the direction of a target area by a beamformer;
the non-target area sound extraction means extracts non-target area sound existing in the target area direction by means of the directivity formed by the directivity forming means;
the target area sound extraction means extracts a target area sound from the output of the beamformer using the non-target area sound existing in the target area direction extracted by the non-target area sound extraction means, and outputs the result as an extracted sound;
the power spectrum ratio calculation means calculates, for each frequency component, a power spectrum ratio between the input signal and the extracted sound;
the determination means determines, for each frequency component, whether the target area sound exists, using the power spectrum ratio calculated by the power spectrum ratio calculation means;
the output means outputs the frequency component of the extracted sound for each frequency component at which the determination means has determined that the target area sound exists; and
the output means does not output the extracted sound at any frequency component when the proportion, relative to all frequency components, of the frequency components at which the determination means has determined that the target area sound does not exist in the input signal exceeds a second threshold.