JP2018132737A

JP2018132737A - Sound pick-up device, program and method, and determining apparatus, program and method

Info

Publication number: JP2018132737A
Application number: JP2017028268A
Authority: JP
Inventors: 一浩片桐; Kazuhiro Katagiri
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-02-17
Filing date: 2017-02-17
Publication date: 2018-08-23
Anticipated expiration: 2037-02-17
Also published as: JP6540730B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of determining target area sound in environments with strong background noise. SOLUTION: The invention relates to a sound pick-up device. The device acquires an extracted sound by extracting target area sound from the output of a beamformer, using the non-target area sound present in the target area direction; divides both the input signal and the extracted sound into plural bands; computes the power spectrum ratio between the input signal and the extracted sound for each divided band; determines from the per-band power spectrum ratios whether target area sound is present in the input signal; and outputs the extracted sound as the sound pick-up result when target area sound has been determined to be present. SELECTED DRAWING: Figure 1

Description

The present invention relates to a sound collection device, program, and method, and to a determination device, program, and method. It can be applied, for example, to processing that emphasizes sound in a target area and suppresses sound from other areas.

A beamformer (hereinafter, "BF") using a microphone array is a technique for separating and collecting only the sound arriving from a specific direction in an environment where multiple sound sources exist. A BF forms directivity by exploiting the time differences between the signals arriving at the individual microphones (see Non-Patent Document 1). BFs fall broadly into two types: the addition type and the subtraction type.

In particular, the subtraction-type BF has the advantage that it can form directivity with fewer microphones than the addition-type BF.

FIG. 7 is a block diagram showing the configuration of a conventional subtraction-type BF.

The conventional subtraction-type BF shown in FIG. 7 uses two microphones.

In the conventional subtraction-type BF, a delay unit first calculates the time difference with which the sound located in the target direction (hereinafter also called the "target sound") arrives at each microphone, and then applies a delay so that the phases of the target sound are aligned. In the delay unit of the conventional subtraction-type BF, this time difference is calculated by equation (1) below.

In equation (1), $d$ is the distance between the microphones, $c$ is the speed of sound, and $\tau_L$ is the delay amount. $\theta_L$ is the angle from the direction perpendicular to the straight line connecting the microphones to the target direction.

$$\tau_L = \frac{d \sin\theta_L}{c} \tag{1}$$
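As a worked illustration of equation (1), the following minimal sketch computes the delay in seconds and in samples. The function name `steering_delay`, the 3 cm spacing, the 343 m/s speed of sound, and the 16 kHz sampling rate are all assumptions chosen for the example; none of them come from the source.

```python
import math

def steering_delay(d_m: float, theta_l_rad: float, c_mps: float = 343.0) -> float:
    """Return tau_L = d * sin(theta_L) / c in seconds, as in equation (1)."""
    return d_m * math.sin(theta_l_rad) / c_mps

# Hypothetical setup: 3 cm microphone spacing, null steered to theta_L = pi/2.
tau_l = steering_delay(0.03, math.pi / 2)

# Expressed in samples at an assumed 16 kHz sampling rate (generally fractional).
delay_in_samples = tau_l * 16000.0
```

With these example values the delay is well below two samples, which suggests that a practical implementation would likely need fractional-delay handling or frequency-domain processing.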

When the null (blind spot) lies on the first-microphone side of the midpoint between the first and second microphones, the delay unit of the conventional subtraction-type BF delays the input signal $x_1(t)$ of the first microphone. The delayed input signal $x_1(t)$ is then subtracted according to equation (2).

$$a(t) = x_2(t) - x_1(t - \tau_L) \tag{2}$$
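Equation (2) can be sketched in discrete time as follows. This is a simplified illustration that assumes the delay is a whole number of samples and that `x1` is zero before the first sample; the function name `subtractive_bf` and the test signal are hypothetical.

```python
def subtractive_bf(x1, x2, delay_samples):
    """Compute a[t] = x2[t] - x1[t - delay_samples], treating x1 as zero before t = 0."""
    out = []
    for t in range(len(x2)):
        shifted = t - delay_samples
        delayed = x1[shifted] if shifted >= 0 else 0.0
        out.append(x2[t] - delayed)
    return out

# A source in the null direction reaches microphone 1 first and
# microphone 2 two samples later, so the subtraction cancels it.
s = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0]
x1 = s
x2 = [0.0, 0.0] + s[:-2]
a = subtractive_bf(x1, x2, 2)
```

A source from any other direction arrives with a different inter-microphone delay and therefore leaves a non-zero residual in `a`.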

The subtraction in the conventional subtraction-type BF can equally be performed in the frequency domain, in which case equation (2) is replaced by equation (3).
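Equation (3) itself is not reproduced in this text. As a reconstruction rather than a quotation, taking the Fourier transform of equation (2) turns the delay $\tau_L$ into a phase factor, which suggests the following form, where $X_1(\omega)$, $X_2(\omega)$, and $A(\omega)$ denote the spectra of $x_1(t)$, $x_2(t)$, and $a(t)$:

```latex
A(\omega) = X_2(\omega) - e^{-j\omega\tau_L}\, X_1(\omega) \qquad (3)
```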

When $\theta_L = \pm\pi/2$, the directivity formed is a cardioid unidirectional pattern, as shown in FIG. 8(A); when $\theta_L = 0$ or $\pi$, it is a figure-eight bidirectional pattern, as shown in FIG. 8(B). In the following, a filter that forms unidirectional directivity from the input signal is called a unidirectional filter, and a filter that forms bidirectional directivity is called a bidirectional filter.
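The two patterns can be checked numerically with a small sketch. It assumes a far-field plane-wave model in which a wave from angle θ satisfies x2(t) = x1(t − d sin θ / c); this model, the function name `bf_gain`, and the default spacing and frequency are illustrative assumptions, not details given in the source.

```python
import cmath
import math

def bf_gain(theta, theta_l, d=0.03, c=343.0, freq=1000.0):
    """Magnitude response of a(t) = x2(t) - x1(t - tau_L) to a plane wave from angle theta."""
    omega = 2.0 * math.pi * freq
    tau = d * math.sin(theta) / c      # inter-microphone delay of the incoming wave
    tau_l = d * math.sin(theta_l) / c  # steering delay from equation (1)
    return abs(cmath.exp(-1j * omega * tau) - cmath.exp(-1j * omega * tau_l))

# theta_L = pi/2: a single null at pi/2 and a maximum on the opposite side (cardioid-like).
null_cardioid = bf_gain(math.pi / 2, math.pi / 2)
front_cardioid = bf_gain(-math.pi / 2, math.pi / 2)

# theta_L = 0: nulls at 0 and pi, equal lobes at +/- pi/2 (figure-eight).
null_front = bf_gain(0.0, 0.0)
null_back = bf_gain(math.pi, 0.0)
```

Under this model the single null for θ_L = π/2 and the pair of nulls for θ_L = 0 correspond to the unidirectional and bidirectional cases described above.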

Spectral subtraction (hereinafter also called "SS") can additionally be used to form a sharp directivity toward the null of the bidirectional pattern. Directivity formation by SS follows equation (4). Equation (4) uses the input signal $X_1$ of the first microphone, but the same effect can be obtained with the input signal $X_2$ of the second microphone. Here, $\beta$ is a coefficient for adjusting the strength of the SS. If the result of the subtraction is negative, flooring is applied: the value is replaced by 0 or by a scaled-down version of the original value. In this method, the bidirectional filter extracts the sound located outside the target direction (hereinafter also called the "non-target sound"), and the power spectrum of the extracted non-target sound is subtracted from the power spectrum of the input signal, which emphasizes the target sound.

$$|Y(\omega)| = |X_1(\omega)| - \beta\,|A(\omega)| \tag{4}$$
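A minimal sketch of equation (4) with the flooring step described above. The function name `spectral_subtract` and the `floor_scale` parameter are hypothetical; `floor_scale = 0` corresponds to replacing negative results with 0, and a small positive value corresponds to replacing them with a scaled-down original value.

```python
def spectral_subtract(x_mag, a_mag, beta=1.0, floor_scale=0.0):
    """Per-bin |Y| = |X1| - beta * |A|, with flooring when the result is negative."""
    out = []
    for x, a in zip(x_mag, a_mag):
        y = x - beta * a
        if y < 0.0:
            # Flooring: 0 when floor_scale is 0, otherwise a reduced copy of the input bin.
            y = floor_scale * x
        out.append(y)
    return out

# Second bin would go negative (0.5 - 1.0), so it is floored.
y_zero_floor = spectral_subtract([1.0, 0.5], [0.25, 1.0], beta=1.0)
y_soft_floor = spectral_subtract([1.0, 0.5], [0.25, 1.0], beta=1.0, floor_scale=0.1)
```

Larger values of `beta` suppress the non-target sound more aggressively, at the cost of more bins being floored.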

When only the sound located within a particular area (hereinafter, "target area sound") is to be collected, a subtraction-type BF alone may also pick up sound sources located around that area (hereinafter, "non-target area sound"). Patent Document 1 therefore proposes a method that uses multiple microphone arrays, points their directivities at the target area from different directions, and collects the target area sound by making the directivities intersect in the target area.

Next, an example of the target area sound collection processing described in Patent Document 1 is explained.

FIG. 9 is an explanatory diagram showing a configuration example of the microphone arrays when two microphone arrays MA1 and MA2 are used to collect target area sound from a sound source in the target area.

FIG. 10 is an explanatory diagram (graph) showing the BF outputs of the microphone arrays MA1 and MA2 of FIG. 9 in the frequency domain. FIGS. 10(a) and 10(b) are conceptual graphs of the BF outputs of microphone arrays MA1 and MA2, respectively, in the frequency domain.

In the method described in Patent Document 1, the ratio of the power of the target area sound contained in the BF outputs of the microphone arrays MA1 and MA2 is first estimated and used as a correction coefficient. Specifically, when two microphone arrays MA1 and MA2 are used, the correction coefficient for the target area sound power can be calculated by, for example, equations (5) and (6) or equations (7) and (8).

Here, $Y_{1k}(n)$ and $Y_{2k}(n)$ are the power spectra of the BF outputs of microphone arrays MA1 and MA2, $N$ is the total number of frequency bins, $k$ is the frequency, and $\alpha(n)$ is the power correction coefficient for the BF output. "mode" denotes the most frequent value and "median" denotes the median. Each BF output is then corrected by the correction coefficient and SS is applied, which extracts the non-target area sound located in the target area direction. Applying SS again, this time subtracting the extracted non-target area sound from the output of each BF, extracts the target area sound.
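The correction-coefficient equations are not reproduced in this text, so the following is only a sketch of the median variant under stated assumptions. It assumes the coefficient is the median over frequency bins of the per-bin ratio Y_1k / Y_2k, a direction chosen so that α·Y_2 matches the target-area level in Y_1 in the subtraction N_1 = Y_1 − α·Y_2 given later in the text. The function name and the `eps` guard are hypothetical.

```python
from statistics import median

def power_correction_coefficient(y1_pow, y2_pow, eps=1e-12):
    """Median over bins of Y1k / Y2k (assumed form), skipping near-empty bins of Y2."""
    ratios = [a / b for a, b in zip(y1_pow, y2_pow) if b > eps]
    if not ratios:
        raise ValueError("no usable frequency bins")
    return median(ratios)

# Two bins are dominated by the common target area sound (ratio 2.0),
# one bin by a non-target source seen only by MA1 (ratio 9.0).
alpha = power_correction_coefficient([4.0, 2.0, 9.0], [2.0, 1.0, 1.0])
```

The median (or the mode) is robust to bins dominated by non-target sound, which is presumably why the text names these statistics rather than a mean.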

FIG. 11 is a conceptual explanatory diagram showing how the power spectrum of each component changes when area sound collection is performed on the BF outputs acquired with the microphone arrays MA1 and MA2 shown in FIG. 9.

First, a BF output $Y_1$ in which the non-target area sound $N_2$ is suppressed is obtained from the input signal $X_1$ of microphone array MA1 (see FIG. 11(a)).

To extract the non-target area sound $N_1(n)$ located in the target area direction as seen from microphone array MA1, the BF output $Y_2(n)$ of microphone array MA2 multiplied by the power correction coefficient $\alpha$ is spectrally subtracted from the BF output $Y_1(n)$ of microphone array MA1, as shown in equation (7) (see FIG. 11(b)). Then, according to equation (8), the non-target area sound is spectrally subtracted from each BF output to extract the target area sound (see FIG. 11(c)). $\gamma(n)$ is a coefficient for changing the strength of the SS.

$$N_1(n) = Y_1(n) - \alpha(n)\,Y_2(n) \tag{7}$$

$$Z_1(n) = Y_1(n) - \gamma(n)\,N_1(n) \tag{8}$$
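Equations (7) and (8) can be sketched per frequency bin as two successive subtractions. Flooring negative results at zero is an assumption carried over from the flooring described for equation (4). The function names and the toy spectra are hypothetical.

```python
def floored_subtract(a, b, scale):
    """Per-bin a - scale * b, floored at zero (flooring is an assumption here)."""
    return [max(x - scale * y, 0.0) for x, y in zip(a, b)]

def extract_target_area(y1_pow, y2_pow, alpha, gamma=1.0):
    """Return (N1, Z1): N1 = Y1 - alpha*Y2 as in (7), Z1 = Y1 - gamma*N1 as in (8)."""
    n1 = floored_subtract(y1_pow, y2_pow, alpha)
    z1 = floored_subtract(y1_pow, n1, gamma)
    return n1, z1

# Toy spectra: the target area sound is seen by both arrays,
# while the non-target sound appears only in MA1's BF output.
target = [1.0, 0.0, 2.0]
non_target = [0.0, 3.0, 1.0]
y1 = [t + n for t, n in zip(target, non_target)]
y2 = list(target)
n1, z1 = extract_target_area(y1, y2, alpha=1.0)
```

In this idealized case `n1` recovers the non-target spectrum and `z1` recovers the target spectrum exactly; with real signals the result is only approximate.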

Because SS, a nonlinear process, is applied in equations (4) and (8) to extract the target area sound, an unpleasant artifact known as musical noise may occur in high-noise environments.

Patent Document 2 therefore determines which sections contain target area sound and which do not, and suppresses abnormal sounds such as musical noise by not outputting the area-sound-collected signal in sections where no target area sound is present. To determine whether target area sound is present, the power spectrum ratio (area sound output / input signal) between the input signal and the output from which the target area sound has been extracted (hereinafter, "area sound output") is first calculated according to equation (9). When a sound source exists in the target area, the target area sound is contained in both the input signal $X_1$ and the area sound output $Z_1$, so the power spectrum ratio of the target area sound component is close to 1. The non-target area sound component, by contrast, is suppressed in the area sound output, so its power spectrum ratio is small. Other background noise components are also suppressed to some extent, without any dedicated noise suppression beforehand, because area sound collection applies SS several times; their power spectrum ratio is likewise small. When no target area sound is present, the area sound output contains only weak residual noise compared with the input signal, so the power spectrum ratio is small over the entire band. Because of this, the average of the per-frequency power spectrum ratios obtained according to equation (10) (hereinafter also called the "average power spectrum ratio") differs greatly depending on whether target area sound is present.

Here, $m$ and $n$ are the upper and lower limits of the processing band, respectively, and may be set, for example, to 100 Hz to 6 kHz, a range that contains sufficient speech information. The device described in Patent Document 2 compares the average power spectrum ratio against a preset threshold and, when it determines that no target area sound is present, outputs either silence or the input sound with reduced gain instead of the area sound output data.
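Equations (9) and (10) are not reproduced in this text, so the sketch below follows the prose only. It assumes the per-bin ratio is area sound output over input signal and that the average is an arithmetic mean over the bins between the lower limit n and the upper limit m. The function names, the `>=` comparison, and the choice to output silence are illustrative assumptions.

```python
def average_power_ratio(x_pow, z_pow, lo_bin, hi_bin, eps=1e-12):
    """Mean over bins lo_bin..hi_bin (inclusive) of z_pow[k] / x_pow[k]."""
    ratios = [z_pow[k] / max(x_pow[k], eps) for k in range(lo_bin, hi_bin + 1)]
    return sum(ratios) / len(ratios)

def gate_area_output(x_pow, z_pow, lo_bin, hi_bin, threshold):
    """Pass the area sound output when the average ratio reaches the threshold, else silence."""
    if average_power_ratio(x_pow, z_pow, lo_bin, hi_bin) >= threshold:
        return list(z_pow)
    return [0.0] * len(z_pow)

x = [1.0, 2.0, 4.0, 4.0]
z = [0.5, 1.0, 1.0, 0.0]
avg = average_power_ratio(x, z, 0, 3)    # per-bin ratios: 0.5, 0.5, 0.25, 0.0
passed = gate_area_output(x, z, 0, 3, threshold=0.3)
muted = gate_area_output(x, z, 0, 3, threshold=0.5)
```

Because the decision rests on a single full-band average, a weak target component confined to a few bins can be averaged away, which is the limitation the following paragraphs describe.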

Patent Document 1: JP 2014-072708 A
Patent Document 2: JP 2016-127457 A

Non-Patent Document 1: Futoshi Asano, "Acoustic Technology Series 16: Array Signal Processing of Sound: Localization, Tracking, and Separation of Sound Sources", edited by the Acoustical Society of Japan, Corona Publishing, published February 25, 2011

With the method described in Patent Document 1, target area sound can be collected even when non-target area sound is present around the target area. With the method described in Patent Document 2, the influence of the musical noise generated by area sound collection can be suppressed. However, in high-noise environments, such as crowded places like event venues or places where music is playing nearby, the signal-to-noise ratio deteriorates and the power spectrum of the sound output by area sound collection may become small. In such situations, the average power spectrum ratio between the area sound collection output and the input signal also becomes small. In particular, for components whose power spectrum is small to begin with, such as unvoiced consonants, the difference from the average power spectrum ratio of non-target area sound sections becomes small, so the accuracy of target area sound determination degrades and part of the target area sound may be lost.

In view of the above problems, there is a need for a sound collection device, program, and method, and for a determination device, program, and method, that can improve the accuracy of target area sound determination in environments with strong background noise.

A sound collection device according to a first aspect of the present invention comprises: (1) directivity forming means for forming directivity in a target area direction from an input signal by means of a beamformer; (2) non-target area sound extraction means for extracting non-target area sound that is present in the target area direction under the directivity formed by the directivity forming means; (3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer, using the non-target area sound present in the target area direction that was extracted by the non-target area sound extraction means; (4) band division means for dividing each of the input signal and the extracted sound into a plurality of bands; (5) power spectrum ratio calculation means for calculating, for each divided band produced by the band division means, a power spectrum ratio between the input signal and the extracted sound; (6) determination means for determining whether target area sound is present in the input signal, using the per-band power spectrum ratios calculated by the power spectrum ratio calculation means; and (7) output means for outputting the extracted sound as a sound collection result when the determination means determines that target area sound is present.
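Items (4) to (6) can be illustrated with a small sketch. The text here states only that the per-band power spectrum ratios are used for the determination; averaging the ratio within each band and declaring target area sound present when at least one band reaches a threshold are assumptions made for this example. The function names and `band_edges` layout are also hypothetical.

```python
def band_average_ratios(x_pow, z_pow, band_edges, eps=1e-12):
    """Average z/x per band; band i covers bins band_edges[i] .. band_edges[i+1]-1."""
    out = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        ratios = [z_pow[k] / max(x_pow[k], eps) for k in range(lo, hi)]
        out.append(sum(ratios) / len(ratios))
    return out

def target_area_sound_present(ratios, threshold):
    """Assumed rule: present if any divided band reaches the threshold."""
    return any(r >= threshold for r in ratios)

# Target energy is concentrated in the middle band; noise dominates the others.
x = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
z = [0.4, 0.4, 3.2, 3.6, 0.4, 0.4]
per_band = band_average_ratios(x, z, [0, 2, 4, 6])   # roughly [0.1, 0.85, 0.1]
full_band = band_average_ratios(x, z, [0, 6])[0]     # roughly 0.35
detected = target_area_sound_present(per_band, threshold=0.5)
```

In this toy case the full-band average stays below the 0.5 threshold while the middle band clearly exceeds it, which illustrates why a per-band evaluation may be more robust under strong background noise.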

A sound collection program according to a second aspect of the present invention causes a computer to function as: (1) directivity forming means for forming directivity in a target area direction from an input signal by means of a beamformer; (2) non-target area sound extraction means for extracting non-target area sound that is present in the target area direction under the directivity formed by the directivity forming means; (3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer, using the non-target area sound present in the target area direction that was extracted by the non-target area sound extraction means; (4) band division means for dividing each of the input signal and the extracted sound into a plurality of bands; (5) power spectrum ratio calculation means for calculating, for each divided band produced by the band division means, a power spectrum ratio between the input signal and the extracted sound; (6) determination means for determining whether target area sound is present in the input signal, using the per-band power spectrum ratios calculated by the power spectrum ratio calculation means; and (7) output means for outputting the extracted sound as a sound collection result when the determination means determines that target area sound is present.

A sound collection method according to a third aspect of the present invention is characterized in that: (1) the method uses directivity forming means, non-target area sound extraction means, target area sound extraction means, band division means, power spectrum ratio calculation means, determination means, and output means; (2) the directivity forming means forms directivity in a target area direction from an input signal by means of a beamformer; (3) the non-target area sound extraction means extracts non-target area sound that is present in the target area direction under the directivity formed by the directivity forming means; (4) the target area sound extraction means outputs an extracted sound obtained by extracting target area sound from the output of the beamformer, using the non-target area sound present in the target area direction that was extracted by the non-target area sound extraction means; (5) the band division means divides each of the input signal and the extracted sound into a plurality of bands; (6) the power spectrum ratio calculation means calculates, for each divided band produced by the band division means, a power spectrum ratio between the input signal and the extracted sound; (7) the determination means determines whether target area sound is present in the input signal, using the per-band power spectrum ratios calculated by the power spectrum ratio calculation means; and (8) the output means outputs the extracted sound as a sound collection result when the determination means determines that target area sound is present.

A determination device according to a fourth aspect of the present invention comprises: (1) directivity forming means for forming directivity in a target area direction from an input signal by means of a beamformer; (2) non-target area sound extraction means for extracting non-target area sound that is present in the target area direction under the directivity formed by the directivity forming means; (3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer, using the non-target area sound present in the target area direction that was extracted by the non-target area sound extraction means; (4) band division means for dividing each of the input signal and the extracted sound into a plurality of bands; (5) power spectrum ratio calculation means for calculating, for each divided band produced by the band division means, a power spectrum ratio between the input signal and the extracted sound; and (6) determination means for determining whether target area sound is present in the input signal, using the per-band power spectrum ratios calculated by the power spectrum ratio calculation means.

A determination program according to a fifth aspect of the present invention causes a computer to function as: (1) directivity forming means for forming directivity in a target area direction from an input signal by means of a beamformer; (2) non-target area sound extraction means for extracting non-target area sound that is present in the target area direction under the directivity formed by the directivity forming means; (3) target area sound extraction means for outputting an extracted sound obtained by extracting target area sound from the output of the beamformer, using the non-target area sound present in the target area direction that was extracted by the non-target area sound extraction means; (4) band division means for dividing each of the input signal and the extracted sound into a plurality of bands; (5) power spectrum ratio calculation means for calculating, for each divided band produced by the band division means, a power spectrum ratio between the input signal and the extracted sound; and (6) determination means for determining whether target area sound is present in the input signal, using the per-band power spectrum ratios calculated by the power spectrum ratio calculation means.

A determination method according to a sixth aspect of the present invention is characterized in that: (1) the method uses directivity forming means, non-target area sound extraction means, target area sound extraction means, band division means, power spectrum ratio calculation means, and determination means; (2) the directivity forming means forms directivity in a target area direction from an input signal by means of a beamformer; (3) the non-target area sound extraction means extracts non-target area sound that is present in the target area direction under the directivity formed by the directivity forming means; (4) the target area sound extraction means outputs an extracted sound obtained by extracting target area sound from the output of the beamformer, using the non-target area sound present in the target area direction that was extracted by the non-target area sound extraction means; (5) the band division means divides each of the input signal and the extracted sound into a plurality of bands; (6) the power spectrum ratio calculation means calculates, for each divided band produced by the band division means, a power spectrum ratio between the input signal and the extracted sound; and (7) the determination means determines whether target area sound is present in the input signal, using the per-band power spectrum ratios calculated by the power spectrum ratio calculation means.

According to the present invention, the accuracy of target area sound determination can be improved in environments with strong background noise.

FIG. 1 is a block diagram showing the functional configuration of the sound collection device (determination device) according to the first embodiment.
FIG. 2 is a graph showing an example in which the frequency band division unit according to the first embodiment divides the power spectrum of a signal to be processed into divided bands.
FIG. 3 is a graph showing the average power spectrum ratio for each divided band calculated by the band-specific average power spectrum ratio calculation unit according to the first embodiment.
FIG. 4 is a block diagram showing the functional configuration of the sound collection device (determination device) according to the second embodiment.
FIG. 5 is a flowchart showing the operation of the target area sound determination processing of the sound collection device (determination device) according to the second embodiment.
FIG. 6 is a block diagram showing the functional configuration of the sound collection device (determination device) according to the third embodiment.
FIG. 7 is a block diagram showing the configuration of a conventional subtraction-type BF with two microphones.
FIG. 8 is a diagram showing the directional characteristics formed by a conventional subtraction-type BF using two microphones.
FIG. 9 is an explanatory diagram showing a configuration example of the microphone arrays when two conventional microphone arrays are used to collect target area sound from a sound source in the target area.
FIG. 10 is an explanatory diagram showing the BF outputs of the two conventional microphone arrays in the frequency domain.
FIG. 11 is an explanatory diagram showing how the power spectrum of each component changes when area sound collection is performed on the BF outputs acquired with the two conventional microphone arrays.

(A) First Embodiment

A first embodiment of the sound collection device, program, and method, and of the determination device, program, and method, according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the First Embodiment

FIG. 1 is a block diagram showing the functional configuration of the sound collection device 100 of this embodiment.

The sound collection device 100 uses two microphone arrays MA (MA1, MA2) to perform target area sound collection processing, which collects target area sound from a sound source in the target area.

The microphone arrays MA1 and MA2 are placed at arbitrary locations in the space that contains the target area. As shown in FIG. 9, for example, the arrays may be positioned anywhere relative to the target area as long as their directivities overlap only in the target area; they may, for instance, be placed facing each other across the target area. Each microphone array MA consists of two or more microphones M, each of which picks up an acoustic signal. In this embodiment, each microphone array MA is described as having two microphones M (M1, M2) that pick up acoustic signals; that is, each microphone array MA forms a 2-channel microphone array. The number of microphone arrays MA is not limited to two. When there are multiple target areas, enough microphone arrays MA must be placed to cover all of them.

The sound collection device 100 includes a data input unit 1, a directivity forming unit 2, a delay correction unit 3, spatial coordinate data 4, a target area sound power correction coefficient calculation unit 5, a target area sound extraction unit 6, a frequency band division unit 7, a band-specific average power spectrum ratio calculation unit 8, and an area sound determination unit 9. The processing of each functional block is described later.

This embodiment describes the sound collection device 100, which outputs the sound collection result for the target area sound according to the result of determining whether target area sound is present in the input signal. The device may instead be configured as a determination device (determination program, determination method) that outputs only the determination result, by omitting the output means that outputs the sound collection result (part of the processing of the area sound determination unit 9).

The sound collection device 100 may be implemented entirely in hardware (for example, a dedicated chip), or partly or wholly in software (a program). For example, it may be realized by installing a program (including the determination program and the sound collection program of the embodiment) on a computer having a processor and a memory.

(A-2) Operation of the First Embodiment
Next, the operation of the sound collection device 100 of the first embodiment configured as above (the determination method and the sound collection method according to the embodiment) is described.

The data input unit 1 converts the acoustic signals picked up by the microphone arrays MA1 and MA2 from analog to digital. It then transforms the digital signals from the time domain to the frequency domain, for example with a fast Fourier transform.

For each microphone array MA, the directivity forming unit 2 extracts the non-target area sound arriving from directions other than the target direction (for example, with a bidirectional filter) and subtracts its amplitude spectrum from the amplitude spectrum of the input signal, yielding a sound with directivity toward the target area (the BF output). Specifically, the directivity forming unit 2 obtains the BF output for each microphone array MA according to equation (4). When the input comes from a directional microphone rather than from a microphone array MA, the processing of the directivity forming unit 2 may be omitted and the input signal passed unchanged to the subsequent stage.

The delay correction unit 3 calculates and corrects the delay caused by the difference in distance between the target area and each microphone array MA (MA1, MA2). It obtains the positions of the target area and of the microphone arrays from the spatial coordinate data 4 and calculates the difference in arrival time of the target area sound at each microphone array MA. It then adds delay, taking the microphone array MA farthest from the target area as the reference, so that the target area sound arrives at all microphone arrays MA simultaneously.
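As a non-limiting illustration, the delay correction described above can be sketched as follows. The function name `alignment_delays`, the coordinate tuples, the sampling rate, and the speed of sound of 343 m/s are assumptions introduced here for illustration; the specification does not prescribe them.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the specification


def alignment_delays(target_pos, array_positions, sample_rate):
    """Return, per microphone array, the delay in samples that aligns target-area sound.

    The array farthest from the target area is the reference (delay 0); nearer
    arrays are delayed so that the target area sound lines up across all arrays.
    """
    distances = [math.dist(target_pos, pos) for pos in array_positions]
    farthest = max(distances)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]


# Hypothetical layout: MA1 is 3.43 m and MA2 is 6.86 m from the target area.
delays = alignment_delays((0.0, 0.0), [(3.43, 0.0), (0.0, 6.86)], 16000)
```

In this hypothetical layout MA2 is the reference, and the MA1 signal is delayed by the 10 ms difference in travel time.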

The spatial coordinate data 4 holds the positions of all target areas, of each microphone array MA (MA1, MA2), and of the microphones M (M1, M2) that make up each microphone array MA.

The target area sound power correction coefficient calculation unit 5 calculates, according to equation (5) or (6), a correction coefficient that equalizes the power of the target area sound component contained in each BF output.

The target area sound extraction unit 6 applies spectral subtraction (SS) according to equation (7) to the BF outputs corrected with the correction coefficient calculated by the target area sound power correction coefficient calculation unit 5, thereby extracting the noise present in the target area direction. It then extracts the target area sound by spectrally subtracting the extracted noise from each BF output according to equation (8).
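The two-stage spectral subtraction can be pictured with the following minimal sketch. Equations (7) and (8) are not reproduced in this section, so the forms used here (noise = Y1 minus alpha2 times Y2, then output = Y1 minus gamma1 times noise, each floored at zero) are assumptions, as are the names `spectral_subtract`, `extract_target_area_sound`, `alpha2`, and `gamma1`.

```python
def spectral_subtract(minuend, subtrahend, weight=1.0):
    """Bin-wise spectral subtraction with flooring at zero (assumed form)."""
    return [max(a - weight * b, 0.0) for a, b in zip(minuend, subtrahend)]


def extract_target_area_sound(y1, y2, alpha2, gamma1=1.0):
    """Sketch of the two subtraction stages applied to amplitude spectra.

    y1, y2 : amplitude spectra of the BF outputs of MA1 and MA2
    alpha2 : power correction coefficient applied to y2 (assumed role)
    gamma1 : subtraction strength for the second stage (assumed parameter)
    """
    # Stage 1: removing the common (target area) component leaves the
    # noise present in the target area direction as seen from MA1.
    noise1 = spectral_subtract(y1, y2, alpha2)
    # Stage 2: removing that noise from the BF output leaves the target area sound.
    return spectral_subtract(y1, noise1, gamma1)


z1 = extract_target_area_sound([3.0, 5.0], [2.0, 1.0], alpha2=1.0)
```

The flooring at zero is one common way to keep the subtracted spectrum non-negative; other flooring rules are equally possible.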

The frequency band division unit 7 receives the input signal from the data input unit 1 and the area sound output Z_1 from the target area sound extraction unit 6, and divides each into multiple bands. The input signal and the area sound output are assumed to have the same bandwidth.

In the following, the input signal X_1 of the microphone array MA1 is used as the representative input signal processed by the frequency band division unit 7 and the band-specific average power spectrum ratio calculation unit 8. The input signal of another microphone (including a microphone of another microphone array MA) may be used instead.

The frequency band division unit 7 divides each signal to be processed (the input signal X_1 and the area sound output Z_1) into bands of predetermined width (uniform or non-uniform). Below, each frequency band produced by this division is called a "divided band", and the signal in each divided band (the signal split off from the signal being divided) is called a "divided band signal".

The frequency band division unit 7 may give all divided bands the same width, or may vary the width with frequency. For example, it may use wider divided bands at higher frequencies and narrower divided bands at lower frequencies. As a concrete example, it may set divided bands at 100 Hz intervals in the low-frequency range (for example, below 1 kHz) and at 1 kHz intervals elsewhere (for example, at 1 kHz and above).

The frequency band division unit 7 may also set divided bands only within a predetermined range that contains sufficient speech information (for example, 100 Hz to 6 kHz) and discard the signal outside that range, excluding it from the divided bands.
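The non-uniform division and the restriction to a speech range can be illustrated with the following sketch, which uses the example figures given above (100 Hz bands below 1 kHz, 1 kHz bands from 1 kHz, limited to 100 Hz to 6 kHz). The function name `band_edges` and its parameter names are hypothetical.

```python
def band_edges(low_hz=100, high_hz=6000, split_hz=1000, fine_step=100, coarse_step=1000):
    """Return (lower, upper) edges in Hz for each divided band.

    Bands are fine_step wide below split_hz and coarse_step wide at or above it.
    Frequencies outside [low_hz, high_hz) are not covered, i.e. they are discarded.
    """
    edges = []
    lower = low_hz
    while lower < high_hz:
        step = fine_step if lower < split_hz else coarse_step
        upper = min(lower + step, high_hz)
        edges.append((lower, upper))
        lower = upper
    return edges


bands = band_edges()  # 9 narrow bands below 1 kHz, then 5 wide bands up to 6 kHz
```

Calling `band_edges(fine_step=1000)` instead gives bands of roughly 1 kHz throughout the same range, with a shorter first band starting at 100 Hz.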

For simplicity, the following description assumes that the frequency band division unit 7 divides the signals to be processed into divided bands at 1 kHz intervals.

FIG. 2 shows an example of a signal processed by the frequency band division unit 7, as a graph of the power spectrum in each band.

FIG. 2 shows an example in which the frequency band division unit 7 divides a signal in the band from 100 Hz to 6 kHz into six divided bands B_1 to B_6 at intervals of roughly 1 kHz.

For each signal to be processed (the input signal X_1 and the area sound output Z_1), the band-specific average power spectrum ratio calculation unit 8 obtains the power spectrum of each divided band (divided band signal) produced by the frequency band division unit 7. It then calculates, for each divided band, the average power spectrum ratio (the average of the power spectrum ratio within that divided band) according to equation (11).

In equation (11), R_j is the average power spectrum ratio of the j-th divided band, where j is an integer from 1 to M and M is the total number of divided bands. X_1j is the average power spectrum (the mean of the power spectrum) within the j-th divided band of the input signal X_1 of the microphone array MA1, and Z_1j is the average power spectrum within the j-th divided band of the area sound output Z_1.

For example, suppose the frequency band division unit 7 divides each signal to be processed (the input signal X_1 and the area sound output Z_1) into six divided bands B_1 to B_6, as shown in FIG. 2. In this case, the band-specific average power spectrum ratio calculation unit 8 obtains the average power spectra X_11 to X_16 of the input signal from the divided bands B_1 to B_6 of the input signal X_1. Likewise, it obtains the average power spectra Z_11 to Z_16 of the area sound output from the divided bands B_1 to B_6 of the area sound output Z_1.

The band-specific average power spectrum ratio calculation unit 8 then applies X_11 to X_16 and Z_11 to Z_16 to equation (11) to calculate the average power spectrum ratios R_1 to R_6 of the divided bands.
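The per-band calculation can be sketched as follows. Equation (11) is not reproduced in this section, so the sketch assumes R_j = Z_1j / X_1j (extracted sound over input), which is consistent with a larger ratio indicating target area sound. The function name `band_average_ratios`, the uniform bin spacing `bin_hz`, and the handling of a zero denominator are assumptions.

```python
def band_average_ratios(input_power, extracted_power, bin_hz, bands):
    """Return the average power spectrum ratio R_j for each divided band.

    input_power, extracted_power : per-bin power spectra of X_1 and Z_1
    bin_hz : spacing between frequency bins in Hz (bin k sits at k * bin_hz)
    bands  : list of (lower_hz, upper_hz) divided bands, upper edge exclusive
    """
    ratios = []
    for lower, upper in bands:
        bins = [k for k in range(len(input_power)) if lower <= k * bin_hz < upper]
        if not bins:
            raise ValueError(f"no frequency bins fall in band {lower}-{upper} Hz")
        x_avg = sum(input_power[k] for k in bins) / len(bins)      # X_1j
        z_avg = sum(extracted_power[k] for k in bins) / len(bins)  # Z_1j
        ratios.append(z_avg / x_avg if x_avg > 0 else 0.0)         # R_j (assumed form)
    return ratios


# Hypothetical 8-bin spectra with 250 Hz bin spacing and two 1 kHz bands.
r = band_average_ratios([4.0] * 8, [1.0] * 4 + [2.0] * 4, 250, [(0, 1000), (1000, 2000)])
```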

FIG. 3 is a graph of the average power spectrum ratios R_1 to R_6 of the divided bands calculated by the band-specific average power spectrum ratio calculation unit 8.

FIG. 3 shows the average power spectrum ratios R_1 to R_6 of the divided bands together with the average power spectrum over the entire band (the rightmost value).

The band-specific average power spectrum ratio calculation unit 8 then takes, according to equation (12), the largest of the average power spectrum ratios R_1 to R_6 of the divided bands as the maximum average power spectrum ratio U_max.

For example, when the average power spectrum ratios R_1 to R_6 of the divided bands are as shown in FIG. 3, the maximum average power spectrum ratio U_max is the value of the divided band B_6, which is larger than the average power spectrum over the entire band.

The area sound determination unit 9 compares the maximum average power spectrum ratio U_max calculated by the band-specific average power spectrum ratio calculation unit 8 with a preset threshold T1 to determine whether target area sound is present, that is, whether the input signal contains target area sound. For example, it may determine that target area sound is present when U_max exceeds T1 and that it is absent when U_max is equal to or less than T1.

When the area sound determination unit 9 determines that target area sound is present, it may output the area sound collection data (the area sound output Z_1, that is, the extracted sound) as it is. When it determines that target area sound is absent, it may output silent audio data instead of the area sound collection data. Instead of silent audio data, it may output the input signal (for example, the input signal X_1 of the microphone array MA1) with reduced gain.
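A minimal sketch of this determination and output selection is given below. The function name `select_output` and the parameter `residual_gain` are hypothetical; a `residual_gain` of 0.0 corresponds to outputting silence, and a small positive value corresponds to outputting the attenuated input signal.

```python
def select_output(band_ratios, extracted_frame, input_frame, t1, residual_gain=0.0):
    """Return the frame to output for one processing interval.

    band_ratios     : average power spectrum ratios R_1..R_M of the divided bands
    extracted_frame : area sound output Z_1 (extracted sound)
    input_frame     : input signal X_1
    t1              : preset threshold T1
    """
    u_max = max(band_ratios)  # maximum average power spectrum ratio U_max
    if u_max > t1:
        # Target area sound judged present: output the extracted sound as is.
        return list(extracted_frame)
    # Target area sound judged absent: silence, or the attenuated input.
    return [residual_gain * sample for sample in input_frame]


out = select_output([0.1, 0.6], [0.3, -0.2], [1.0, -1.0], t1=0.5)
```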

(A-3) Effects of the First Embodiment
The first embodiment provides the following effects.

The sound collection device 100 of the first embodiment divides the input signal (X_1 in the above example) and the area sound output Z_1 into multiple divided bands, obtains the average power spectrum ratio of each divided band, and determines whether target area sound is present based on the largest of these values, the maximum average power spectrum ratio U_max.

In other words, the sound collection device 100 of the first embodiment determines that target area sound is present if even one divided band has an average power spectrum ratio exceeding the threshold (T1 in the above example). When human speech is the target area sound, an unvoiced consonant has low overall power, but its power spectrum has a peak, so after band division the power of the band containing the peak is large. Because of this property, the maximum of the per-band average power spectrum ratios (the maximum average power spectrum ratio U_max) shows a clearer difference between non-target area sound intervals, in which noise such as musical noise occurs, and target area sound intervals than the all-band average power spectrum ratio does. The sound collection device 100 of the first embodiment can therefore determine target area sound more accurately in environments with strong background noise than conventional determination based on, for example, the all-band average power spectrum ratio.
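The following toy calculation illustrates the point. All numbers are hypothetical and chosen only to show the effect; they are not measurements from the specification. Six equal-width bands are assumed, so the all-band ratio reduces to the ratio of summed band powers.

```python
def full_band_ratio(x_bands, z_bands):
    """All-band average power spectrum ratio for equal-width bands."""
    return sum(z_bands) / sum(x_bands)


def max_band_ratio(x_bands, z_bands):
    """Maximum of the per-band average power spectrum ratios (U_max)."""
    return max(z / x for x, z in zip(x_bands, z_bands))


# Hypothetical per-band average powers of the input signal.
x = [10.0] * 6
# Unvoiced consonant: little extracted power overall, but a peak in the top band.
z_consonant = [0.5, 0.5, 0.5, 0.5, 0.5, 6.0]
# No target sound: residual musical noise spread evenly over the bands.
z_noise = [1.5] * 6

print(full_band_ratio(x, z_consonant), full_band_ratio(x, z_noise))
print(max_band_ratio(x, z_consonant), max_band_ratio(x, z_noise))
```

With these numbers the all-band ratio of the consonant frame falls slightly below that of the noise-only frame, so no single threshold on it separates the two, whereas the per-band maximum of the consonant frame is four times that of the noise-only frame.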

Furthermore, because the first embodiment determines target area sound using the maximum of the per-band average power spectrum ratios (the maximum average power spectrum ratio U_max), the band used for the determination is localized around the peak, yet the determination does not rely on a single sample. The determination is therefore stable and not easily affected by burst noise.

(B) Second Embodiment
A second embodiment of the sound collection device, program, and method, and of the determination device, program, and method according to the present invention is described in detail below with reference to the drawings.

(B-1) Configuration of the Second Embodiment
FIG. 4 is a block diagram showing the functional configuration of the sound collection device 100A of this embodiment. In FIG. 4, parts identical or corresponding to those in FIG. 1 are given the same or corresponding reference numerals.

The following describes how the sound collection device 100A of the second embodiment differs from the first embodiment.

The sound collection device 100A differs from the first embodiment in that the area sound determination unit 9 is replaced by an area sound determination unit 9A and an all-band average power spectrum ratio calculation unit 10 is added.

The all-band average power spectrum ratio calculation unit 10 calculates the average power spectrum ratio over the entire band.

The area sound determination unit 9A controls the frequency band division unit 7, the band-specific average power spectrum ratio calculation unit 8, and the all-band average power spectrum ratio calculation unit 10 to determine whether target area sound is present.

(B-2) Operation of the Second Embodiment
Next, the operation of the sound collection device 100A of the second embodiment configured as above (the determination method and the sound collection method according to the embodiment) is described, focusing on the differences from the first embodiment.

The sound collection device 100A differs from the first embodiment in the target area sound determination processing performed by the area sound determination unit 9A. The following describes this determination processing, centered on the area sound determination unit 9A.

FIG. 5 is a flowchart of the target area sound determination processing performed by the sound collection device 100A (the area sound determination unit 9A).

In the flowchart of FIG. 5, T1, T2, and T3 are thresholds used in the area sound determination processing. The threshold T1 can be the same as in the first embodiment. The threshold T2 is larger than the threshold T3 (T2 > T3). No particular magnitude relationship is required between T1 and the pair T2, T3; suitable values confirmed by experiment or the like can be used.

The area sound determination unit 9A first controls the all-band average power spectrum ratio calculation unit 10 to calculate the all-band average power spectrum ratio (S101).

The all-band average power spectrum ratio calculation unit 10 calculates the all-band average power spectrum ratio according to equations (9) and (10).

Next, the area sound determination unit 9A determines whether the all-band average power spectrum ratio U calculated by the all-band average power spectrum ratio calculation unit 10 exceeds the threshold T2, that is, whether U > T2 (S102). If it does, processing proceeds to step S104; otherwise, processing proceeds to step S103.

When the all-band average power spectrum ratio exceeds the threshold T2 (U > T2), the area sound determination unit 9A determines that target area sound is present (S104) and ends the determination processing.

When the all-band average power spectrum ratio is equal to or less than the threshold T2 (U ≤ T2), the area sound determination unit 9A determines whether it exceeds the threshold T3, that is, whether U > T3 (S103). If it does, processing proceeds to step S105; otherwise, processing proceeds to step S108.

When the all-band average power spectrum ratio exceeds the threshold T3 (U > T3), the area sound determination unit 9A controls the frequency band division unit 7 and the band-specific average power spectrum ratio calculation unit 8 to calculate the average power spectrum ratio of each divided band by the same processing as in the first embodiment (S105).

Next, as in the first embodiment, the area sound determination unit 9A controls the band-specific average power spectrum ratio calculation unit 8 to calculate the maximum average power spectrum ratio U_max from the average power spectrum ratios of the divided bands, and determines whether U_max exceeds the threshold T1 (S106). In other words, the area sound determination unit 9A and the band-specific average power spectrum ratio calculation unit 8 determine whether any divided band has an average power spectrum ratio exceeding the threshold T1.

If U_max exceeds the threshold T1 (that is, some divided band has an average power spectrum ratio exceeding T1), processing proceeds to step S107; otherwise, processing proceeds to step S108.

When U_max exceeds the threshold T1 (U_max > T1), the area sound determination unit 9A determines that target area sound is present (S107) and ends the determination processing.

When the all-band average power spectrum ratio is equal to or less than the threshold T3 in step S103 (U ≤ T3), or when the maximum average power spectrum ratio U_max is equal to or less than the threshold T1 in step S106 (U_max ≤ T1), the area sound determination unit 9A determines that target area sound is absent (S108) and ends the determination processing.
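The flow of steps S101 to S108 can be summarized in the following sketch. The function name `judge_target_area_sound` and the use of a callable `band_ratios_fn` to defer band division are illustrative choices, not part of the specification; the all-band ratio `u_full` is assumed to have been computed already (S101).

```python
def judge_target_area_sound(u_full, band_ratios_fn, t1, t2, t3):
    """Two-stage presence determination following the flow of FIG. 5.

    u_full         : all-band average power spectrum ratio U (result of S101)
    band_ratios_fn : zero-argument callable returning the per-band ratios R_1..R_M;
                     it is called only when the first stage is inconclusive
    t1, t2, t3     : thresholds, with T2 > T3
    """
    if t2 <= t3:
        raise ValueError("T2 must be larger than T3")
    if u_full > t2:       # S102 -> S104: clearly present
        return True
    if u_full <= t3:      # S103 -> S108: clearly absent
        return False
    # T3 < U <= T2: fall back to band division (S105, S106).
    u_max = max(band_ratios_fn())
    return u_max > t1     # S107 if True, S108 if False


present = judge_target_area_sound(0.5, lambda: [0.2, 0.7], t1=0.5, t2=0.8, t3=0.3)
```

Passing the band calculation as a callable reflects the intent of the flowchart: the costlier band division runs only for frames that fall between T3 and T2.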

To summarize, the area sound determination unit 9A first has the all-band average power spectrum ratio calculation unit 10 calculate the all-band average power spectrum ratio and performs target area sound determination based on it (hereinafter, the "first determination process"). Specifically, as described above, it determines that target area sound is present when the all-band average power spectrum ratio is larger than the threshold T2 and absent when it is equal to or less than the threshold T3.

When the all-band average power spectrum ratio is equal to or less than the threshold T2 but exceeds the threshold T3 (T3 < U ≤ T2), the area sound determination unit 9A judges that the first determination process cannot decide. It then controls the frequency band division unit 7 and the band-specific average power spectrum ratio calculation unit 8 to calculate the maximum average power spectrum ratio U_max by the same processing as in the first embodiment, and determines the presence or absence of target area sound based on U_max (hereinafter, the "second determination process").

(B-3) Effects of the Second Embodiment
The second embodiment provides the following effects.

The sound collection device 100A (area sound determination unit 9A) of the second embodiment first performs target area sound determination based on the all-band average power spectrum ratio (the first determination process). When the all-band average power spectrum ratio is equal to or less than the threshold T2 but exceeds the threshold T3 (T3 < U ≤ T2), it performs target area sound determination based on the maximum average power spectrum ratio U_max (the second determination process).

As a result, when the first determination process (determination based on the all-band average power spectrum ratio) alone is accurate enough, for example when the all-band average power spectrum ratio is sufficiently large, the sound collection device 100A (area sound determination unit 9A) does not perform the second determination process (band division and so on). It performs the second determination process (calculating the maximum average power spectrum ratio U_max by band division and determining target area sound from it) only when the first determination process is not accurate enough, for example when the average power spectrum ratio U is small, as with an unvoiced consonant.

That is, the sound collection device 100A (area sound determination unit 9A) performs the second determination process, which involves the more costly band division, only when the first determination process cannot determine target area sound with sufficient accuracy, so target area sound can be determined efficiently.

(C) Third Embodiment
A third embodiment of the sound collection device, program, and method, and of the determination device, program, and method according to the present invention is described in detail below with reference to the drawings.

(C-1) Configuration of the Third Embodiment
FIG. 6 is a block diagram showing the functional configuration of the sound collection device 100B of this embodiment. In FIG. 6, parts identical or corresponding to those in FIG. 1 are given the same or corresponding reference numerals.

以下では、第３の実施形態の収音装置１００Ｂについて、第１の実施形態との差異を説明する。 Below, the difference with 1st Embodiment is demonstrated about the sound collection apparatus 100B of 3rd Embodiment.

The sound collection device 100B differs from the first embodiment in that the area sound determination unit 9 is replaced with an area sound determination unit 9B and an inter-band power spectrum ratio calculation unit 11 is added.

The inter-band power spectrum ratio calculation unit 11 calculates the minimum value (hereinafter referred to as the "minimum average power spectrum ratio U_min") of the average power spectrum ratios for the divided bands obtained by the band-specific average power spectrum ratio calculation unit 8. The inter-band power spectrum ratio calculation unit 11 then obtains the ratio (hereinafter referred to as the "inter-band power spectrum ratio V") between the maximum average power spectrum ratio U_max obtained by the band-specific average power spectrum ratio calculation unit 8 (the maximum of the average power spectrum ratios R for the divided bands) and the minimum average power spectrum ratio U_min.

The area sound determination unit 9B differs from the first embodiment in that it determines the target area sound based on the inter-band power spectrum ratio V.

Note that, in the second embodiment, the second determination process may be replaced with a determination process that uses the inter-band power spectrum ratio V.

(C-2) Operation of the Third Embodiment
Next, the operation of the sound collection device 100B of the third embodiment configured as above (the determination method and the sound collection method according to the embodiment) will be described in terms of its differences from the first embodiment.

The sound collection device 100B differs from the first embodiment in the target area sound determination process performed by the area sound determination unit 9B. The following describes that determination process, focusing on the area sound determination unit 9B.

In accordance with equation (13), the inter-band power spectrum ratio calculation unit 11 obtains the minimum average power spectrum ratio U_min from the average power spectrum ratios for the divided bands obtained by the band-specific average power spectrum ratio calculation unit 8.

Then, in accordance with equation (14), the inter-band power spectrum ratio calculation unit 11 calculates the inter-band power spectrum ratio V from the maximum average power spectrum ratio U_max and the minimum average power spectrum ratio U_min.
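Equations (13) and (14) are not reproduced in this text. Based on the definitions given for U_min, U_max, and V, they presumably take a form like the following, where N (a symbol assumed here) denotes the number of divided bands and R_i the average power spectrum ratio of the i-th divided band:

```latex
\begin{align}
U_{\min} &= \min_{1 \le i \le N} R_i \tag{13} \\
V &= \frac{U_{\max}}{U_{\min}} \tag{14}
\end{align}
```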

For example, when the average power spectrum ratios for the divided bands (R_1 to R_6) are as shown in FIG. 3, the maximum average power spectrum ratio U_max is the value for divided band B_6, and the minimum average power spectrum ratio U_min is the value for divided band B_3.

The area sound determination unit 9B compares the inter-band power spectrum ratio V with a threshold T4. It determines that target area sound is present when V is greater than T4 (V > T4), and that target area sound is absent when V is equal to or less than T4 (V ≤ T4).
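The determination by the inter-band power spectrum ratio V can be sketched as below. The function names and the sample values are illustrative assumptions; the per-band average power spectrum ratios are taken as given, and the guard against a non-positive U_min is added here only to keep the division well defined.

```python
from typing import Sequence


def interband_ratio(band_ratios: Sequence[float]) -> float:
    """Return V = U_max / U_min for the per-band average power spectrum ratios."""
    u_max = max(band_ratios)
    u_min = min(band_ratios)
    if u_min <= 0.0:
        # Assumption: ratios are strictly positive; otherwise V is undefined.
        raise ValueError("band ratios must be strictly positive")
    return u_max / u_min


def judge_by_interband_ratio(band_ratios: Sequence[float], t4: float) -> bool:
    """Target area sound is judged present when V > T4, absent when V <= T4."""
    return interband_ratio(band_ratios) > t4


# Six bands where the maximum falls in B_6 and the minimum in B_3,
# mirroring the FIG. 3 example (the numbers themselves are made up).
ratios = [0.25, 0.5, 0.125, 0.25, 0.5, 1.0]
print(interband_ratio(ratios))                # 8.0
print(judge_by_interband_ratio(ratios, 4.0))  # True
```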

(C-3) Effects of the Third Embodiment
According to the third embodiment, the following effect can be obtained in comparison with the first embodiment.

Because the sound collection device 100B of the third embodiment detects target area sound based on the inter-band power spectrum ratio V, it can also detect target area sound components with a smaller power spectrum.

(D) Other Embodiments
The present invention is not limited to the embodiments described above, and modified embodiments such as those exemplified below are also possible.

(D-1) In each of the above embodiments, the area sound determination unit 9 (9A, 9B) may support a function (hangover function) that, when the maximum average power spectrum ratio U_max exceeds the threshold T1 by at least a certain amount, determines that target area sound is present for the following several seconds regardless of the maximum average power spectrum ratio U_max.
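A hangover function of this kind could look like the sketch below. Several details are assumptions made for illustration: the class name `HangoverJudge`, treating "by at least a certain amount" as an additive `margin`, counting the hold time in frames (`hold_frames`) rather than seconds, and falling back to a plain "U_max > T1" test outside the hangover period.

```python
class HangoverJudge:
    """Frame-by-frame target area sound decision with a hangover period."""

    def __init__(self, t1: float, margin: float, hold_frames: int) -> None:
        self.t1 = t1                    # threshold T1
        self.margin = margin            # assumed additive margin above T1
        self.hold_frames = hold_frames  # assumed hold time, in frames
        self._remaining = 0             # frames left in the hangover period

    def update(self, u_max: float) -> bool:
        """Return True when target area sound is judged present for this frame."""
        if u_max >= self.t1 + self.margin:
            # Strong detection: (re)start the hangover period.
            self._remaining = self.hold_frames
            return True
        if self._remaining > 0:
            # Inside the hangover period: report presence regardless of U_max.
            self._remaining -= 1
            return True
        # Outside the hangover period: ordinary threshold comparison.
        return u_max > self.t1
```

With `hold_frames` set to the number of frames covering a few seconds, a weak trailing segment of speech after a strong onset is kept rather than cut off.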

(D-2) The sound collection device (determination device) of each of the above embodiments calculates the average power spectrum ratio for each divided band and uses the maximum of these values, the maximum average power spectrum ratio U_max, for target area sound determination. Instead of the average of the power spectrum ratios in each divided band (the average power spectrum ratio), however, a single representative value of the power spectrum ratios in each divided band (hereinafter referred to as the "representative power spectrum ratio") may be obtained, and the maximum of those representative values (hereinafter referred to as the "maximum representative power spectrum ratio") may be used in place of the maximum average power spectrum ratio U_max.

That is, in each of the above embodiments, the band-specific average power spectrum ratio calculation unit 8 may obtain a representative power spectrum ratio from each divided band, take the maximum of the representative power spectrum ratios of the divided bands as the maximum representative power spectrum ratio, and use it for target area sound determination in place of the maximum average power spectrum ratio U_max. In each of the above embodiments, the position from which the representative power spectrum ratio (representative value) is taken in each divided band is not limited; for example, the median may be used.
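The difference between the average-based and representative-value-based maxima can be illustrated as follows. This is a sketch under assumptions: the function name `max_band_statistic` and the nested-list input (per-bin power spectrum ratios grouped by divided band) are hypothetical, and the numbers are made up.

```python
from statistics import fmean, median
from typing import Callable, Sequence


def max_band_statistic(
    bins_per_band: Sequence[Sequence[float]],
    statistic: Callable[[Sequence[float]], float] = fmean,
) -> float:
    """Reduce each divided band to one value, then return the maximum.

    With statistic=fmean this corresponds to the maximum average power
    spectrum ratio U_max; with statistic=median it corresponds to a
    maximum representative power spectrum ratio based on the median.
    """
    return max(statistic(band) for band in bins_per_band)


bands = [[1.0, 1.0, 10.0], [2.0, 3.0, 4.0]]
print(max_band_statistic(bands))          # 4.0 (mean of the first band)
print(max_band_statistic(bands, median))  # 3.0 (median of the second band)
```

In this made-up example, a single large bin dominates the mean of the first band, whereas the median is unaffected by it, so the two variants pick different bands.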

As described above, in each of the above embodiments, target area sound determination uses the maximum value (for example, the maximum average power spectrum ratio U_max or the maximum representative power spectrum ratio) of the power spectrum ratios for the divided bands (for example, the average power spectrum ratios or the representative power spectrum ratios).

Description of Symbols
100: sound collection device (determination device); 1: signal input unit; 2: directivity forming unit; 3: delay correction unit; 4: spatial coordinate data; 5: target area sound power correction coefficient calculation unit; 6: target area sound extraction unit; 7: frequency band division unit; 8: band-specific average power spectrum ratio calculation unit; 9: area sound determination unit; MA, MA1, MA2: microphone array; M, M1, M2: microphone.

Claims

A sound collection device comprising:
directivity forming means for forming directivity in the direction of a target area by applying a beamformer to an input signal;
non-target area sound extraction means for extracting non-target area sound present in the target area direction using the directivity formed by the directivity forming means;
target area sound extraction means for outputting, as extracted sound, the result of extracting target area sound from the output of the beamformer using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
band division means for dividing each of the input signal and the extracted sound into a plurality of bands;
power spectrum ratio calculation means for calculating a power spectrum ratio between the input signal and the extracted sound for each divided band produced by the band division means;
determination means for determining whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means; and
output means for outputting the extracted sound as a sound collection result when the determination means determines that target area sound is present.

The sound collection device according to claim 1, wherein the determination means determines whether target area sound is present in the input signal according to the result of comparing the maximum value of the power spectrum ratios for the divided bands calculated by the power spectrum ratio calculation means with a first threshold.

The sound collection device according to claim 1, further comprising all-band average power spectrum ratio calculation means for calculating an all-band average power spectrum ratio between the input signal and the extracted sound, wherein:
the determination means first performs a first determination process of determining whether target area sound is present in the input signal based on the all-band average power spectrum ratio calculated by the all-band average power spectrum ratio calculation means;
the band division means divides each of the input signal and the extracted sound into a plurality of bands when the first determination process cannot determine whether target area sound is present in the input signal;
the power spectrum ratio calculation means calculates the power spectrum ratio between the input signal and the extracted sound for each divided band produced by the band division means when the first determination process cannot determine whether target area sound is present in the input signal; and
the determination means performs, when the first determination process cannot determine whether target area sound is present in the input signal, a second determination process of determining whether target area sound is present in the input signal from the power spectrum ratios calculated by the power spectrum ratio calculation means.

The sound collection device according to claim 3, wherein, in the first determination process, the determination means determines that target area sound is present in the input signal when the all-band average power spectrum ratio calculated by the all-band average power spectrum ratio calculation means exceeds a second threshold, determines that target area sound is not present in the input signal when the all-band average power spectrum ratio is equal to or less than a third threshold that is smaller than the second threshold, and obtains a result that it cannot be determined whether target area sound is present in the input signal when the all-band average power spectrum ratio exceeds the third threshold and is equal to or less than the second threshold.

The sound collection device according to claim 3, wherein, in the second determination process, the determination means determines whether target area sound is present in the input signal according to the result of comparing the maximum value of the power spectrum ratios for the divided bands calculated by the power spectrum ratio calculation means with a first threshold.

The sound collection device according to claim 1, wherein the determination means determines whether target area sound is present in the input signal according to the result of comparing a fourth threshold with an inter-band power spectrum ratio expressed as the ratio between the maximum value and the minimum value of the power spectrum ratios for the divided bands calculated by the power spectrum ratio calculation means.

A sound collection program that causes a computer to function as:
directivity forming means for forming directivity in the direction of a target area by applying a beamformer to an input signal;
non-target area sound extraction means for extracting non-target area sound present in the target area direction using the directivity formed by the directivity forming means;
target area sound extraction means for outputting, as extracted sound, the result of extracting target area sound from the output of the beamformer using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
band division means for dividing each of the input signal and the extracted sound into a plurality of bands;
power spectrum ratio calculation means for calculating a power spectrum ratio between the input signal and the extracted sound for each divided band produced by the band division means;
determination means for determining whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means; and
output means for outputting the extracted sound as a sound collection result when the determination means determines that target area sound is present.

A sound collection method performed by a sound collection device comprising directivity forming means, non-target area sound extraction means, target area sound extraction means, band division means, power spectrum ratio calculation means, determination means, and output means, wherein:
the directivity forming means forms directivity in the direction of a target area by applying a beamformer to an input signal;
the non-target area sound extraction means extracts non-target area sound present in the target area direction using the directivity formed by the directivity forming means;
the target area sound extraction means outputs, as extracted sound, the result of extracting target area sound from the output of the beamformer using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
the band division means divides each of the input signal and the extracted sound into a plurality of bands;
the power spectrum ratio calculation means calculates a power spectrum ratio between the input signal and the extracted sound for each divided band produced by the band division means;
the determination means determines whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means; and
the output means outputs the extracted sound as a sound collection result when the determination means determines that target area sound is present.

A determination device comprising:
directivity forming means for forming directivity in the direction of a target area by applying a beamformer to an input signal;
non-target area sound extraction means for extracting non-target area sound present in the target area direction using the directivity formed by the directivity forming means;
target area sound extraction means for outputting, as extracted sound, the result of extracting target area sound from the output of the beamformer using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
band division means for dividing each of the input signal and the extracted sound into a plurality of bands;
power spectrum ratio calculation means for calculating a power spectrum ratio between the input signal and the extracted sound for each divided band produced by the band division means; and
determination means for determining whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means.

A determination program that causes a computer to function as:
directivity forming means for forming directivity in the direction of a target area by applying a beamformer to an input signal;
non-target area sound extraction means for extracting non-target area sound present in the target area direction using the directivity formed by the directivity forming means;
target area sound extraction means for outputting, as extracted sound, the result of extracting target area sound from the output of the beamformer using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
band division means for dividing each of the input signal and the extracted sound into a plurality of bands;
power spectrum ratio calculation means for calculating a power spectrum ratio between the input signal and the extracted sound for each divided band produced by the band division means; and
determination means for determining whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means.

A determination method performed by a determination device comprising directivity forming means, non-target area sound extraction means, target area sound extraction means, band division means, power spectrum ratio calculation means, and determination means, wherein:
the directivity forming means forms directivity in the direction of a target area by applying a beamformer to an input signal;
the non-target area sound extraction means extracts non-target area sound present in the target area direction using the directivity formed by the directivity forming means;
the target area sound extraction means outputs, as extracted sound, the result of extracting target area sound from the output of the beamformer using the non-target area sound present in the target area direction extracted by the non-target area sound extraction means;
the band division means divides each of the input signal and the extracted sound into a plurality of bands;
the power spectrum ratio calculation means calculates a power spectrum ratio between the input signal and the extracted sound for each divided band produced by the band division means; and
the determination means determines whether target area sound is present in the input signal, using the power spectrum ratio for each divided band calculated by the power spectrum ratio calculation means.