JP6182169B2

JP6182169B2 - Sound collecting apparatus, method and program thereof

Info

Publication number: JP6182169B2
Application number: JP2015005646A
Authority: JP
Inventors: 悠馬小泉; 健太丹羽; 小林　和則; 和則小林; 仲大室
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-01-15
Filing date: 2015-01-15
Publication date: 2017-08-16
Anticipated expiration: 2035-01-15
Also published as: JP2016131343A

Description

The present invention relates to microphone array signal processing. More specifically, it relates to a microphone array sound collection technique that separates sounds generated in a plurality of regions region by region and enhances them simultaneously.

For example, if the full 360° can be divided into L predetermined regions (where L is an integer of 2 or more), as shown in FIG. 1, and the sound generated in each region can be recorded separately, it becomes easy to create minutes for each speaker or each seat in a meeting. Furthermore, when combined with a video viewing system such as interactive panoramic video technology (see, for example, Patent Document 1), which reproduces video in the direction desired by the viewer by selectively combining video recorded with a device such as an omnidirectional camera, it also becomes possible to reproduce only the sound present in the viewing direction. In this specification, “speech” is not limited to the human voice; it refers to sound in general, including the voices of people and animals as well as musical sounds and environmental noise.

A technique for enhancing and collecting sound arriving from a specific direction or region in this way is called “narrow-directivity speech enhancement” and has long been researched and developed. Typical approaches include those that exploit the phase difference between the sound pressures applied to the front and back of a diaphragm, as in a directional microphone, and those that use multiple microphones and signal processing, such as a beamformer based on linear filtering with a microphone array (hereinafter referred to as “array sound collection”).

The relationship between the direction around a microphone and the sensitivity of the microphone is called directivity. The sharper the directivity toward a certain direction (the target direction), the more the sound in a narrow region containing that direction is enhanced and the more the sound generated in other regions is suppressed. FIG. 3 shows an example of the directivity of narrow-directivity speech enhancement using a linear filter designed by the maximum likelihood method described in Non-Patent Document 1, with six microphones arranged at 4.0 cm intervals at the vertices of a regular hexagon on a common plane. It can be seen that the width from the target direction, where the gain is maximal, to the direction where the gain falls to −15 dB is about ±75°. Hereinafter, for convenience, the region over which the directivity falls from the gain in the target direction to a predetermined value (for example, −15 dB) is called the “beam region”.

Accordingly, when narrow-directivity speech enhancement is performed for a plurality of directions at narrow intervals (that is, when the angle between adjacent target directions is small), the beam regions overlap as shown in FIG. 2, producing an overlap region (hereinafter also referred to as a “superposition region”). When a sound source exists in the overlap region, the same sound source is contained in both beam outputs. FIG. 4 shows example output waveforms obtained when a sound source arriving from 30° is collected by two unidirectional microphones, the first with a target direction of 0° and the second with a target direction of 60°. The output waveforms show that the sound source is contained in each beam at almost the same level.

This shows that recording with the regions divided as in FIG. 1 cannot be achieved even if narrow-directivity speech enhancement is executed independently for each of a plurality of regions.
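The overlap described above can be checked with simple angle arithmetic. The following is a minimal sketch that assumes, for illustration only, six beams at 60° spacing and a ±75° beam region (the values used as examples in the text); the helper names `angular_distance` and `beams_containing` are hypothetical.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def beams_containing(source_deg, targets_deg, half_width_deg):
    """Indices of the beams whose beam region contains the source direction."""
    return [
        i
        for i, target in enumerate(targets_deg)
        if angular_distance(source_deg, target) <= half_width_deg
    ]


# Assumed layout: six target directions at 60-degree spacing.
targets = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]

# A source at 30 degrees lies inside the beam regions of beams 0 and 1.
hits_between = beams_containing(30.0, targets, 75.0)

# Even a source exactly on the axis of beam 0 is also inside the regions
# of both neighbouring beams, because 75 degrees exceeds the 60-degree spacing.
hits_on_axis = beams_containing(0.0, targets, 75.0)
```

Under these assumed values every direction falls inside at least two beam regions, which is why independent beams cannot partition the space on their own.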

Patent Document 1: JP 2012-209634 A

Non-Patent Document 1: Futoshi Asano, “Array Signal Processing of Sound: Localization, Tracking and Separation of Sound Sources” (音のアレイ信号処理 -音源の定位・追跡と分離-), Corona Publishing, 2011, pp. 82-83.

To solve the above problem, the signal processing units for the individual narrow-directivity beams need to cooperate so that sound is separated region by region and enhanced. In other words, a signal processing technique that controls the sound within the overlap region is needed, for example by treating sound enhanced by one beam as noise in the other beams and suppressing it there. In view of this situation, the present invention provides an array sound collection technique based on nonlinear filtering that, when two beam outputs contain input from the same sound source, suppresses that sound source in the output of one of the beams.

The problem to be solved by the present invention arises when a sound source exists in the overlap region, like sound source 1 in FIG. 5. In this case, the sensitivity to sound source 1 must be suppressed in either beam region 1 or beam region 2.

To do so, it is necessary to determine whether a sound source exists in the overlap region. The simplest conceivable method is to compare the outputs of beam regions 1 and 2 and suppress one of them if their similarity is at or above a certain level. However, if a sound source that affects only one of the beam regions, like sound source 2, is active at the same time as sound source 1, the similarity becomes low, and a simple comparison cannot determine whether a sound source is present in the overlap region.
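The failure of a simple similarity test can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the sources are white noise, the similarity measure is a zero-lag normalized correlation, and the level of the second source (3.0) is arbitrary. None of these choices comes from the specification.

```python
import numpy as np


def normalized_correlation(u, v):
    """Zero-lag normalized cross-correlation of two real signals."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
n = 16000
source1 = rng.standard_normal(n)  # source located in the overlap region
source2 = rng.standard_normal(n)  # source picked up by beam 2 only

# Case 1: only source 1 is active, and both beams pick it up equally.
beam1 = source1
beam2 = source1.copy()
sim_overlap_only = normalized_correlation(beam1, beam2)

# Case 2: source 2 is active as well and appears in beam 2 only.
beam2_mixed = source1 + 3.0 * source2
sim_with_second_source = normalized_correlation(beam1, beam2_mixed)
```

In the first case the similarity is essentially 1, while in the second it drops to roughly 0.3 even though source 1 is still present in both beams, so a fixed similarity threshold would miss it.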

An object of the present invention is to provide a sound collection device, a method, and a program that determine whether a sound source exists in the overlap region without using similarity and, when a sound source exists in the overlap region, process the signal obtained from one of the beam regions forming the overlap region so as to suppress its sensitivity to that sound source.

To solve the above problem, according to one aspect of the present invention, a sound collection device, with M and L each being an integer of 2 or more, forms L main beam regions whose target directions are L respective directions using a microphone array composed of M microphones, and collects sound for each main beam region. When the sound collection device, with the L main beam regions formed, detects that a sound source exists in a sub-beam region whose target direction is the superposition region of two adjacent main beam regions, it determines which of the two main beam regions the sound source is closer to on the basis of feature quantities of enhanced signals in which the sounds of the main beam regions and the sub-beam region are enhanced, and makes an adjustment so as to suppress the sound of that sound source contained in the signal of the main beam region determined not to be the closer one.

To solve the above problem, according to another aspect of the present invention, a sound collection method, with M and L each being an integer of 2 or more, forms L main beam regions whose target directions are L respective directions using a microphone array composed of M microphones, and collects sound for each main beam region. When the sound collection method, with the L main beam regions formed, detects that a sound source exists in a sub-beam region whose target direction is the superposition region of two adjacent main beam regions, it determines which of the two main beam regions the sound source is closer to on the basis of feature quantities of enhanced signals in which the sounds of the main beam regions and the sub-beam region are enhanced, and makes an adjustment so as to suppress the sound of that sound source contained in the signal of the main beam region determined not to be the closer one.

According to the present invention, it is possible to determine whether a sound source exists in the overlap region without using similarity and, when one does, to process the signal obtained from one of the beam regions forming the overlap region so as to suppress its sensitivity to the sound source in the overlap region.

FIG. 1 is a diagram showing an example of dividing the sound collection area.
FIG. 2 is a diagram showing an example of dividing the sound collection area by independent narrow-directivity speech enhancement.
FIG. 3 is a diagram showing an example of the directivity of narrow-directivity speech enhancement.
FIG. 4 is a diagram showing an example of the output of narrow-directivity speech enhancement when a sound source exists in the overlap region.
FIG. 5 is a diagram showing an example arrangement in which a sound source exists in the overlap region.
FIG. 6 is a diagram showing detection of a sound source in the overlap region by a sub-beam.
FIG. 7 is a functional block diagram of the sound collection device according to the first embodiment.
FIG. 8 is a diagram showing an example of the processing flow of the sound collection device according to the first embodiment.
FIG. 9 is a diagram showing a configuration example of the beam forming unit.
FIG. 10 is a diagram showing an example of the processing flow of the Wiener gain calculation unit.
FIG. 11 is a diagram showing a configuration example of the Wiener gain calculation unit.
FIG. 12 is a diagram showing a configuration example of the beam reforming unit.
FIG. 13 is a diagram showing an example of the processing flow of the beam reforming unit of the first embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted. In the following description, symbols such as “^-” and “~” used in the text should properly be written directly above the immediately preceding character but, owing to the limitations of text notation, are written immediately after that character. In the formulas, these symbols are written in their proper positions. Processing performed on each element of a vector or matrix is applied to all elements of that vector or matrix unless otherwise noted.

<First embodiment>
In this embodiment, as shown in FIG. 6, beams whose target directions are the overlap regions (hereinafter called sub-beams for convenience) are used in addition to the output beams (hereinafter called main beams for convenience). Each beam may be formed by beamforming with a microphone array or by a single directional microphone. When a sound source is detected in a sub-beam region, a nonlinear filter that suppresses the detected sound source in the output of one of the adjacent main beam regions is generated and applied. In this embodiment, a time-frequency mask based on a Wiener filter is used as the nonlinear filter.

Prior to describing this embodiment, the underlying techniques are explained and formulated: narrow-directivity speech enhancement by linear filtering with a microphone array, and narrow-directivity speech enhancement by a unidirectional microphone.

Narrow-directivity speech enhancement by linear filtering with a microphone array enhances sound from the target direction by applying, to the signal collected by each microphone, a filter that encodes time-difference and sound-pressure-level-difference information, and then summing the filtered signals. Unlike an acoustic tube microphone or a parabolic microphone, array sound collection performs the enhancement by signal processing, so sound from an arbitrary direction can be enhanced without mechanically rotating the microphones.

Now consider collecting P sound source signals s_p(t) with M microphones and simultaneously forming beams in L directions. Here, P is an integer of 1 or more, M and L are each an integer of 2 or more, p = 1, 2, …, P, and t is an index representing discrete time.

The collected signal x_m(t) picked up by the m-th of the M microphones can be written as follows using the transfer characteristic a_{p,m} from the position of the p-th sound source to the m-th microphone.

$$x_m(t) = \sum_{p=1}^{P} a_{p,m} * s_p(t) \tag{1}$$

Here, m = 1, 2, …, M, and * denotes convolution. Accordingly, the frequency-domain collected signal X_m(ω,k) obtained by applying the discrete Fourier transform to the time-domain collected signal x_m(t) can be written as follows.

$$X_m(\omega,k) = \sum_{p=1}^{P} A_{p,m}(\omega)\, S_p(\omega,k) \tag{2}$$

Here, A_{p,m}(ω) and S_p(ω,k) are the frequency-domain transfer characteristic and sound source obtained by Fourier-transforming a_{p,m} and s_p(t), respectively, k is the time-frame index, and ω is the frequency-bin index. Furthermore, with the superscript T denoting transposition, setting x^-(ω,k) = (X_1(ω,k), …, X_M(ω,k))^T, s^-(ω,k) = (S_1(ω,k), …, S_P(ω,k))^T, a^-_p(ω) = (A_{p,1}(ω), …, A_{p,M}(ω))^T, and A^-(ω) = (a^-_1(ω), …, a^-_P(ω)) and rewriting in matrix form, x^-(ω,k) can be written as follows.

$$\bar{x}(\omega,k) = \bar{A}(\omega)\,\bar{s}(\omega,k) \tag{3}$$
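The equivalence between the per-microphone sum over sources and the matrix form of equation (3) can be checked for a single frequency bin and frame. This is a minimal sketch with random placeholder values; the array sizes and the layout `A[m, p]` (row = microphone, column = source) are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P = 6, 2  # number of microphones and sources (illustrative sizes)

# A[m, p] plays the role of A_{p,m}(ω) for one frequency bin ω,
# so column p of A is the vector a^-_p(ω).
A = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))
# s[p] plays the role of S_p(ω, k) for one time frame k.
s = rng.standard_normal(P) + 1j * rng.standard_normal(P)

# Matrix form, equation (3).
x_matrix = A @ s

# Explicit sum over sources for each microphone.
x_sum = np.array([sum(A[m, p] * s[p] for p in range(P)) for m in range(M)])
```

Both computations give the same M-element vector, one entry per microphone.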

Array signal enhancement by linear filtering enhances sound from the target direction θ by convolving the collected signal x_m(t) with a linear filter w_m(θ). This linear filter can be designed with existing techniques, such as methods that compensate for the positional differences between microphones and methods that minimize the power of sound from directions other than the target direction. Letting W_m(ω,θ) be the frequency-domain linear filter obtained by applying the discrete Fourier transform to the time-domain linear filter w_m(θ), the narrow-directivity enhanced signal (hereinafter also called the “enhanced signal”) Y(ω,k,θ) can be written as

$$Y(\omega,k,\theta) = \sum_{m=1}^{M} W_m(\omega,\theta)\, X_m(\omega,k) \tag{4}$$

and, by setting w^-(ω,θ) = (W_1(ω,θ), …, W_M(ω,θ))^H, can be written in matrix form as

$$Y(\omega,k,\theta) = \bar{w}(\omega,\theta)^{H}\,\bar{x}(\omega,k) \tag{5}$$

Here, the superscript H denotes the Hermitian transpose. Furthermore, the output signals Y(ω,k,l) for the L target directions {θ_l | l ∈ 1, …, L} can be written, by setting y^-(ω,k) = (Y(ω,k,1), …, Y(ω,k,L))^T and W^-(ω) = (w^-(ω,θ_1), …, w^-(ω,θ_L))^H, as

$$\bar{y}(\omega,k) = \bar{W}(\omega)\,\bar{x}(\omega,k) \tag{6}$$
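Equations (5) and (6) can be exercised with a simple filter. The sketch below uses a delay-and-sum design, which is a stand-in chosen for brevity and not the maximum likelihood design of Non-Patent Document 1. The far-field plane-wave model, the speed of sound, the 2 kHz test frequency, and the function names are all assumptions.

```python
import numpy as np


def steering_vector(mic_xy, theta, freq_hz, c=343.0):
    """Far-field plane-wave steering vector for direction theta (radians)."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    delays = -(mic_xy @ direction) / c  # microphones nearer the source lead
    return np.exp(-2j * np.pi * freq_hz * delays)


def delay_and_sum_weights(mic_xy, theta, freq_hz):
    """Weights w such that Y = w^H x has unit gain toward theta."""
    a = steering_vector(mic_xy, theta, freq_hz)
    return a / len(a)


M, L = 6, 6
radius = 0.04  # a regular hexagon with 4.0 cm sides has a 4.0 cm circumradius
mic_angles = 2 * np.pi * np.arange(M) / M
mic_xy = radius * np.column_stack([np.cos(mic_angles), np.sin(mic_angles)])
targets = 2 * np.pi * np.arange(L) / L  # target directions θ_1, ..., θ_L
freq_hz = 2000.0

# Row l of W is w(ω, θ_l)^H, so W @ x stacks the L beam outputs (equation (6)).
W = np.stack([delay_and_sum_weights(mic_xy, th, freq_hz).conj() for th in targets])

# A unit-amplitude plane wave arriving from θ_1 passes beam 1 with gain 1.
x = steering_vector(mic_xy, targets[0], freq_hz)
y = W @ x
```

For such a small array the neighbouring beams also respond strongly to the same plane wave, which is the overlap that the sub-beams are introduced to handle.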

A directional microphone achieves its directivity by, for example, exploiting the phase difference between the sound pressures applied to the front and back of its diaphragm. According to Reference 1, it is known that this directivity can be approximately described by equation (7), which gives the directional characteristic of a unidirectional microphone when α = 1.

[Equation (7): directivity as a function of θ with parameter α; the formula is missing from this text]

Here, θ is the angle between the microphone and the sound source relative to the reference direction.
(Reference 1) Heitaro Nakajima (ed.), “Applied Electroacoustics” (応用電気音響), Corona Publishing, 1979, pp. 16-17.
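Because the directivity formula itself does not survive in this text, the following sketch uses a commonly cited first-order pattern, (1 + α cos θ) / (1 + α), purely as an assumption. It is consistent with the statement that α = 1 yields a unidirectional (cardioid) characteristic, but it is not confirmed to be the form given in Reference 1.

```python
import numpy as np


def first_order_directivity(theta, alpha):
    """Assumed first-order pattern: (1 + alpha * cos(theta)) / (1 + alpha)."""
    return (1.0 + alpha * np.cos(theta)) / (1.0 + alpha)


theta = np.deg2rad(np.array([0.0, 90.0, 180.0]))

# alpha = 1: cardioid, with full gain in front, half at the side, none behind.
cardioid = first_order_directivity(theta, alpha=1.0)

# alpha = 0: the pattern degenerates to an omnidirectional response.
omni = first_order_directivity(theta, alpha=0.0)
```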

Assuming that a directional microphone does not change the phase of the sound source and that its directivity is constant at all frequencies, the relationship between the collected signal X_m(ω,k) and the sound source signal S_p(ω,k,θ_p) can be written as follows.

$$X_m(\omega,k) = \sum_{p=1}^{P} H_m(\theta_p)\, A_{p,m}(\omega)\, S_p(\omega,k,\theta_p) \tag{8}$$

Here, θ_p is the angle between the center of the microphone array and sound source p relative to the reference direction. Furthermore, setting h^-_p = (H_1(θ_p), …, H_M(θ_p))^T (where H_m(θ_p) denotes the directivity of microphone m toward sound source p at angle θ_p), θ^- = (θ_1, …, θ_P), H^-(θ^-) = (h^-_1, …, h^-_P), and s^-(ω,k,θ^-) = (S_1(ω,k,θ_1), …, S_P(ω,k,θ_P))^T, this can be written as

$$\bar{x}(\omega,k) = \operatorname{prod}\left[\bar{H}(\bar{\theta}),\, \bar{A}(\omega)\right] \bar{s}(\omega,k,\bar{\theta}) \tag{9}$$

Here, prod[A, B] denotes the element-wise product of the two matrices A and B.
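The element-wise product in equation (9) maps directly onto array multiplication. The sketch below assumes six outward-facing cardioid microphones at 60° spacing and models H_m(θ_p) as a cardioid of the angle between the source and the microphone axis; these modeling choices and the random transfer characteristics are illustrative assumptions.

```python
import numpy as np


def cardioid(theta):
    """Assumed unidirectional pattern, 0.5 * (1 + cos(theta))."""
    return 0.5 * (1.0 + np.cos(theta))


M, P = 6, 2
mic_axes = np.deg2rad(60.0 * np.arange(M))           # axis of each microphone
source_angles = np.deg2rad(np.array([30.0, 180.0]))  # θ_1 and θ_2

# H[m, p] plays the role of H_m(θ_p): gain of microphone m toward source p.
H = cardioid(source_angles[None, :] - mic_axes[:, None])

rng = np.random.default_rng(0)
A = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))  # A_{p,m}(ω)
s = rng.standard_normal(P) + 1j * rng.standard_normal(P)            # S_p(ω, k, θ_p)

# Equation (9): `H * A` is the element-wise product prod[H, A].
x = (H * A) @ s

# The same result as an explicit sum over sources for each microphone.
x_sum = np.array(
    [sum(H[m, p] * A[m, p] * s[p] for p in range(P)) for m in range(M)]
)
```

In this layout the source at 30° receives the same directivity gain from the microphones facing 0° and 60° (`H[0, 0]` equals `H[1, 0]`), matching the equal-level outputs described for FIG. 4.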

Therefore, if the signal collected by a directional microphone is regarded as having already undergone linear filtering at the time of collection, the enhanced signal is

$$\bar{y}(\omega,k) = \bar{x}(\omega,k) \tag{10}$$

This embodiment is described below.
<Sound collection device 100 according to the first embodiment>
FIG. 7 is a functional block diagram of the sound collection device 100 according to the first embodiment, and FIG. 8 shows its processing flow.

The sound collection device 100 receives as input the analog collected signals x_1(a), …, x_M(a) of a microphone array composed of M microphones m, and outputs reproduction signals z_1(t), …, z_L(t) in which the sound emitted from each main beam region l is enhanced. M and L are each an integer of 2 or more. a and t are indices representing continuous time and discrete time, respectively, L is the number of main beams, m = 1, 2, …, M, and l = 1, 2, …, L. The sound collection device 100 forms, with the microphone array, L main beam regions whose target directions are L respective directions, and collects sound for each main beam region. When the sound collection device 100, with the L main beam regions formed, detects that a sound source exists in a sub-beam region whose target direction is the superposition region of two adjacent main beam regions, it determines which of the two main beam regions the sound source is closer to on the basis of feature quantities of the enhanced signals in which the sounds of the main beam regions and the sub-beam region are enhanced. It then treats the signal from that sound source as belonging to the main beam region determined to be the closer one, and makes an adjustment so as to suppress the sound of that sound source contained in the signal of the main beam region determined not to be the closer one. This adjustment is performed, for example, in the beam reforming unit 140 and the filter application unit 150 described later.

Each of the L main beam regions is formed so as to collect sound arriving toward the center of the microphone array (see FIG. 2). The number of sub-beam regions is L or more, and each sub-beam region is further subdivided into N (N ≥ 1) sub-beam regions.

The sound collection device 100 includes an input unit 110, a frequency domain transform unit 120, a beam forming unit 130, a beam reforming unit 140, a filter application unit 150, and a time domain transform unit 160.

The target directions θ_{l,0} of the main beams are arbitrary but, for simplicity of explanation, they are assumed here to be arranged at equal angles over 360°. For example, if the number of main beams is L = 6, the interval θ_{l+1,0} − θ_{l,0} between the l-th and (l+1)-th main beams is 60°.

The number of sub-beams for one set of main beams (six per set in this example) is also arbitrary. However, because sub-beams are directed between the main beams (in other words, into the overlap regions formed by pairs of main beam regions), it is desirable to prepare at least L sub-beams. On the other hand, as the number of sub-beams in one set increases, the inverse-matrix computation in the power spectral density (PSD) estimation described later becomes unstable, so it is desirable that the number of sub-beams in one set also be L. This embodiment is described assuming that L sub-beams are set for the L main beams.

Furthermore, in this embodiment, each sub-beam region is further subdivided into N sub-beam regions. The number of divisions N is also arbitrary and can be set to, for example, N = 3. For convenience, a sub-beam region before division into N is also called a subset region.

The target directions θ_{l,n} of the sub-beams in the subdivided sub-beam regions are also arbitrary. If they are arranged at equal intervals between adjacent main beams, the target direction θ_{l,n} of the n-th sub-beam region within the subset region between the l-th and (l+1)-th main beam regions can be set, for example, as follows.

[Equation giving θ_{l,n}; the formula is missing from this text]
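One way to realize the equal-interval arrangement is sketched below. The spacing rule θ_{l,n} = θ_{l,0} + n (θ_{l+1,0} − θ_{l,0}) / (N + 1) is an assumption, since the formula itself does not survive in this text; the function name `beam_directions` is hypothetical, and angles are in degrees.

```python
import numpy as np


def beam_directions(L, N):
    """Main-beam and sub-beam target directions in degrees.

    Main beams are spread at equal angles over 360 degrees. The N sub-beams
    of each subset region are placed at equal intervals strictly between the
    l-th and (l+1)-th main beams (assumed spacing rule).
    """
    main = 360.0 * np.arange(L) / L
    spacing = 360.0 / L
    n = np.arange(1, N + 1)
    sub = (main[:, None] + spacing * n[None, :] / (N + 1)) % 360.0
    return main, sub


main, sub = beam_directions(L=6, N=3)
# main   -> 0, 60, 120, 180, 240, 300
# sub[0] -> 15, 30, 45 (between the main beams at 0 and 60 degrees)
```

With L = 6 and N = 3 the middle sub-beam of each subset region points at the center of the overlap, and the other two sit halfway toward each neighbouring main beam.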

The processing performed by each unit is described below.
<Input unit 110>
The input unit 110 receives the M analog collected signals x_m(a) picked up by the M microphones m, applies AD conversion to them to obtain the digital collected signal x_m(t) corresponding to each microphone m (S110), and outputs the result to the frequency domain transform unit 120. The directivity of the M microphones m is not restricted; for example, either omnidirectional or unidirectional microphones can be used.

<Frequency domain transform unit 120>
The frequency domain transform unit 120 receives the M digital collected signals x_m(t), stores in a buffer the Q samples x_m((k−1)Q+1), x_m((k−1)Q+2), …, x_m(kQ) for each time frame k, converts them into the frequency-domain collected signal X_m(ω,k) by a method such as the discrete Fourier transform (S120), and outputs the result to the beam forming unit 130. For example, Q can be set to 256 or 512 for 16 kHz sampling.
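The buffering and transform step can be sketched as follows. The sketch follows the text literally by using non-overlapping frames of Q samples and no window function; practical systems often add overlap and windowing, which the text does not specify. The function name and the output layout (microphone, frequency bin, frame) are assumptions.

```python
import numpy as np


def to_frequency_domain(x, Q):
    """Frame-wise DFT of multichannel signals.

    x has shape (M, T). Frame k holds the Q consecutive samples
    x[m, k*Q : (k+1)*Q] (zero-based). Returns X with shape
    (M, Q // 2 + 1, K), indexed by microphone, frequency bin and frame.
    """
    M, T = x.shape
    K = T // Q                          # number of complete frames
    frames = x[:, : K * Q].reshape(M, K, Q)
    X = np.fft.rfft(frames, axis=-1)    # shape (M, K, Q // 2 + 1)
    return np.transpose(X, (0, 2, 1))


fs, Q = 16000, 256
t = np.arange(4 * Q)
# Channel 0: a 500 Hz tone, which falls exactly on bin 500 * Q / fs = 8.
x = np.stack([np.cos(2 * np.pi * 500.0 * t / fs), np.zeros(4 * Q)])
X = to_frequency_domain(x, Q)
peak_bin = int(np.argmax(np.abs(X[0, :, 0])))
```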

<Beam forming unit 130>
The beam forming unit 130 receives the M frequency-domain collected signals X_m(ω,k), calculates the Wiener gains G_0(ω,k,l) and G_n(ω,k,l) (n = 1, 2, …, N) in accordance with the main-beam target directions {θ_{l,0} | l ∈ 1, …, L} and the sub-beam target directions {θ_{l,n} | l ∈ 1, …, L; n ∈ 1, …, N}, and outputs them to the beam reforming unit 140. The main-beam and sub-beam target directions are assumed to be set in advance; for example, they may be set as default values or by the user of the sound collection device 100. A Wiener gain is the gain used in a Wiener filter, which is a linear filter that minimizes the mean squared error between the true sound source signal and the estimated sound source signal under the assumption that speech and noise are uncorrelated. In this embodiment, the beam forming unit 130 obtains Wiener filters using estimates of the power spectral density (PSD) of the main beam regions and sub-beam regions, and the beam reforming unit 140 modifies the Wiener filters so that no superposition region remains between adjacent main beam regions.

When omnidirectional microphones are used, the enhanced signals Y_0(ω,k,l) and Y_n(ω,k,l) are computed and output using the linear filter coefficients W^-_0(ω) (an L×M matrix whose elements are W_{0_(m,l)}(ω)), which enhance the main-beam target directions, and the linear filter coefficients W^-_n(ω) (an L×M matrix whose elements are W_{n_(m,l)}(ω)), which enhance the sub-beam target directions.

$$\bar{Y}_0(\omega,k) = \bar{W}_0(\omega)\,\bar{x}(\omega,k) \tag{22}$$

$$\bar{Y}_0(\omega,k) = \left(\bar{Y}_0(\omega,k,1),\, \bar{Y}_0(\omega,k,2),\, \ldots,\, \bar{Y}_0(\omega,k,L)\right)$$

$$\bar{Y}_n(\omega,k) = \bar{W}_n(\omega)\,\bar{x}(\omega,k) \tag{23}$$

$$\bar{Y}_n(\omega,k) = \left(\bar{Y}_n(\omega,k,1),\, \bar{Y}_n(\omega,k,2),\, \ldots,\, \bar{Y}_n(\omega,k,L)\right)$$
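Equations (22) and (23) can be evaluated for all frequency bins, frames, and beams at once. The sketch below uses random placeholder filters and signals; the array layouts (`X[m, f, k]`, `W0[f, l, m]`, `Wn[n, f, l, m]`) and the tiny sizes are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, N = 6, 6, 3  # microphones, main beams, sub-beams per subset region
F, K = 5, 4        # frequency bins and time frames (tiny illustrative sizes)


def crandn(*shape):
    """Complex Gaussian placeholder data of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


X = crandn(M, F, K)      # X[m, f, k] plays the role of X_m(ω, k)
W0 = crandn(F, L, M)     # W0[f] is the L x M matrix W^-_0(ω)
Wn = crandn(N, F, L, M)  # Wn[n, f] is the L x M matrix W^-_n(ω)

# Equation (22): Y0[f, k, l] = sum over m of W0[f, l, m] * X[m, f, k]
Y0 = np.einsum("flm,mfk->fkl", W0, X)

# Equation (23) for every n: Yn[n, f, k, l] = sum over m of Wn[n, f, l, m] * X[m, f, k]
Yn = np.einsum("nflm,mfk->nfkl", Wn, X)
```

Each `Y0[f, k]` is the L-element vector of main-beam outputs for one bin and frame, equal to `W0[f] @ X[:, f, k]`.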

When the sound of one main beam region is collected by one directional microphone, M = L and m = l, and the outputs are Y_0(ω,k,l) = X_l(ω,k) and Y_n(ω,k,l) = X_l(ω,k), without using a linear filter. That is, in this case the output of the directional microphone serves as the enhanced signal for the main beam region, and no separate enhanced signal for the sub-beam regions is required. The reason is described later. For convenience, even in this case, Y_n(ω,k,l) = X_l(ω,k) is called the enhanced signal for the sub-beam region. The processing that enhances at least the sound of the main beam regions corresponds to beam forming (S130).

<Beam reforming unit 140>
The beam reforming unit 140 receives the $L$ Wiener gains $G_0(\omega,k,l)$ and enhancement signals $Y_0(\omega,k,l)$ corresponding to the main beams, and the $L \times N$ Wiener gains $G_n(\omega,k,l)$ and enhancement signals $Y_n(\omega,k,l)$ corresponding to the sub-beams. By combining these values, it obtains $L$ Wiener gains $G(\omega,k,l)$ that divide the sound collection area into $L$ main beam regions with no overlapping region (S140), and outputs them together with the $L$ enhancement signals $Y_0(\omega,k,l)$ corresponding to the main beams.

<Filter application unit 150>
The filter application unit 150 receives the $L$ enhancement signals $Y_0(\omega,k,l)$ corresponding to the main beams and the $L$ Wiener gains $G(\omega,k,l)$, computes the $L$ reproduction signals $Z_l(\omega,k) = Y_0(\omega,k,l) \times G(\omega,k,l)$ as their products (S150), and outputs them to the time domain conversion unit 160. The reproduction signal $Z_l(\omega,k)$ is a narrow-directivity sound signal corresponding to main beam region $l$, that is, a signal in which the sound emitted from main beam region $l$ is emphasized. It differs from the enhancement signal in that an overlapping region exists between $Y_0(\omega,k,l)$ and $Y_0(\omega,k,l+1)$, whereas none exists between $Z_l(\omega,k)$ and $Z_{l+1}(\omega,k)$. More precisely, when a sound source lies in the overlapping region between main beam regions $l$ and $l+1$, multiplying $Y_0(\omega,k,l)$ by $G(\omega,k,l)$ and $Y_0(\omega,k,l+1)$ by $G(\omega,k,l+1)$ processes the signals so that the source is treated as present in one main beam region and absent from the other. This signal processing is what removes the overlapping region between $Z_l(\omega,k)$ and $Z_{l+1}(\omega,k)$.
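The product in step S150 can be sketched as follows. This is only an illustration: the array layout (frequency bin, frame, beam), the function name `apply_wiener_gains`, and the toy values are assumptions, not part of the specification.

```python
import numpy as np

def apply_wiener_gains(Y0, G):
    """Return Z with Z[..., l] = Y0[..., l] * G[..., l] (element-wise product).

    Y0: complex main-beam enhancement signals, shape (n_freq, n_frames, L).
    G:  real re-formed Wiener gains, same shape (layout is an assumption).
    """
    Y0 = np.asarray(Y0, dtype=complex)
    G = np.asarray(G, dtype=float)
    if Y0.shape != G.shape:
        raise ValueError("Y0 and G must have the same shape")
    return Y0 * G

# Hypothetical toy values: one bin, one frame, L = 2 main beams.
# The gain of the second beam has been suppressed to 0.05.
Y0 = np.array([[[1.0 + 1.0j, 0.8 + 0.6j]]])
G = np.array([[[0.9, 0.05]]])
Z = apply_wiener_gains(Y0, G)
```

In this toy case the second beam's contribution is scaled to 5% of its enhancement signal, which is how a source in the overlapping region ends up attributed to only one main beam region.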

<Time domain conversion unit 160>
The time domain conversion unit 160 converts the $L$ frequency-domain reproduction signals $Z_l(\omega,k)$ into the time domain by a method such as the inverse discrete Fourier transform (S160), and obtains and outputs the $L$ time-domain reproduction signals $z_l(t)$.
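As a rough sketch of step S160 for a single frame, assuming a real-valued signal and a one-sided spectrum, the inverse transform could look like this. The frame length, the helper name `frame_to_time`, and the use of NumPy's real FFT are assumptions; windowing and overlap-add across frames are not specified in the text and are omitted.

```python
import numpy as np

def frame_to_time(Z_frame, frame_len):
    """Inverse real DFT of one frame's one-sided spectrum Z_l(omega, k)."""
    return np.fft.irfft(Z_frame, n=frame_len)

# Round-trip check on a hypothetical 8-sample frame.
frame_len = 8
z_true = np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
Z_frame = np.fft.rfft(z_true)          # stand-in for a reproduction signal
z_rec = frame_to_time(Z_frame, frame_len)
```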

<Effect>
With this configuration, whether a sound source exists in an overlapping region can be determined without using a similarity measure. When a sound source does exist there, the sensitivity to that source can be suppressed in the signal obtained from one of the beam regions that form the overlapping region.

<Modification>
The key point of this embodiment is that no overlapping region exists between the reproduction signals $Z_l(\omega,k)$ and $Z_{l+1}(\omega,k)$, and that the Wiener filter $G(\omega,k,l)$ that achieves this is obtained. The sound collection device 100 therefore only needs to include at least the beam forming unit 130 and the beam reforming unit 140; the other components may be provided as separate devices. For example, the device may take the frequency-domain collected sound signals $X_m(\omega,k)$ as input and output the Wiener filter $G(\omega,k,l)$.

The main beams, and likewise the sub-beams, need not be equally spaced. There need not be $L$ or more subset regions, and the number of sub-beam regions contained in each subset region may be varied. In such cases, however, the inverse-matrix computation in the power spectral density (PSD) estimation described later becomes unstable.

本実施形態では、マイクロホンアレイの中心に対して３６０°を収音領域としたが、どのような収音領域としてもよい。隣接するメインビーム領域が重畳領域を形成し、重畳領域にサブビーム領域を形成するような構成であればどのような収音領域であってもよい。 In the present embodiment, the sound collection region is 360 ° with respect to the center of the microphone array, but any sound collection region may be used. Any sound collection area may be used as long as the adjacent main beam areas form a superposition area and the sub-beam area is formed in the superposition area.

<Second embodiment>
The second embodiment of the present invention embodies the functional configuration of the beam forming unit 130 of the first embodiment. FIG. 9 is a functional block diagram of the beam forming unit 130, and FIG. 10 shows an example of its flowchart.

The beam forming unit 130 includes $N+1$ Wiener gain calculation units 131-q and a memory 134, where $q = 0, 1, \dots, N$ and $q = 0$ is the index of the main beam.

The beam forming unit 130 uses the method of Reference 2 to form narrow-directivity speech enhancement beams independently for the target directions of the main beams and of the sub-beams.
(Reference 2) Y. Hioka, K. Furuya, K. Kobayashi, K. Niwa, Y. Haneda, "Underdetermined sound source separation using power spectrum density estimated by combination of directivity gain", IEEE Transactions on Audio, Speech, and Language Processing, 2013, Vol. 21, No. 6, pp. 1240-1250.

具体的には、各ビーム領域の音を強調した強調信号のパワースペクトル密度（以下「PSD」ともいう）の推定値に基づいたウィナーゲイン（時間周波数マスク）計算を行う。 Specifically, a Wiener gain (time frequency mask) calculation based on an estimated value of a power spectrum density (hereinafter also referred to as “PSD”) of an enhanced signal that emphasizes the sound of each beam region is performed.

The Wiener gain calculation unit 131-q is an implementation of Reference 2. Reference 2 provides post-processing that estimates the PSD of the sound sources $\mathbf{s}(\omega,k)$, computes from it an appropriate Wiener gain $\mathbf{G}(\omega,k) = (G(\omega,k,1), \dots, G(\omega,k,L))$ for the enhancement signal $\mathbf{y}(\omega,k)$, and thereby enhances the sound further. The detailed algorithm is given at the end of this section.

<Wiener gain calculation unit 131-q>
FIG. 11 shows a configuration example of the Wiener gain calculation unit 131-q.

The Wiener gain calculation unit 131-q includes a linear filtering unit 131-q-1, a PSD estimation unit 131-q-2, a square root calculation unit 131-q-3, and a gain calculation unit 131-q-4.

The Wiener gain calculation unit 131-q receives the $M$ frequency-domain collected sound signals $X_m(\omega,k)$ and uses them to obtain and output together the Wiener gain $G_q(\omega,k,l)$, the enhancement signal $Y_q(\omega,k,l)$, and $S_q(\omega,k,l)$, the square root of the sound source PSD $\phi_{S_q}(\omega,k,l)$. When $M$ omnidirectional microphones are used to form $L$ main beam regions and $L \times N$ sub-beam regions and to pick up the sound of each beam region, the enhancement signal $Y_q(\omega,k,l)$ is the linearly filtered signal computed in the course of calculating the Wiener gain $G_q(\omega,k,l)$; when one directional microphone picks up the sound of one main beam region, it is the output of that directional microphone. As noted above, $q = 0, 1, \dots, N$ and $n = 1, 2, \dots, N$.

Reference 2 is briefly described below. When sound is collected with omnidirectional microphones (S131-q-0), the Wiener gain calculation unit 131-q first loads the linear filter coefficients $\mathbf{W}_q(\omega)$ from the memory 134. The linear filtering unit 131-q-1 then multiplies the collected sound signal $\mathbf{x}(\omega,k)$ by the linear filter coefficients $\mathbf{W}_q(\omega)$ of each beam according to the following equation, yielding the linearly filtered narrow-directivity enhancement signal $\mathbf{Y}_q(\omega,k)$ (S131-q-1):

$$\mathbf{Y}_q(\omega,k) = \mathbf{W}_q(\omega)\,\mathbf{x}(\omega,k) \tag{31}$$

When one directional microphone picks up the sound of one main beam region (S131-q-0), step S131-q-1 may be skipped; as described above, the output is simply $Y_q(\omega,k,l) = X_l(\omega,k)$.
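Equation (31) is a matrix-vector product per frequency bin and frame. A minimal sketch, assuming $\mathbf{W}_q(\omega)$ is stored as an $L \times M$ complex array and $\mathbf{x}(\omega,k)$ as a length-$M$ vector (the function name and the toy coefficients are hypothetical):

```python
import numpy as np

def linear_filter(W_q, x):
    """Y_q(omega, k) = W_q(omega) x(omega, k) for one bin and one frame.

    W_q: (L, M) complex filter coefficients for beam index q.
    x:   (M,) complex microphone spectra.
    Returns the (L,) vector of enhancement signals Y_q(omega, k, l).
    """
    W_q = np.asarray(W_q, dtype=complex)
    x = np.asarray(x, dtype=complex)
    if W_q.ndim != 2 or W_q.shape[1] != x.shape[0]:
        raise ValueError("W_q must be (L, M) and x must be (M,)")
    return W_q @ x

# Hypothetical example with L = 2 beams and M = 3 microphones.
W_q = np.array([[1.0, 0.0, 0.0],
                [0.5, 0.5, 0.0]])
x = np.array([2.0, 4.0j, 1.0])
Y_q = linear_filter(W_q, x)
```

For the directional-microphone case the text skips this step and uses $Y_q(\omega,k,l) = X_l(\omega,k)$ directly, so no filtering call is needed.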

From the formulation of the array signal enhancement technique given earlier, the linearly filtered enhanced sound $\mathbf{y}(\omega,k)$ can be written in terms of the sound sources $\mathbf{s}(\omega,k)$, the transfer characteristics $\mathbf{A}(\omega)$, the collected sound signal $\mathbf{x}(\omega,k)$, and the linear filter coefficients $\mathbf{W}(\omega)$ as follows (see equation (6)):

$$\mathbf{y}(\omega,k) = \mathbf{W}(\omega)\,\mathbf{x}(\omega,k) = \mathbf{W}(\omega)\,\mathbf{A}(\omega)\,\mathbf{s}(\omega,k) = \mathbf{D}(\omega)\,\mathbf{s}(\omega,k) \tag{32}$$

where $\mathbf{D}(\omega) = \mathbf{W}(\omega)\,\mathbf{A}(\omega)$.

When one directional microphone picks up the sound of one main beam region, $\mathbf{y}(\omega,k) = \mathbf{x}(\omega,k)$, and writing $\mathbf{y}(\omega,k) = \mathbf{D}(\omega)\,\mathbf{s}(\omega,k)$ gives $\mathbf{D}(\omega) = \operatorname{prod}[\mathbf{H}(\boldsymbol{\theta}), \mathbf{A}(\omega)]$ (see equations (9) and (10)).

Assuming that the $P$ sound sources are mutually independent, the PSD $\phi_y(\omega,k,l) = E[Y(\omega,k,l)\,Y(\omega,k,l)^*]$ of the narrow-directivity enhanced sound $\mathbf{y}(\omega,k)$ can be written as

$$\phi_y(\omega,k,l) = \sum_{p=1}^{P} |D_{l,p}(\omega)|^2\, \phi_S(\omega,k,p) \tag{33}$$

where $E[\cdot]$ denotes the expectation and the superscript $*$ the complex conjugate. Further, letting $\boldsymbol{\Phi}_y(\omega,k) = (\phi_y(\omega,k,1), \dots, \phi_y(\omega,k,L))^T$ and $\boldsymbol{\Phi}_S(\omega,k) = (\phi_S(\omega,k,1), \dots, \phi_S(\omega,k,P))^T$, and taking $\mathbf{D}(\omega)$ to be the $L \times P$ matrix whose elements are $|D_{l,p}(\omega)|^2$, this can be written in matrix form as

$$\boldsymbol{\Phi}_y(\omega,k) = \mathbf{D}(\omega)\,\boldsymbol{\Phi}_S(\omega,k) \tag{34}$$

The PSD of the sound sources $\mathbf{s}(\omega,k)$ can therefore be obtained as

$$\boldsymbol{\Phi}_S(\omega,k) = \mathbf{D}^{+}(\omega)\,\boldsymbol{\Phi}_y(\omega,k) \tag{35}$$

(S131-q-2), where the superscript $+$ denotes the pseudo-inverse. If $\mathbf{D}(\omega)$ is nonsingular (that is, $L = P$, so the number of main beams equals the number of sound sources), the pseudo-inverse may be replaced by the inverse.
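The PSD estimate of equation (35) amounts to applying a pseudo-inverse to the vector of beam-output PSDs. The sketch below assumes the matrix of squared directivity gains $|D_{l,p}(\omega)|^2$ is already available for one frequency bin; the function name, the toy gain values, and the clipping of negative estimates to zero are assumptions added for illustration and are not stated in the text.

```python
import numpy as np

def estimate_source_psd(D_gain, psd_y):
    """Phi_S = pinv(D) Phi_y for one frequency bin and frame.

    D_gain: (L, P) real matrix with entries |D_{l,p}(omega)|^2.
    psd_y:  (L,) PSDs of the beam outputs, phi_y(omega, k, l).
    Returns the (P,) estimated source PSDs.
    """
    D_gain = np.asarray(D_gain, dtype=float)
    psd_y = np.asarray(psd_y, dtype=float)
    phi_s = np.linalg.pinv(D_gain) @ psd_y
    # Assumption (not in the source): a PSD cannot be negative, so clip.
    return np.maximum(phi_s, 0.0)

# Hypothetical 2-beam, 2-source case (L = P): build phi_y from known
# source PSDs, then check that the estimate recovers them.
D_gain = np.array([[1.0, 0.2],
                   [0.3, 1.0]])
phi_s_true = np.array([2.0, 0.5])
psd_y = D_gain @ phi_s_true
phi_s_est = estimate_source_psd(D_gain, psd_y)
```

In practice the expectation $E[\cdot]$ would have to be approximated, for example by averaging $|Y|^2$ over neighbouring frames; how that is done is left open here.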

Therefore, with $i \in \theta_l$ denoting the indices of the sound sources whose direction of arrival coincides with the target direction $\theta_l$, the appropriate gain for the target direction can be computed as

[Equation (36)]

(S131-q-4).

In real environments, however, the directions of arrival of the sound sources and the transfer characteristics $\mathbf{A}(\omega)$ are in many cases unknown. In that case, $\mathbf{A}(\omega)$ is approximated using array manifold vectors, which represent the phase differences of the waves propagating from the target directions $\theta_l$ to the microphone positions. In addition, assuming that $L = P$ and that the source signal $s_l(t)$ arrives from the target direction $\theta_l$, the Wiener gain is computed as

[Equation (37)]

(S131-q-4).
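One common form of array manifold vector is the far-field plane-wave model, sketched below. The text does not specify the array geometry or the sign convention, so the two-dimensional microphone coordinates, the speed of sound, the phase sign, and the function name are all assumptions.

```python
import cmath
import math

def manifold_vector(omega, theta, mic_xy, c=343.0):
    """Far-field array manifold vector for a plane wave from angle theta.

    omega:  angular frequency in rad/s.
    theta:  target direction in radians, measured from the x-axis.
    mic_xy: list of (x, y) microphone positions in metres (assumed geometry).
    c:      speed of sound in m/s (assumed value).
    Each element is a unit-modulus phase term relative to the origin.
    """
    ux, uy = math.cos(theta), math.sin(theta)
    return [cmath.exp(1j * omega / c * (x * ux + y * uy)) for x, y in mic_xy]

# Hypothetical two-microphone array, 5 cm apart, source at 0 rad, 1 kHz.
omega = 2.0 * math.pi * 1000.0
mic_xy = [(0.0, 0.0), (0.05, 0.0)]
a = manifold_vector(omega, 0.0, mic_xy)
```

Stacking such vectors for the $L$ target directions gives an approximate $\mathbf{A}(\omega)$ under this model; it captures phase differences only and ignores reflections and level differences.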

When the level of noise other than the desired sound sources is high, noise-robust processing may be applied, for example when computing the Wiener gain, as in Reference 3.
(Reference 3) K. Niwa, Y. Hioka, K. Kobayashi, "Post-filter design for speech enhancement in various noisy environments", 14th International Workshop on Acoustic Signal Enhancement (IWAENC), 2014, pp. 35-39.

(Linear filtering unit 131-q-1)
Thus, when sound is collected with omnidirectional microphones (S131-q-0), that is, when $M$ omnidirectional microphones are used to form $L$ main beam regions and $L \times N$ sub-beam regions and to pick up the sound of each beam region, the linear filtering unit 131-q-1 receives the linear filter coefficients $\mathbf{W}_q(\omega)$ and the collected sound signal $\mathbf{x}(\omega,k)$, performs linear filtering according to equation (31) (S131-q-1), and computes and outputs the enhancement signal $\mathbf{Y}_q(\omega,k) = (Y_q(\omega,k,1), Y_q(\omega,k,2), \dots, Y_q(\omega,k,L))$.

When sound is collected with directional microphones (S131-q-0), that is, when one directional microphone picks up the sound of one main beam region, step S131-q-1 may be skipped. With $M = L$ and $m = l$, the linear filtering unit 131-q-1 simply outputs $Y_q(\omega,k,l) = X_l(\omega,k)$.

(PSD estimation unit 131-q-2)
The PSD estimation unit 131-q-2 receives the enhancement signal $Y_q(\omega,k,l)$, estimates the PSD according to equation (35) (S131-q-2), and outputs the estimate $\phi_{S_q}(\omega,k,l)$. Here $L = P$ is assumed, and

$$\boldsymbol{\Phi}_{S_q}(\omega,k) = \mathbf{D}_q^{+}(\omega)\,\boldsymbol{\Phi}_{y_q}(\omega,k)$$
$$\boldsymbol{\Phi}_{S_q}(\omega,k) = \bigl(\phi_{S_q}(\omega,k,1), \dots, \phi_{S_q}(\omega,k,L)\bigr)^T$$
$$\boldsymbol{\Phi}_{y_q}(\omega,k) = \bigl(\phi_{y_q}(\omega,k,1), \dots, \phi_{y_q}(\omega,k,L)\bigr)^T$$
$$\phi_{y_q}(\omega,k,l) = E\bigl[Y_q(\omega,k,l)\,Y_q(\omega,k,l)^*\bigr]$$

The matrix $\mathbf{D}(\omega)$ in equation (35) is expressed, when sound is collected with omnidirectional microphones, using the transfer characteristics $\mathbf{A}(\omega)$ and the linear filter coefficients $\mathbf{W}_q(\omega)$ as

$$\mathbf{D}_q(\omega) = \mathbf{W}_q(\omega)\,\mathbf{A}(\omega)$$

and, when sound is collected with directional microphones, using the transfer characteristics $\mathbf{A}(\omega)$ and the directivity $\mathbf{H}(\boldsymbol{\theta}_q)$ as

$$\mathbf{D}_q(\omega) = \operatorname{prod}[\mathbf{H}(\boldsymbol{\theta}_q), \mathbf{A}(\omega)]$$

Here $\boldsymbol{\theta}_q = (\theta_{1_q}, \dots, \theta_{L_q})$. When $q = n$, $\theta_{l_q}$ is the angle between the reference direction and the target direction of the $n$-th sub-beam region of the $l$-th subset region; when $q = 0$, it is the angle between the reference direction and the target direction of the $l$-th main beam region. With directional microphones, the enhancement signal is $Y_q(\omega,k,l) = X_l(\omega,k)$ and cannot strictly be called a signal in which the sound of a sub-beam region is emphasized. Nevertheless, using such a matrix $\mathbf{D}_q(\omega)$, the PSD estimate $\phi_{S_q}(\omega,k,l)$ for an enhancement signal $Y_q(\omega,k,l)$ that emphasizes the sound of the sub-beam region can be obtained. Moreover, because the square root calculation unit 131-q-3 and the gain calculation unit 131-q-4 described later operate on the estimate $\phi_{S_q}(\omega,k,l)$, the values corresponding to such an enhancement signal (the square root $S_q(\omega,k,l)$ of the estimate and the Wiener gain $G_q(\omega,k,l)$) can also be obtained. The estimate $\phi_{S_q}(\omega,k,l)$ and its square root $S_q(\omega,k,l)$ are also referred to as feature quantities corresponding to the enhancement signal $Y_q(\omega,k,l)$.

(Square root calculation unit 131-q-3)
The square root calculation unit 131-q-3 receives the PSD estimate $\phi_{S_q}(\omega,k,l)$, computes its square root $S_q(\omega,k,l) = \sqrt{\phi_{S_q}(\omega,k,l)}$ (S131-q-3), and outputs it.

(Gain calculation unit 131-q-4)
The gain calculation unit 131-q-4 receives the PSD estimate $\phi_{S_q}(\omega,k,l)$, computes the Wiener gain $G_q(\omega,k,l)$ by equation (36) or equation (37) (S131-q-4), and outputs it.

In other words, the $N$ gain calculation units 131-q-4 form the filters $\mathbf{G}(\omega,k,l) = \{G_0(\omega,k,l), G_1(\omega,k,l), \dots, G_N(\omega,k,l)\}$ used to detect the presence of a sound source for the enhancement signals $Y_0(\omega,k,l)$, which emphasize the sound of each of the $L$ main beam regions, and the enhancement signals $Y_q(\omega,k,l)$, which emphasize the sound of each of the $L \times N$ sub-beam regions. As described above, these filters are Wiener filters.

＜効果＞
このような構成により、第一実施形態と同様の効果を得ることができる。 <Effect>
With such a configuration, the same effect as that of the first embodiment can be obtained.

＜変形例＞
線形フィルタリングを行う場合には、M個のマイクロホンが無指向性であるとの前提で、処理を行っているが、M個のマイクロホンは、それぞれ指向性を有してもよいし、無指向性であってもよく、指向性を有するマイクロホンと無指向性のマイクロホンとが混在してもよい。指向性を有したマイクロホンを用いた場合も、線形フィルタリングを行うことで、ビームを形成し、メインビーム領域及びサブビーム領域の音を収音することができる。なお、第一実施形態においても同様の変形が可能である。 <Modification>
When performing linear filtering, processing is performed on the assumption that M microphones are omnidirectional, but each of the M microphones may have directivity or omnidirectionality. Alternatively, a directional microphone and an omnidirectional microphone may be mixed. Even when a microphone having directivity is used, by performing linear filtering, a beam can be formed and sounds in the main beam region and the sub beam region can be collected. Note that similar modifications are possible in the first embodiment.

<Third embodiment>
The third embodiment of the present invention embodies the functional configuration of the beam reforming unit of the first embodiment. FIG. 12 is a functional block diagram of the beam reforming unit 140, and FIG. 13 shows an example of its processing flow.

The beam reforming unit 140 receives the enhancement signals $Y_q(\omega,k,l)$, the Wiener gains $G_q(\omega,k,l)$, and the square roots $S_q(\omega,k,l)$ of the PSD estimates $\phi_{S_q}(\omega,k,l)$ of the respective beam regions (where $q = 0, 1, \dots, N$ and $l = 1, 2, \dots, L$). When it detects that a sound source exists in a sub-beam region whose target direction lies in the overlapping region of two adjacent main beam regions, it suppresses the Wiener gain $G_0(\omega,k,l)$ or $G_0(\omega,k,l+1)$ of a main beam region. When it does not detect a sound source in a sub-beam region, it leaves the Wiener gain $G_0(\omega,k,l)$ of the main beam region unsuppressed. In either case it outputs the gain together with $Y_0(\omega,k,l)$ of the main beam region. An example of how this processing is realized is described below.

The beam reforming unit 140 includes a sound source direction detection unit 141, a gain suppression unit 142, and a gain correction unit 143.

（音源方向検出部１４１）
音源方向検出部１４１では、図６のように２つのメインビームとその間にあるN個のサブビームの出力とを比較し、音源方向を検出する。 (Sound source direction detection unit 141)
The sound source direction detection unit 141 compares the two main beams and the outputs of N sub-beams between them as shown in FIG. 6, and detects the sound source direction.

Let the enhanced sounds and Wiener gains of the two main beams be $Y_0(\omega,k,l)$, $G_0(\omega,k,l)$ and $Y_0(\omega,k,l+1)$, $G_0(\omega,k,l+1)$, respectively. The sound source direction detection unit 141 receives these four values and computes the absolute value of the product of each enhanced sound and its Wiener gain (S141a). For convenience, these absolute values are called (a) and (b):

$$(a) = |Y_0(\omega,k,l) \times G_0(\omega,k,l)| \tag{41}$$
$$(b) = |Y_0(\omega,k,l+1) \times G_0(\omega,k,l+1)| \tag{42}$$

Although FIG. 12 shows only $Y_0(\omega,k,l)$ and $G_0(\omega,k,l)$ being received, they are received for $l = 1, 2, \dots, L$, so $Y_0(\omega,k,l+1)$ and $G_0(\omega,k,l+1)$ are available as well. When $l = L$, the first pair $Y_0(\omega,k,1)$, $G_0(\omega,k,1)$ is used as the $(l+1)$-th enhanced sound and Wiener gain. In the following description as well, when $l = L$ (in other words, $l+1 > L$), $l+1$ is read as 1.

Next, let the enhanced sounds and Wiener gains of the $N$ sub-beams lying between the $l$-th and $(l+1)$-th main beams be $Y_n(\omega,k,l)$ and $G_n(\omega,k,l)$, respectively. The sound source direction detection unit 141 receives the $N$ values of each and, as for the main beams, computes the absolute value of the product of the enhanced sound and the Wiener gain (S141b). For convenience, this absolute value is called (c,n):

$$(c,n) = |Y_n(\omega,k,l) \times G_n(\omega,k,l)| \tag{43}$$

The largest value among (a), (b), and the $N$ values (c,n) is then searched for (S141c). If (a) or (b) is the largest, a sound source is judged to lie in the target direction of a main beam; if one of the (c,n) is the largest, a sound source is judged to lie in the target direction of a sub-beam. For use by the subsequent gain correction unit 143, when a sub-beam is the largest, the identity of that sub-beam is recorded: the index $n$ of the largest sub-beam is stored in the variable F1(l) (F1(l) ← n). When either main beam is the largest, F1(l) ← 0. The sound source direction detection unit 141 outputs the resulting F1(l).
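Steps S141a to S141c can be sketched for one frequency bin, one frame, and one pair of adjacent main beams as follows. The function names are hypothetical, and resolving ties in favour of the main beams is an assumption, since the text does not say how ties are handled.

```python
def next_beam(l, L):
    """1-based index of the main beam after l, wrapping L + 1 back to 1."""
    return 1 if l == L else l + 1

def detect_source_beam(y0_l, g0_l, y0_next, g0_next, y_sub, g_sub):
    """Return (F1, a, b) for main beams l and l+1 and their N sub-beams.

    F1 is 0 when a main beam has the largest value, otherwise the 1-based
    index n of the sub-beam with the largest value.
    """
    a = abs(y0_l * g0_l)                             # equation (41)
    b = abs(y0_next * g0_next)                       # equation (42)
    c = [abs(y * g) for y, g in zip(y_sub, g_sub)]   # equation (43)
    f1, best = 0, max(a, b)
    for n, c_n in enumerate(c, start=1):
        if c_n > best:      # strict '>' keeps F1 = 0 on ties (assumption)
            f1, best = n, c_n
    return f1, a, b

# Hypothetical values with N = 2 sub-beams; sub-beam 1 dominates.
f1, a, b = detect_source_beam(1.0 + 0.0j, 0.5, 0.6j, 0.5,
                              [0.9, 0.4], [0.8, 0.9])
```

Here (a) = 0.5, (b) = 0.3, and (c,1) = 0.72, so the source is judged to lie in the direction of sub-beam 1.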

In other words, the beam reforming unit 140 compares the signal powers computed for the respective filters and determines whether the region with the maximum signal power is a sub-beam region or a main beam region.

(Gain suppression unit 142)
The gain suppression unit 142 receives F1(l) together with $Y_0(\omega,k,l)$, $G_0(\omega,k,l)$, $Y_0(\omega,k,l+1)$, and $G_0(\omega,k,l+1)$. When a sound source lies in the target direction of a sub-beam (F1(l) ≠ 0), it suppresses the Wiener gain of either the $l$-th or the $(l+1)$-th main beam. To decide on which side, (a) or (b), the source on the sub-beam lies, it compares the values of (a) and (b) (S142a). This determines which of the adjacent main beam regions the source estimated to be in the sub-beam region is closer to. Assuming that the sound sources in all regions are sparse and independent in the frequency direction, so that $Y_0(\omega,k,l)$ and $Y_0(\omega,k,l+1)$ contain information on only one sound source, the source is considered to lie near the target direction of the $l$-th main beam if (a) is larger, and near that of the $(l+1)$-th main beam if (b) is larger.

Accordingly, if (a) is larger, $G_0(\omega,k,l+1)$ is set to a small value $g_{sp}$ below its original value ($G_0(\omega,k,l+1) \leftarrow g_{sp}$, S142b); if (b) is larger, $G_0(\omega,k,l)$ is set to $g_{sp}$ in the same way ($G_0(\omega,k,l) \leftarrow g_{sp}$, S142c). $g_{sp}$ can be set to, for example, 0.0 or 0.05. Setting $G_0(\omega,k,l+1)$ or $G_0(\omega,k,l)$ to a small value in this way corresponds to suppressing it. Provided $g_{sp}$ is sufficiently small, it may exceed the original value of $G_0(\omega,k,l)$ or $G_0(\omega,k,l+1)$: if $g_{sp}$ is sufficiently small (for example, less than 0.05), setting a gain to a value larger than its original value has almost no effect on the output.
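The suppression rule of steps S142a to S142c can be sketched as below for one frequency bin and frame. The function name is hypothetical, and treating a tie between (a) and (b) like the case where (b) is larger is an assumption.

```python
def suppress_gain(f1, a, b, g0_l, g0_next, g_sp=0.05):
    """Return the (possibly suppressed) pair (G0[l], G0[l+1]).

    f1:   output F1(l) of the source direction detection (0 = main beam).
    a, b: |Y0 * G0| for main beams l and l+1, equations (41) and (42).
    g_sp: small replacement gain; 0.0 or 0.05 are the values in the text.
    """
    if f1 == 0:
        # Source judged to be on a main beam: gains pass through unchanged.
        return g0_l, g0_next
    if a > b:
        # Source is nearer main beam l: suppress beam l+1 (S142b).
        return g0_l, g_sp
    # Otherwise suppress beam l (S142c); ties land here by assumption.
    return g_sp, g0_next

# Hypothetical usage for three situations.
g_near_l = suppress_gain(1, 0.5, 0.3, 0.9, 0.7)
g_near_next = suppress_gain(2, 0.2, 0.6, 0.9, 0.7)
g_main = suppress_gain(0, 0.5, 0.3, 0.9, 0.7)
```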

上述の処理を、全てのメインビームの目的方向に対して行う（Ｓ１４０ａ、Ｓ１４０ｂ、Ｓ１４０ｃ）。 The above-described processing is performed on the target directions of all main beams (S140a, S140b, S140c).

上述の比較処理には、音源位置の揺らぎを吸収するアルゴリズムを追加することができる。たとえばl番目のメインビームとl+1番目のメインビームの目的方向がそれぞれ０°と６０°であり、人間などの音源が３０°付近に存在するとする。この音源は、たとえば首の回転や座り直しなどの要因で、音源位置が時刻kで２９°、音源位置が時刻k+1で３１°などと、その位置が時間的に揺らぎをもつことがある。 An algorithm that absorbs fluctuations in the sound source position can be added to the comparison process described above. For example, it is assumed that the target directions of the l-th main beam and the l + 1-th main beam are 0 ° and 60 °, respectively, and that a sound source such as a human exists near 30 °. This sound source may have temporal fluctuations, for example, when the sound source position is 29 ° at time k and the sound source position is 31 ° at time k + 1 due to factors such as neck rotation and re-sitting. .

In such a situation, processing such as suppressing $G_0(\omega,k,l)$ at time $k$ and suppressing $G_0(\omega,k+1,l+1)$ at time $k+1$ can occur, and the temporal continuity of the output may not be guaranteed. To prevent this, an algorithm that absorbs slight fluctuations in the sound source position may additionally be implemented.
As an example of such an algorithm, when the following conditions
C.1 The value of (a) is less than $g_{th}$ at time $k-1$.
C.2 The value of (a) is greater than or equal to $g_{th}$ at time $k$.
C.3 (a) > (b) at time $k$.
are satisfied, then at subsequent times,
D.1 as long as (a) remains greater than or equal to $g_{th}$,
D.2 even if (b) > (a), unless (b) reaches at least $r_{th}$ times (a),
$G_0(\omega,k,l+1)$ is suppressed. By reading (a) and (b) interchanged, the same processing also covers the case where (b) reacts first. The values of $g_{th}$ and $r_{th}$ are best determined experimentally; for example, $g_{th}$ can be set to 0.2 and $r_{th}$ to 1.4. In speech, the value may drop below $g_{th}$ locally, for example at a geminate consonant (sokuon), so condition D.1 may be changed to
D.1 unless (a) stays below $g_{th}$ continuously for $t_{th}$ (ms) or more,
The value of $t_{th}$ can be set to, for example, 500.
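The conditions C.1 to C.3 and D.1 to D.2 above can be sketched as a small state machine for one pair of adjacent main beams. This is a minimal illustration, not the patented implementation: the class name `HysteresisSelector` and its interface are hypothetical, the arguments `a` and `b` stand for the values labelled (a) and (b) in the comparison process, and the fallback rule used when no lock is active (suppress the side with the smaller value) is an assumption. The $t_{th}$ variant of D.1 is not modelled.

```python
class HysteresisSelector:
    """Decides which of two adjacent main-beam gains to suppress (sketch)."""

    def __init__(self, g_th=0.2, r_th=1.4):
        self.g_th = g_th          # threshold used in C.1, C.2 and D.1
        self.r_th = r_th          # ratio used in D.2
        self.prev = (0.0, 0.0)    # values of (a) and (b) at the previous frame
        self.locked = None        # 0: (a) reacted first, 1: (b) reacted first

    def step(self, a, b):
        """Return 0 to suppress beam l, or 1 to suppress beam l+1."""
        vals = (a, b)
        # Release an existing lock as soon as D.1 or D.2 no longer holds.
        if self.locked is not None:
            own = vals[self.locked]
            other = vals[1 - self.locked]
            if own < self.g_th or other >= self.r_th * own:
                self.locked = None
        # Acquire a lock when C.1, C.2 and C.3 hold for one of the sides.
        if self.locked is None:
            for i in (0, 1):
                rose = self.prev[i] < self.g_th <= vals[i]   # C.1 and C.2
                if rose and vals[i] > vals[1 - i]:           # C.3
                    self.locked = i
                    break
        self.prev = vals
        if self.locked is not None:
            return 1 - self.locked   # keep suppressing the other side
        # Assumed baseline comparison: suppress the side with the smaller value.
        return 1 if a >= b else 0
```

With the example thresholds, a source that first raises (a) keeps beam $l+1$ suppressed even when (b) briefly exceeds (a), and the decision only flips once (b) reaches 1.4 times (a) or (a) falls below 0.2.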

In this way, once the gain suppression unit 142 has determined that a main beam region is not close to the sound source, it continues to adjust so as to suppress the sound of that source contained in the signal of that main beam region, even if the region is subsequently determined to be close to the source, unless a predetermined condition is satisfied.

When a sound source is present in the target direction of a sub-beam, the gain suppression unit 142 outputs the suppressed Wiener gain together with the Wiener gain it received (that is, $G_0(\omega,k,l)=g_{sp}$ and $G_0(\omega,k,l+1)$, or $G_0(\omega,k,l)$ and $G_0(\omega,k,l+1)=g_{sp}$). When a sound source is present in the target direction of a main beam, it outputs the received Wiener gains as they are ($G_0(\omega,k,l)$ and $G_0(\omega,k,l+1)$). Note that the Wiener gain $G_0(\omega,k,l)$ corresponding to the $l$-th main beam may be suppressed in relation to the Wiener gain $G_0(\omega,k,l-1)$ corresponding to the $(l-1)$-th main beam, or in relation to the Wiener gain $G_0(\omega,k,l+1)$ corresponding to the $(l+1)$-th main beam.

(Gain correction unit 143)
Finally, the gain correction unit 143 receives $F1(l)$, the Wiener gain $G_0(\omega,k,l)$, and the square root $S_q(\omega,k,l)$ of the estimated PSD $\varphi_{S_q}(\omega,k,l)$ of the sound source (the output of the square root calculation unit 131-q-3), calculates and applies the gain correction coefficient $\lambda(\omega,k)$ or $\bar{\Lambda}(\omega,k)$ described below, and adjusts the overall magnitude of the gain (S143). Because the gain suppression unit 142 reduces the Wiener gain, applying the time-frequency mask as is could make the total volume of the reproduced signals $z_l(t)$ smaller than the total volume of the source signals $s_p(t)$. In addition, because this embodiment processes each time frame independently, the volume of the reproduced signal may lose temporal continuity. To solve these problems, the overall magnitude of the gain is adjusted.

Here, let the amplitude spectrum vector of the enhanced signals be $\bar{Y}_{\omega,k}=(|Y_0(\omega,k,1)|,\ldots,|Y_0(\omega,k,L)|)^T$, let the vector of suppressed gains be $\tilde{g}(\omega,k)=(G_0(\omega,k,1),\ldots,G_0(\omega,k,L))$, and let $\tilde{G}(\omega,k)$ be the diagonal matrix having $\tilde{g}(\omega,k)$ as its diagonal components. Then the amplitude spectrum of the signals after Wiener filtering, $\bar{Z}_{\omega,k}=(|\bar{Z}(\omega,k,1)|,\ldots,|\bar{Z}(\omega,k,L)|)^T$, can be written as
$$\bar{Z}_{\omega,k}=\tilde{G}(\omega,k)\,\bar{Y}_{\omega,k} \qquad (44)$$

Therefore, to realize narrow-directivity speech enhancement such that $|S(\omega,k,l)|=|Z(\omega,k,l)|$, define $\bar{S}_{\omega,k}=(S_{F1(1)}(\omega,k,1),\ldots,S_{F1(L)}(\omega,k,L))$ and adjust the gain $\tilde{G}(\omega,k)$ to a gain $\bar{G}(\omega,k)$ that minimizes the objective function $J=|\bar{S}_{\omega,k}-\bar{G}(\omega,k)\bar{Y}_{\omega,k}|^2$. The value of $S_{F1(l)}(\omega,k,l)$ can be obtained from $F1(l)$ and the square root $S_q(\omega,k,l)$.

If the gain is adjusted by amplifying the suppressed gains by the same level in all regions, this problem reduces to finding a scalar $\lambda(\omega,k)$ such that
$$\bar{G}(\omega,k)=\lambda(\omega,k)\,\tilde{G}(\omega,k) \qquad (45)$$
If instead the correction amplifies the gains by a different level in each region, it reduces to finding a diagonal matrix $\bar{\Lambda}(\omega,k)$ such that
$$\bar{G}(\omega,k)=\bar{\Lambda}(\omega,k)\,\tilde{G}(\omega,k) \qquad (46)$$

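The $\lambda(\omega,k)$ and $\bar{\Lambda}(\omega,k)$ satisfying this are obtained by partially differentiating the objective function $J$ with respect to each parameter, and can be calculated as
$$\lambda(\omega,k)=\frac{\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\,\bar{S}_{\omega,k}}{\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\,\tilde{G}(\omega,k)\,\bar{Y}_{\omega,k}}$$
$$\bar{\Lambda}(\omega,k)=\mathrm{diag}\!\left[\bar{S}_{\omega,k}\bar{Y}_{\omega,k}^T\left(\tilde{G}(\omega,k)\,\bar{Y}_{\omega,k}\bar{Y}_{\omega,k}^T\right)^{-1}\right]$$
Here, $\mathrm{diag}[\cdot]$ is the operation that sets the off-diagonal components of the matrix in $[\cdot]$ to 0. The detailed derivation is as follows.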

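(Derivation of $\lambda(\omega,k)$)
Based on equation (45), the objective function $J=|\bar{S}_{\omega,k}-\bar{G}(\omega,k)\bar{Y}_{\omega,k}|^2$ is rewritten as
$$J=\left(\bar{S}_{\omega,k}-\lambda(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}\right)^T\left(\bar{S}_{\omega,k}-\lambda(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}\right) \qquad (49)$$
Partially differentiating the objective function with respect to $\lambda(\omega,k)$ and setting the result to 0,
$$\frac{\partial J}{\partial\lambda(\omega,k)}=-2\,\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\bar{S}_{\omega,k}+2\,\lambda(\omega,k)\,\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}=0$$
gives
$$\lambda(\omega,k)\,\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}=\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\bar{S}_{\omega,k}$$
$$\lambda(\omega,k)=\frac{\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\bar{S}_{\omega,k}}{\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}}$$
Note that $\tilde{G}(\omega,k)$ is a diagonal matrix, so $\tilde{G}(\omega,k)=\tilde{G}^T(\omega,k)$. In an implementation, to reduce the amount of computation, the following may therefore be used instead:
$$\lambda(\omega,k)=\frac{\bar{Y}_{\omega,k}^T\tilde{G}(\omega,k)\bar{S}_{\omega,k}}{\bar{Y}_{\omega,k}^T\tilde{G}(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}}$$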
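As an illustration of the scalar correction, the sketch below evaluates $\lambda(\omega,k)$ for a single time-frequency bin. Because $\tilde{G}(\omega,k)$ is diagonal, the matrix products reduce to sums over the $L$ regions. The function name `scalar_gain_correction`, the list-based interface, and the fallback value of 1.0 for a vanishing denominator are assumptions made for this example only.

```python
def scalar_gain_correction(s, g, y, eps=1e-12):
    """Least-squares scalar lambda for one time-frequency bin (sketch).

    s: target amplitudes S_{F1(l)}(w,k,l), one per region
    g: suppressed gains, the diagonal of G~(w,k)
    y: enhanced-signal amplitudes |Y_0(w,k,l)|
    """
    # Numerator Y^T G~ S: the diagonal G~ turns the product into a plain sum.
    num = sum(yi * gi * si for si, gi, yi in zip(s, g, y))
    # Denominator Y^T G~ G~ Y: squared norm of the masked amplitudes.
    den = sum((gi * yi) ** 2 for gi, yi in zip(g, y))
    # Assumed fallback: leave the gains unchanged when nothing passes the mask.
    return num / den if den > eps else 1.0
```

For example, with `s = [1.0, 2.0]`, `g = [0.5, 0.5]` and `y = [1.0, 2.0]`, the masked amplitudes are half of the targets, and the function returns 2.0, which restores them exactly.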

(Derivation of $\bar{\Lambda}(\omega,k)$)
Based on equation (46), the objective function $J=|\bar{S}_{\omega,k}-\bar{G}(\omega,k)\bar{Y}_{\omega,k}|^2$ is rewritten as
$$J=\left(\bar{S}_{\omega,k}-\bar{\Lambda}(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}\right)^T\left(\bar{S}_{\omega,k}-\bar{\Lambda}(\omega,k)\tilde{G}(\omega,k)\bar{Y}_{\omega,k}\right) \qquad (51)$$
Partially differentiating the objective function with respect to $\bar{\Lambda}(\omega,k)$ and setting the result to 0 gives
$$\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\,\bar{\Lambda}(\omega,k)\,\tilde{G}(\omega,k)\bar{Y}_{\omega,k}=\bar{Y}_{\omega,k}^T\tilde{G}^T(\omega,k)\,\bar{S}_{\omega,k}$$
$$\bar{\Lambda}(\omega,k)\,\tilde{G}(\omega,k)\,\bar{Y}_{\omega,k}\bar{Y}_{\omega,k}^T=\bar{S}_{\omega,k}\bar{Y}_{\omega,k}^T$$
$$\bar{\Lambda}(\omega,k)=\bar{S}_{\omega,k}\bar{Y}_{\omega,k}^T\left(\tilde{G}(\omega,k)\,\bar{Y}_{\omega,k}\bar{Y}_{\omega,k}^T\right)^{-1} \qquad (53)$$
Here, since $\bar{\Lambda}(\omega,k)$ is a set of coefficients that correct the gain of the main beam in each direction, the operation that sets the off-diagonal components to 0 ($\mathrm{diag}[\cdot]$) is applied.
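For the per-region correction, a compact way to obtain a diagonal $\bar{\Lambda}(\omega,k)$ is to note that, when both $\bar{\Lambda}$ and $\tilde{G}$ are diagonal, $J$ is a sum of independent terms $(S_l-\Lambda_{ll}G_lY_l)^2$, each minimized by $\Lambda_{ll}=S_l/(G_lY_l)$ when $G_lY_l\neq 0$. The sketch below uses this element-wise form rather than evaluating the matrix expression literally, since the outer product $\bar{Y}_{\omega,k}\bar{Y}_{\omega,k}^T$ has rank one. This substitution, the function name `diagonal_gain_correction`, and the fallback value of 1.0 are assumptions of the example, not statements from the source.

```python
def diagonal_gain_correction(s, g, y, eps=1e-12):
    """Element-wise minimizer of |S - Lambda G~ Y|^2 for diagonal Lambda (sketch).

    s: target amplitudes, g: suppressed gains, y: enhanced-signal amplitudes.
    Returns the diagonal entries Lambda_ll, one per main-beam region.
    """
    diag = []
    for si, gi, yi in zip(s, g, y):
        masked = gi * yi  # amplitude of region l after the suppressed gain
        # Assumed fallback: no correction where the masked amplitude vanishes.
        diag.append(si / masked if abs(masked) > eps else 1.0)
    return diag
```

Unlike the scalar $\lambda(\omega,k)$, this form can match the target amplitude of every region at once, at the cost of being more sensitive to regions whose masked amplitude is close to zero.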

From the above, the corrected Wiener gain $G(\omega,k,l)$ for the $l$-th beam can be calculated by
$$G(\omega,k,l)=\lambda(\omega,k)\,\tilde{G}(\omega,k,l) \qquad (54)$$
or
$$G(\omega,k,l)=\bar{\Lambda}_{ll}(\omega,k)\,\tilde{G}(\omega,k,l) \qquad (55)$$
where $\bar{\Lambda}_{ll}(\omega,k)$ is the $l$-th diagonal component of $\bar{\Lambda}(\omega,k)$.
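Equations (54) and (55) differ only in whether one factor is shared by all regions or each region has its own. A minimal sketch that covers both cases is shown below; the function name `corrected_gains` and the convention of passing either a number or a list are hypothetical choices for this example.

```python
def corrected_gains(g_tilde, correction):
    """Apply equation (54) or (55) to the suppressed gains of one bin (sketch).

    g_tilde: suppressed gains G~(w,k,l), one per main-beam region
    correction: a number (lambda, eq. 54) or a list of diagonal
                entries Lambda_ll (eq. 55)
    """
    if isinstance(correction, (int, float)):
        # Equation (54): the same factor for every region.
        return [correction * g for g in g_tilde]
    # Equation (55): one factor per region.
    return [c * g for c, g in zip(correction, g_tilde)]
```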

When a sound source is present in the target direction of a sub-beam, the gain correction unit 143 outputs the corrected Wiener gain. When a sound source is present in the target direction of a main beam, it outputs the received Wiener gain as it is ($G(\omega,k,l)=G_0(\omega,k,l)$).

Therefore, for the reproduced signal of each main beam region output by the sound collection device 100, it can be said that the gain correction unit 143 adjusts the gain applied to the signals after main beam region adjustment so as to minimize the error between the sum of the power of the adjusted signals of the main beam regions and the sum of the power of the collected signals of the main beam regions before adjustment.

<Effect>
With this configuration, the same effects as in the first embodiment can be obtained.

<Other modifications>
The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The various processing functions of each device described in the above embodiments and modifications may be realized by a computer. In that case, the processing content of the functions that each device should have is described by a program. By executing this program on a computer, the various processing functions of each device are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium, or the program transferred from the server computer, temporarily in its own storage unit. When executing processing, the computer reads the program stored in its own storage unit and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it. Furthermore, each time a program is transferred from the server computer to this computer, the computer may sequentially execute processing according to the received program. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and result acquisition. Note that the program here includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

Although each device has been described as being configured by executing a predetermined program on a computer, at least part of this processing content may instead be realized in hardware.

Claims

A sound collecting device that, where M and L are each an integer of 2 or more, forms L main beam regions whose target directions are L respective directions by means of a microphone array composed of M microphones, and collects sound for each main beam region, wherein,
when it is detected, in a state where the L main beam regions are formed, that a sound source exists in a sub-beam region whose target direction is the overlap region of two adjacent main beam regions, the device determines which of the two main beam regions the sound source is closer to on the basis of feature amounts of enhanced signals that emphasize the sound of the main beam regions and the sub-beam region, and adjusts so as to suppress the sound of the sound source contained in the signal of the main beam region determined not to be close to the sound source.
Sound collection device.

The sound collection device according to claim 1,
N is an integer of 1 or more; the L main beam regions are formed so as to collect sound arriving toward the center of the microphone array; the number of sub-beam regions is L or more; each sub-beam region is subdivided into N sub-beam regions; filters used to detect the presence of a sound source are formed for the feature amounts of the enhanced signals that emphasize the sound of the L main beam regions and the L × N sub-beam regions; and the signal powers calculated for the respective filters are compared to determine whether the region in which the signal power is maximum is a sub-beam region or a main beam region,
Sound collection device.

The sound collecting device according to claim 1 or 2,
When detecting the presence of a sound source in the main beam regions and the sub-beam regions, a Wiener filter is applied to the signals obtained by converting the microphone pickup signals into the frequency domain,
Sound collection device.

The sound collecting device according to any one of claims 1 to 3,
For the reproduced signal of each main beam region output by the sound collection device, the gain applied to the signals after main beam region adjustment is adjusted so as to minimize the error between the sum of the power of the adjusted signals of the main beam regions and the sum of the power of the collected signals of the main beam regions before adjustment,
Sound collection device.

The sound collecting device according to any one of claims 1 to 4,
For a main beam region once determined not to be close to the sound source, even if that region is subsequently determined to be close to the sound source, the device adjusts so as to suppress the sound of the sound source contained in the signal of that main beam region unless a predetermined condition is satisfied,
Sound collection device.

A sound collection method in which, where M and L are each an integer of 2 or more, L main beam regions whose target directions are L respective directions are formed by a microphone array composed of M microphones, and sound is collected for each main beam region, wherein,
when it is detected, in a state where the L main beam regions are formed, that a sound source exists in a sub-beam region whose target direction is the overlap region of two adjacent main beam regions, it is determined which of the two main beam regions the sound source is closer to on the basis of feature amounts of enhanced signals that emphasize the sound of the main beam regions and the sub-beam region, and adjustment is performed so as to suppress the sound of the sound source contained in the signal of the main beam region determined not to be close to the sound source.
Sound collection method.

A program for causing a computer to function as the sound collecting device according to any one of claims 1 to 5.