JP5337189B2

JP5337189B2 - Reflector arrangement determination method, apparatus, and program for filter design

Info

Publication number: JP5337189B2
Application number: JP2011084728A
Authority: JP
Inventors: 健太丹羽; 澄宇阪内; 賢一古家; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-04-06
Filing date: 2011-04-06
Publication date: 2013-11-06
Anticipated expiration: 2031-04-06
Also published as: JP2012222518A

Abstract

PROBLEM TO BE SOLVED: To provide a technique for determining the arrangement of a reflector that reflects sound, for use in designing a filter applied to sound-based information.

SOLUTION: A filter applied to sound-based information is designed on the basis of a predetermined evaluation function, using a spatial correlation matrix expressed by transfer characteristics in a plurality of directions in space. Each transfer characteristic is expressed as the sum of the transfer characteristic of a direct sound and the transfer characteristic of each of one or more reflected sounds reflected by a reflector, and the evaluation function takes a smaller value the more the sound in at least a target direction is enhanced. A storage unit stores candidate arrangements of the reflector relative to a microphone array or a speaker array. An arrangement determination unit calculates the value of the evaluation function for each candidate, using the spatial correlation matrix expressed by the transfer characteristics determined by that candidate, and selects the candidate with the smallest value as the arrangement of the reflector.

COPYRIGHT: (C)2013, JPO&INPIT

Description

The present invention relates to a technique for determining the arrangement of a reflector that reflects sound, for use in designing a filter applied to sound-based information.

Consider, for example, zooming in on a subject with a video recording device (a video camera or camcorder) equipped with microphones. For video recording it is desirable that, in step with the zoom, only the sound from the vicinity of the subject is enhanced. Techniques for enhancing sound in a narrow range that includes a desired direction (the target direction), known as speech enhancement techniques, have long been researched and developed. The relationship between the direction around a microphone and the sensitivity of the microphone is called directivity. The sharper the directivity in a given direction, the more strongly sound in a narrow range including that direction can be enhanced and sound outside that range suppressed. In this specification, "speech" is not limited to the human voice; it refers to sound in general, including human and animal voices as well as musical tones and environmental noise.

One speech enhancement technique that works by selectively picking up reflected sounds is the multi-beamforming method (see Non-Patent Document 1). The multi-beamforming method gathers the individual sounds, namely the direct sound and the reflected sounds, so that sound from the target direction can be picked up with a high signal-to-noise ratio. It has been studied more extensively in the radio field than in the audio field.

The processing of the multi-beamforming method in the frequency domain is described below. First, the symbols are defined. Let ω be the frequency index and k the frame index. Let X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T be the frequency-domain representation of the analog signals received by M microphones. For a sound source to be enhanced that lies in direction θ_s, let θ_s1 be the arrival direction of its direct sound and θ_s2, …, θ_sR the arrival directions of its reflected sounds. T denotes transposition, and R-1 is the total number of reflected sounds. Let W^→(ω,θ_sr) be the filter that enhances sound from direction θ_sr, where r is each integer satisfying 1≦r≦R.

The multi-beamforming method presupposes that the arrival directions and arrival times of the direct sound and the reflected sounds are known. In other words, the number of objects from which sound reflection can clearly be expected, such as walls, floors, and reflecting plates, equals R-1. The number of reflected sounds R-1 is often set to a relatively small value such as 3 or 4, because a high correlation is observed between the direct sound and low-order reflections. Because the multi-beamforming method enhances each sound individually and then adds the results synchronously, the output signal Y(ω,k,θ_s) is given by equation (1). H denotes the Hermitian transpose.

The delay-and-sum method is described here as a design method for the filter W^→(ω,θ_sr). Assuming that the direct sound and the reflected sounds arrive as plane waves, the filter W^→(ω,θ_sr) is given by equation (2). h^→(ω,θ_sr) = [h₁(ω,θ_sr), …, h_M(ω,θ_sr)]^T is the propagation vector of sound arriving from direction θ_sr.

Assuming that a plane wave arrives at a linear microphone array (a microphone array in which M microphones are arranged in a straight line), the element h_m(ω,θ_sr) of h^→(ω,θ_sr) is given by equation (3). Here m is each integer satisfying 1≦m≦M, c is the speed of sound, u is the distance between adjacent microphones, and j is the imaginary unit. τ(θ_sr) is the time delay, relative to the direct sound, of the reflected sound arriving from direction θ_sr.
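The images for equations (1) to (3) are not reproduced in this text. The following is a minimal sketch of delay-and-sum multi-beamforming under stated assumptions: the phase reference is the first microphone, the exponent sign is negative, and the weights are scaled so that W^H h = 1. The function names and the default sound speed of 340 m/s are illustrative and do not come from the patent.

```python
import cmath
import math

def steering_vector(omega, theta, num_mics, spacing, tau=0.0, c=340.0):
    """Plane-wave propagation vector for a linear array (assumed form).

    Phase reference: the first microphone. `tau` is the delay of this
    arrival relative to the direct sound (0 for the direct sound itself).
    """
    return [
        cmath.exp(-1j * omega * (m * spacing * math.cos(theta) / c + tau))
        for m in range(num_mics)
    ]

def delay_and_sum_filter(h):
    """Delay-and-sum weights, scaled so that W^H h = 1 (assumed scaling)."""
    norm = sum(abs(x) ** 2 for x in h)
    return [x / norm for x in h]

def apply_filter(w, x):
    """Return W^H X for one frequency bin and one frame."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))

def multibeam_output(filters, x):
    """Add the individually enhanced direct and reflected arrivals."""
    return sum(apply_filter(w, x) for w in filters)
```

Each filter in `filters` corresponds to one arrival direction θ_sr; the output is the sum of the per-direction enhanced signals, which is the "enhance individually, then add synchronously" structure described above.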

Finally, converting the output signal Y(ω,k,θ_s) to the time domain yields a signal in which the sound of the source in the target direction θ_s is enhanced.

J. L. Flanagan, A. C. Surendran, E. E. Jan, "Spatially selective sound capture for speech and audio processing," Speech Communication, Volume 13, Issue 1-2, pp. 207-222, October 1993.

The multi-beamforming method presupposes that the arrival directions and arrival times of the direct sound and the reflected sounds are known. Furthermore, when a filter W^→(ω,θ_sr) that enhances sound from a given direction θ_sr is designed, only the sound from that direction θ_sr is considered in isolation, as equation (2) shows.

However, as described in detail later in the embodiment of the present invention, it is sometimes preferable at the filter design stage to treat the sound from a given direction as a mixture of direct sound and reflected sound. In that case, it may be necessary to determine an appropriate arrangement of the sound-reflecting reflector relative to the microphone array or speaker array.

An object of the present invention is therefore to provide a technique for determining the arrangement, relative to a microphone array or a speaker array, of a reflector that reflects sound, for use in designing a filter applied to sound-based information.

The invention addresses the case in which a filter applied to sound-based information is designed on the basis of a predetermined evaluation function, using a spatial correlation matrix expressed by transfer characteristics in a plurality of directions in space. Each transfer characteristic is expressed as the sum of the transfer characteristic of a direct sound and the transfer characteristic of each of one or more reflected sounds reflected by a reflector, and the evaluation function takes a smaller value the more the sound in at least the target direction is enhanced. A storage unit stores information representing the arrangement of the reflector relative to the microphone array or speaker array (hereinafter, arrangement information). For each candidate arrangement of the reflector based on the arrangement information, an arrangement determination unit obtains the value of the evaluation function using the spatial correlation matrix expressed by the transfer characteristics specified by that candidate, and determines the candidate corresponding to the smallest of these values as the arrangement of the reflector.
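The selection step described above can be sketched as a plain minimum search. This is a minimal illustration, not the patent's implementation: `choose_placement`, the `(angle, distance)` candidate tuples, and the caller-supplied `evaluate` callable are hypothetical names, and the evaluation function itself is left to the caller.

```python
def choose_placement(candidates, evaluate):
    """Return (best_candidate, best_value) that minimises `evaluate`.

    candidates: iterable of candidate placements (here hypothetical
                (angle_deg, distance_m) tuples read from the storage unit).
    evaluate:   callable mapping a candidate to its evaluation-function value;
                smaller means the target direction is enhanced more.
    """
    best, best_value = None, float("inf")
    for cand in candidates:
        value = evaluate(cand)
        if value < best_value:
            best, best_value = cand, value
    if best is None:
        raise ValueError("no candidate produced a finite evaluation value")
    return best, best_value
```

In a full system, `evaluate` would build the transfer characteristics for the candidate, form the spatial correlation matrix, and return the evaluation-function value; here it is deliberately left abstract.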

According to the present invention, the arrangement, relative to a microphone array or a speaker array, of a reflector that reflects sound can be determined when designing a filter applied to sound-based information.

Diagram showing the functional configuration of the reflector arrangement determination apparatus according to the embodiment.
Diagram showing the functional configuration of the sound processing apparatus of application form 1.
Diagram showing the processing procedure of the sound processing method of application form 1.
Diagram showing the functional configuration of the sound processing apparatus of application form 2.
Diagram showing the processing procedure of the sound processing method of application form 2.
Diagram showing the functional configuration of the sound processing apparatus of application form 3.
Diagram showing the processing procedure of the sound processing method of application form 3.
Diagram showing the functional configuration of the sound processing apparatus of application form 4.
Diagram showing the processing procedure of the sound processing method of application form 4.
Diagram showing the positional relationship and other details of the microphone array and the reflecting plate (part 1).
Diagram showing the positional relationship and other details of the microphone array and the reflecting plate (part 2).
Diagram showing the positional relationship and other details of the speaker array and the reflecting plate (part 1).
Diagram showing the positional relationship and other details of the speaker array and the reflecting plate (part 2).

An embodiment of the present invention is described with reference to FIG. 1. In outline, the present invention applies when the sound from a given direction is treated as a mixture of direct sound and reflected sound. It is a technique for determining the arrangement of a sound-reflecting reflector relative to a microphone array or a speaker array when designing a filter applied to sound-based information (in the embodiment, a frequency-domain signal obtained by transforming a sound signal into the frequency domain). It does not affect the filter design concept itself, so there is no particular restriction on the filter design method to which the invention is applied.

The filter design concept is a statistical optimization criterion. For example, for the difference (estimation error) between the desired response and the output obtained by applying the filter to an input sample sequence, the evaluation function may be the mean square of the estimation error, the expected value of the absolute estimation error, or the expected value of a third or higher power of the absolute estimation error. The filter is designed by minimizing this evaluation function (or maximizing it, depending on the evaluation function and how it is expressed). For consistency of explanation, the evaluation function here is a function that outputs a value of smaller absolute value the more the sound in at least the target direction, as seen from the microphone array or speaker array, is enhanced. The wording "at least ... the target direction" is used because, as explained later under <Introduction of distance>, each of the design methods can also take into account the distance from the microphone array or speaker array to the sound source, not only the target direction.

Three filter design methods are given here as examples: the minimum variance distortionless response (MVDR) method, the filter design method based on the SNR maximization criterion, and the filter design method based on power inversion. See Reference 1 for the minimum variance distortionless response method, and Reference 2 for the filter design methods based on the SNR maximization criterion and on power inversion.
(Reference 1) Juro Ohga, Yoshio Yamasaki, Yutaka Kaneda, "Acoustic Systems and Digital Processing", The Institute of Electronics, Information and Communication Engineers, 1995, pp. 203-209
(Reference 2) Nobuyoshi Kikuma, "Adaptive Antenna Technology", 1st edition, Ohmsha, Ltd., 2003, pp. 35-90

The reflector arrangement determination apparatus 100 according to the embodiment of the present invention generally exists not as a standalone apparatus but as an entity constituting, for example, the sound processing apparatuses 1, 2, 3, and 4 described later. More precisely, it is generally not an entity that is readily separable from the sound processing apparatuses 1, 2, 3, and 4; rather, it can be regarded as a one-sided view of those apparatuses that focuses on part of their functions. In short, the reflector arrangement determination apparatus 100 is generally the sound processing apparatus 1, 2, 3, or 4 itself. Specifically, the reflector arrangement determination apparatus 100 can be realized by implementing its functions in a central processing unit or a dedicated LSI.
This is not intended to exclude the reflector arrangement determination apparatus 100 existing as a standalone entity, or as an entity that constitutes the sound processing apparatuses 1, 2, 3, and 4 while being readily separable from them. For example, if determining the reflector arrangement is itself the goal, nothing prevents the reflector arrangement determination apparatus 100 from being realized as a standalone entity.
The sound processing apparatuses 1, 2, 3, and 4 are assumed to be realized by a computer, such as a dedicated machine built from dedicated hardware or a general-purpose machine such as a personal computer. The same applies when the reflector arrangement determination apparatus 100 is realized as a standalone entity.

<1> Filter design method based on the minimum variance distortionless response method
The symbols are defined again before the explanation. Let ω be the index of discrete frequency. (Because frequency f and angular frequency ω satisfy ω=2πf, the discrete frequency index ω may be identified with the angular frequency ω; for ω, the "index of discrete frequency" is also called simply "frequency".) Let k be the frame index. Let X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T be the frequency-domain representation of the k-th frame of the analog signals received by M microphones, and let W^→(ω,θ_s) be the filter that enhances, at frequency ω, the frequency-domain representation of sound from the target direction θ_s as seen from the center of the microphone array. M is an integer of 2 or more. T denotes transposition. The frequency-domain signal in which the sound from the target direction θ_s is enhanced at frequency ω (hereinafter, the output signal) Y(ω,k,θ_s) is then given by equation (4). H denotes the Hermitian transpose.

The "center of the microphone array" can be defined arbitrarily, but it is generally taken to be the geometric center of the arrangement of the M microphones. For a linear microphone array, for example, it is the midpoint between the microphones at the two ends. For a planar microphone array arranged as an m×m square matrix (m²=M), it is the point where the diagonals connecting the four corner microphones intersect.

When the minimum variance distortionless response method is used to design the filter W^→(ω,θ_s), the filter is designed, under the constraint of equation (6), so that the power of sound from directions other than the target direction θ_s (hereinafter also called "noise") is minimized at frequency ω, using the spatial correlation matrix Q(ω) (see equation (5)). a^→(ω,θ_s) = [a₁(ω,θ_s), …, a_M(ω,θ_s)]^T is the transfer characteristic at frequency ω between a sound source assumed to lie in direction θ_s and the M microphones. In other words, it is the transfer characteristic, at frequency ω, of sound from direction θ_s to each microphone in the microphone array.

The filter W^→(ω,θ_s) that is the optimal solution of equation (5) is known to be given by equation (7) (see Reference 1).
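The images for equations (5) to (7) are not reproduced in this text. The sketch below assumes the standard MVDR formulation: minimize W^H Q W subject to W^H a = 1, whose closed-form solution is W = Q⁻¹a / (a^H Q⁻¹ a). The helper names (`solve`, `inner`, `mvdr_filter`) are illustrative, and the small Gaussian-elimination solver stands in for whatever linear-algebra routine an implementation would actually use.

```python
def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        acc = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - acc) / aug[i][i]
    return x

def inner(u, v):
    """Return u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def mvdr_filter(Q, a):
    """MVDR weights W = Q^{-1} a / (a^H Q^{-1} a) for one frequency bin."""
    q_inv_a = solve(Q, a)
    denom = inner(a, q_inv_a)
    return [v / denom for v in q_inv_a]
```

For a Hermitian positive-definite Q, the returned weights satisfy the distortionless constraint W^H a = 1, which is a convenient sanity check when experimenting with different spatial correlation matrices.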

Equation (5) shows that the noise power depends on the structure of the spatial correlation matrix Q(ω), so that structure is described next. Let {1,2,…,P-1} be the set to which the index p of the noise arrival directions belongs, and assume that the index s of the target direction θ_s does not belong to this set. Assuming that P-1 noises arrive from arbitrary directions, the spatial correlation matrix Q(ω) is given by equation (8a). To obtain a filter that works well even when many noises are present, P is preferably fairly large, and is taken to be an integer of about M.

To explain the principle of the invention clearly, the target direction θ_s has so far been described as if it were one specific direction (which is why directions other than θ_s are called "noise" directions). In practice, the target direction θ_s is any direction that may be subject to enhancement, and a plurality of directions is generally envisaged as possible target directions. From this viewpoint, the distinction between the target direction and the noise directions is largely subjective. It is more accurate to understand that P different directions are determined in advance as the directions from which sound may arrive, without distinguishing target sound from noise; the one direction selected from the P directions is the target direction, and the others are noise directions. Accordingly, letting Φ be the union of the set {1,2,…,P-1} and the set {s}, the spatial correlation matrix Q(ω) is expressed by the transfer characteristics a^→(ω,θ_φ) = [a₁(ω,θ_φ), …, a_M(ω,θ_φ)]^T (φ∈Φ), to each microphone, of sound from each direction θ_φ among the directions envisaged as arrival directions, and is given by equation (8b). Here |Φ|=P, where |Φ| denotes the number of elements of the set Φ.
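The images for equations (8a) and (8b) are not reproduced in this text. A common way to build a spatial correlation matrix from a set of transfer characteristics is an unweighted sum of outer products a a^H over the envisaged directions; the sketch below assumes that form, which may differ from the patent's exact equation (for example in weighting or normalization). The function name is illustrative.

```python
def spatial_correlation(transfer_vectors):
    """Return Q = sum over directions of a a^H (assumed, unweighted form).

    transfer_vectors: list of length-M complex vectors a(omega, theta_phi),
                      one per envisaged arrival direction.
    """
    m = len(transfer_vectors[0])
    Q = [[0j] * m for _ in range(m)]
    for a in transfer_vectors:
        for i in range(m):
            for k in range(m):
                Q[i][k] += a[i] * a[k].conjugate()
    return Q
```

The result is Hermitian by construction. With fewer directions than microphones it is rank deficient, which is consistent with the text's preference for a fairly large P.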

Two kinds of sound waves reach each microphone of the microphone array: the direct sound from a sound source, and the reflected sound produced when sound from that source is reflected by a reflector. (Plane waves are assumed here for convenience of explanation, but the waves may be spherical.) Let Ξ be the number of reflected sounds; Ξ is a predetermined integer of 1 or more. The transfer characteristic a^→(ω,θ) = [a₁(ω,θ), …, a_M(ω,θ)]^T is then the sum of the transfer characteristic of the direct sound, by which sound from a direction that may be subject to enhancement reaches the microphone array directly, and the transfer characteristics of the one or more reflected sounds, by which that sound reaches the microphone array after being reflected by a reflector. Specifically, let τ_ξ(θ) be the arrival time difference between the direct sound and the ξ-th reflected sound (1≦ξ≦Ξ), and let α_ξ (1≦ξ≦Ξ) be a coefficient that accounts for the attenuation of sound due to reflection. Then, as in equation (9a), the transfer characteristic can be expressed as the sum of the steering vector of the direct sound and the steering vectors of the Ξ reflected sounds, each corrected for reflection attenuation and for its arrival time difference relative to the direct sound. h^→_d(ω,θ) = [h_d1(ω,θ), …, h_dM(ω,θ)]^T is the steering vector of the direct sound from direction θ, and h^→_rξ(ω,θ) = [h_r1ξ(ω,θ), …, h_rMξ(ω,θ)]^T is the steering vector of the reflected sound corresponding to the direct sound from direction θ. A steering vector is a complex vector that lists, for a sound wave from direction θ as seen from the center of the microphone array, the phase response at frequency ω of each microphone relative to a reference point. Usually α_ξ≦1 (1≦ξ≦Ξ). If each reflected sound undergoes exactly one reflection on its way from the source to the microphones, α_ξ (1≦ξ≦Ξ) may be regarded as the acoustic reflectance of the object that reflected the ξ-th reflected sound.
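The image for equation (9a) is not reproduced in this text. The sketch below follows the verbal description: the direct steering vector plus, for each reflection, a steering vector scaled by α_ξ and delayed by τ_ξ. It assumes a linear array with the phase reference at the array center, a negative exponent sign, and that the delay enters as a factor exp(-jωτ_ξ); the function names and the `(alpha, tau, psi)` tuple layout are illustrative.

```python
import cmath
import math

def centered_steering(omega, angle, num_mics, spacing, c=340.0):
    """Plane-wave steering vector, phase reference at the array center (assumed form)."""
    center = (num_mics - 1) / 2.0
    return [
        cmath.exp(-1j * omega * spacing * (m - center) * math.cos(angle) / c)
        for m in range(num_mics)
    ]

def transfer_characteristic(omega, theta, num_mics, spacing, reflections, c=340.0):
    """Direct sound plus attenuated, delayed reflections.

    reflections: list of (alpha, tau, psi) tuples, one per reflected sound:
                 attenuation coefficient, delay relative to the direct sound
                 in seconds, and arrival direction of the reflection in radians.
    """
    a = centered_steering(omega, theta, num_mics, spacing, c)
    for alpha, tau, psi in reflections:
        h_r = centered_steering(omega, psi, num_mics, spacing, c)
        delay = cmath.exp(-1j * omega * tau)
        a = [ai + alpha * delay * hi for ai, hi in zip(a, h_r)]
    return a
```

With an empty `reflections` list the result reduces to the direct-sound steering vector, which corresponds to the conventional case in which reflections are ignored.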

Because it is desirable to supply one or more reflected sounds to the microphone array composed of M microphones, one or more reflectors are preferably present. From this viewpoint, assuming a sound source in the target direction, the reflectors are preferably arranged so that the positional relationship among the source, the microphone array, and the one or more reflectors allows sound from the source to reach the microphone array after being reflected by at least one reflector. Each reflector has a two-dimensional shape (for example, a flat plate) or a three-dimensional shape (for example, a parabolic shape). Each reflector is preferably the same size as the microphone array or larger (about one to two times its size). To use the reflected sound effectively, the reflectance α_ξ (1≦ξ≦Ξ) of each reflector must at least be greater than 0; more specifically, the amplitude of the reflected sound reaching the microphone array is desirably, for example, 0.2 times the amplitude of the direct sound or more. Each reflector is, for example, a rigid solid. A reflector may be a movable object (for example, a reflecting plate movably attached to the support on which the microphone array is mounted) or an immovable object (for example, a reflecting plate fixed to that support). How the arrangement of the reflectors is determined is described later.

Assuming that sound arrives at a linear microphone array as a plane wave, the m-th element h_dm(ω,θ) of the direct-sound steering vector h^→_d(ω,θ) is given, for example, by equation (10a). Here m is each integer satisfying 1≦m≦M, c is the speed of sound, u is the distance between adjacent microphones, and j is the imaginary unit. The reference point is the position at half the total length of the linear microphone array (the center of the linear microphone array). The direction θ is defined as the angle between the arrival direction of the direct sound, as seen from the center of the linear microphone array, and the direction in which the microphones of the array are lined up (see FIGS. 10 and 11). A steering vector can be expressed in various ways. For example, if the reference point is the position of the microphone at one end of the linear microphone array, the m-th element h_dm(ω,θ) of the direct-sound steering vector is given by equation (10b). In the following, h_dm(ω,θ) is assumed to be given by equation (10a).
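The images for equations (10a) and (10b) are not reproduced in this text. A form consistent with the definitions above (plane wave, microphone spacing u, sound speed c, angle θ measured from the array axis, 1≦m≦M) would be the following. The sign of the exponent is an assumption, since conventions differ between texts.

```latex
% Assumed reconstruction; reference point at the array center (10a)
h_{dm}(\omega,\theta) = \exp\!\left[-j\,\omega\,\frac{u}{c}\left(m-\frac{M+1}{2}\right)\cos\theta\right]
\tag{10a}

% Assumed reconstruction; reference point at the first microphone (10b)
h_{dm}(\omega,\theta) = \exp\!\left[-j\,\omega\,\frac{u}{c}\,(m-1)\cos\theta\right]
\tag{10b}
```

The two forms differ only by a phase factor common to all microphones, exp[-jω(u/c)((M-1)/2)cos θ], so they describe the same inter-microphone phase differences.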

反射音のステアリングベクトルh^→ _r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^Tのm番目の要素は、直接音のステアリングベクトルの表し方と同様に（式（１０ａ）参照）、式（１１ａ）で表される。関数Ψ(θ)は反射音の到来方向を出力する。なお、直接音のステアリングベクトルを式（１０ｂ）で表す場合には、反射音のステアリングベクトルh^→ _r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^Tのm番目の要素は式（１１ｂ）で表される。一般的に、ξ番目（1≦ξ≦Ξ）のステアリングベクトルh^→ _rξ(ω,θ)=[h_r1ξ(ω,θ),…,h_rMξ(ω,θ)]^Tのm番目の要素は、式（１１ｃ）や式（１１ｄ）で表される。関数Ψ_ξ(θ)はξ番目（1≦ξ≦Ξ）の反射音の到来方向を出力する。

The m-th element of the reflected-sound steering vector h^→ _r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^T is expressed by equation (11a), in the same way as the direct-sound steering vector (see equation (10a)). The function Ψ(θ) outputs the arrival direction of the reflected sound. When the direct-sound steering vector is expressed by equation (10b), the m-th element of the reflected-sound steering vector h^→ _r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^T is expressed by equation (11b). In general, the m-th element of the ξ-th (1≦ξ≦Ξ) steering vector h^→ _rξ(ω,θ)=[h_r1ξ(ω,θ),…,h_rMξ(ω,θ)]^T is expressed by equation (11c) or equation (11d). The function Ψ_ξ(θ) outputs the arrival direction of the ξ-th (1≦ξ≦Ξ) reflected sound.
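
The reflected-sound steering vector can be sketched in the same way. Equation (11a) is not reproduced in this text; the form below, a direct-sound-style phase term evaluated at the direction Ψ(θ) and multiplied by a delay term exp(-jωτ(θ)), is an assumption, and the callables `psi` and `tau` as well as the example reflector behaviour are hypothetical.

```python
import cmath
import math

def reflected_steering_vector(omega, theta, M, u, psi, tau, c=340.0):
    """Steering vector of one reflected sound for a linear array (center reference).

    psi(theta): arrival direction of the reflected sound (the function Psi in the text).
    tau(theta): arrival time difference between direct and reflected sound, in seconds.
    Assumed form, mirroring the direct-sound vector with an extra delay term:
        h_rm = exp(-j*omega*tau) * exp(-j*omega*(u/c)*(m-(M+1)/2)*cos(psi))
    """
    delay = cmath.exp(-1j * omega * tau(theta))
    direction = psi(theta)
    return [
        delay * cmath.exp(-1j * omega * (u / c) * (m - (M + 1) / 2) * math.cos(direction))
        for m in range(1, M + 1)
    ]

# Hypothetical reflector behaviour: the reflection arrives from pi - theta, 0.5 ms late.
h_r = reflected_steering_vector(
    2 * math.pi * 1000.0, math.radians(60.0), M=4, u=0.04,
    psi=lambda th: math.pi - th, tau=lambda th: 0.5e-3,
)
```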

さて、到来時間差τ_ξ(θ)と関数Ψ_ξ(θ)は、マイクロホンアレーに対する反射物の配置関係によって定まる。到来時間差τ_ξ(θ)と関数Ψ_ξ(θ)が定まると、直接音のステアリングベクトルh^→ _d(ω,θ)と反射音のステアリングベクトルh^→ _rξ(ω,θ)が定まる。直接音のステアリングベクトルh^→ _d(ω,θ)と反射音のステアリングベクトルh^→ _rξ(ω,θ)が定まると、伝達特性a^→(ω,θ)が定まる。伝達特性a^→(ω,θ)が定まると、空間相関行列Q(ω)が定まる。そして、既述のとおり、雑音のパワーは空間相関行列Q(ω)の構造に依存する。よって、マイクロホンアレーに対する反射物の配置関係を決定することが重要である。ここでは、具体例として、マイクロホンアレーに対する角度（線形マイクロホンアレーに含まれるマイクロホンの配列方向と反射物とがなす角度）とマイクロホンアレーの中心からの距離をもって、マイクロホンアレーに対する反射物の配置関係を特定することとする（図１０、図１１、図１２、図１３参照）。 Now, the arrival time difference τ_ξ(θ) and the function Ψ_ξ(θ) are determined by the placement of the reflector relative to the microphone array. Once τ_ξ(θ) and Ψ_ξ(θ) are determined, the direct-sound steering vector h^→ _d(ω,θ) and the reflected-sound steering vector h^→ _rξ(ω,θ) are determined. Once these steering vectors are determined, the transfer characteristic a^→(ω,θ) is determined, and once a^→(ω,θ) is determined, the spatial correlation matrix Q(ω) is determined. As already described, the noise power depends on the structure of the spatial correlation matrix Q(ω). It is therefore important to determine the placement of the reflector relative to the microphone array. Here, as a specific example, the placement is specified by the angle to the microphone array (the angle between the reflector and the direction in which the microphones of the linear microphone array are lined up) and the distance from the center of the microphone array (see FIGS. 10, 11, 12, and 13).
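
To make the dependence on the placement concrete, the following Python sketch derives Ψ(θ) and τ(θ) from a reflector angle and distance using a two-dimensional far-field image-source model. This is one possible geometry and is not taken from the figures of this document: the sign convention of the reflector normal, the function name, and the sound speed of 340 m/s are assumptions.

```python
import math

def reflection_geometry(theta, phi, L, c=340.0):
    """Far-field image-source model for one flat reflector (2-D sketch).

    theta: direct arrival direction, measured from the array axis (radians).
    phi:   angle between the array axis and the reflector surface (radians).
    L:     distance from the array center to the reflector (metres).
    Returns (psi, tau): reflected arrival direction and arrival time difference,
    or None when a far source in direction theta lies behind the reflector.
    """
    # Unit normal pointing from the array center toward the reflector (assumed convention).
    n = (-math.sin(phi), math.cos(phi))
    d = (math.cos(theta), math.sin(theta))       # unit vector toward the source
    n_dot_d = n[0] * d[0] + n[1] * d[1]
    if n_dot_d >= 0.0:
        return None                              # source is on the far side of the plate
    psi = (2.0 * phi - theta) % (2.0 * math.pi)  # mirror the direction about the surface
    tau = -2.0 * L * n_dot_d / c                 # extra path length is 2*L*|n.d|
    return psi, tau

# Hypothetical placement: plate tilted 10 degrees, 30 cm from the array center.
result = reflection_geometry(math.radians(250.0), math.radians(10.0), 0.3)
```

At normal incidence the extra path is 2L, as expected for a wave that passes the array, hits the plate, and returns.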

以下、具体的に説明する観点から、Ξ=1とし、反射音の反射回数は１回であって、マイクロホンアレーの中心から離れた位置に一つの反射物が存在すると仮定する。反射物は厚みのある剛体平板とする。以下、反射物を反射板と呼称する。この場合、Ξ=1であるからこれを表す添え字を略することとして、式（９ａ）は式（９ｂ）のように表すことができる。

In the following, for concreteness, it is assumed that Ξ = 1, that the reflected sound is reflected only once, and that a single reflector is located away from the center of the microphone array. The reflector is a thick, rigid flat plate and is hereinafter called the reflector plate. In this case, since Ξ = 1, the subscript denoting ξ is omitted, and equation (9a) can be written as equation (9b).

反射物配置決定装置の記憶部１０１には、マイクロホンアレーに対する反射板の配置関係を表す情報がデータとして記憶されている（後述するように実施形態によっては、「スピーカアレーに対する反射板の配置関係を表す情報」であるが、ここではマイクロホンアレーの場合を代表して説明する）。マイクロホンアレーに対する反射板の配置関係を表す情報の一例は、マイクロホンアレーに対する反射板の配置に関する予め定められた候補の集合であり、この集合をＵとする。集合Ｕに含まれる候補は、例えば、マイクロホンアレーに対する反射板の角度の候補の数をJとし、マイクロホンアレーの中心から反射板までの距離の候補の数をKとすると、角度の候補と距離の候補との組み合わせによって表され、集合Ｕに含まれる候補の総数はJ×K（以下、JKと略記する）となる。 The storage unit 101 of the reflector placement determination apparatus stores, as data, information representing the placement of the reflector plate relative to the microphone array (as described later, in some embodiments this is "information representing the placement of the reflector plate relative to the speaker array", but the microphone array case is described here as representative). One example of such information is a set of predetermined candidates for the placement of the reflector plate relative to the microphone array; this set is denoted U. For example, letting J be the number of candidate angles of the reflector plate relative to the microphone array and K be the number of candidate distances from the center of the microphone array to the reflector plate, each candidate in U is a combination of a candidate angle and a candidate distance, and the total number of candidates in U is J×K (hereinafter abbreviated as JK).

マイクロホンアレーに対する反射板の配置関係を表す情報の他の例として、関数を表す情報でもよい。例えば、マイクロホンアレーに対する反射板の角度の候補C_Angle，j=j×Δθ[j=1,2,…，J]を与える離散関数と、マイクロホンアレーの中心から反射板までの距離の候補C_distance，k=k×ΔL[k=1,2,…，K]を与える離散関数を、マイクロホンアレーに対する反射板の配置関係を表す情報として反射物配置決定装置の記憶部１０１が記憶する構成でもよい。ここで、Δθは予め定められた角度、ΔLは予め定められた長さである。ここでは、等間隔に角度と距離の候補を与える離散関数を例示したが、非等間隔に角度と距離の候補を与える離散関数や、あるいは連続関数であってもよいことはもちろんである（連続関数の場合は例えば入力値を離散的に設定すればよい）。 As another example of information representing the placement of the reflector plate relative to the microphone array, information representing functions may be used. For example, the storage unit 101 of the reflector placement determination apparatus may store, as that information, a discrete function giving the candidate angles of the reflector plate relative to the microphone array, C_Angle,j=j×Δθ [j=1,2,…,J], and a discrete function giving the candidate distances from the center of the microphone array to the reflector plate, C_distance,k=k×ΔL [k=1,2,…,K]. Here Δθ is a predetermined angle and ΔL is a predetermined length. Discrete functions giving equally spaced angle and distance candidates are illustrated here, but discrete functions giving unequally spaced candidates, or continuous functions, may of course be used (for a continuous function, the input values may, for example, be set discretely).
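
The equally spaced candidate functions above can be sketched as follows. The function name and the example values of J, K, Δθ, and ΔL are hypothetical; only the rules C_Angle,j = j×Δθ and C_distance,k = k×ΔL come from the text.

```python
import itertools
import math

def placement_candidates(J, K, delta_theta, delta_L):
    """Return the J*K (angle, distance) pairs with angle j*delta_theta and distance k*delta_L."""
    angles = [j * delta_theta for j in range(1, J + 1)]
    distances = [k * delta_L for k in range(1, K + 1)]
    return list(itertools.product(angles, distances))

# Hypothetical grid: 6 angles in 15-degree steps, 4 distances in 5 cm steps.
U = placement_candidates(J=6, K=4, delta_theta=math.radians(15.0), delta_L=0.05)
```

The resulting list plays the role of the set U, and its length is JK.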

反射物配置決定装置１００の配置決定部１１０は、記憶部１０１から取得したJK個の候補、あるいは記憶部１０１に記憶された関数に従って配置決定部１１０が計算して得たJK個の候補、のそれぞれ（候補インデックスをｎとする）について、式（１２）によるパワー（評価関数）p_nを計算する。Ωは周波数ωの集合である。空間相関行列Q_n(ω)は、候補インデックスｎに対応する反射板の配置関係に基づく空間相関行列であり（式（８ａ）または式（８ｂ）参照）、フィルタW_n ^→(ω,θ_s)は、候補インデックスｎに対応する反射板の配置関係に基づくフィルタである（式（７）参照）。

The placement determination unit 110 of the reflector placement determination apparatus 100 calculates the power (evaluation function) p_n according to equation (12) for each of the JK candidates (with candidate index n), which are either acquired from the storage unit 101 or computed by the placement determination unit 110 from the functions stored in the storage unit 101. Ω is the set of frequencies ω. The spatial correlation matrix Q_n(ω) is the spatial correlation matrix based on the reflector plate placement corresponding to candidate index n (see equation (8a) or equation (8b)), and the filter W_n ^→(ω,θ_s) is the filter based on the reflector plate placement corresponding to candidate index n (see equation (7)).
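
A minimal sketch of this evaluation for one candidate is given below, using NumPy. Equations (7), (8b), and (12) are not reproduced in this text, so the sketch assumes the textbook minimum variance distortionless response form W = Q⁻¹a/(aᴴQ⁻¹a), a spatial correlation matrix Q built as the sum of a·aᴴ over the considered directions, and an evaluation value equal to the sum over ω of WᴴQW. The small diagonal loading, the toy transfer characteristic (with a hypothetical Ψ(θ) = π - θ), and all names are illustrative additions.

```python
import numpy as np

def mvdr_power(transfer, omegas, thetas, theta_s, diag_load=1e-6):
    """Sum over frequencies of the MVDR output power W^H Q W for one placement.

    transfer(omega, theta) -> complex vector a of length M for the placement under test.
    thetas: the directions used to build the spatial correlation matrix Q.
    """
    total = 0.0
    for omega in omegas:
        A = np.stack([transfer(omega, th) for th in thetas], axis=1)   # M x P
        Q = A @ A.conj().T                                             # sum_p a a^H
        Q = Q + diag_load * np.trace(Q).real / Q.shape[0] * np.eye(Q.shape[0])
        a_s = transfer(omega, theta_s)
        q_inv_a = np.linalg.solve(Q, a_s)
        W = q_inv_a / (a_s.conj() @ q_inv_a)                           # distortionless filter
        total += float(np.real(W.conj() @ Q @ W))
    return total

def make_transfer(alpha, tau, M=4, u=0.04, c=340.0):
    """Toy transfer characteristic: direct sound plus one reflection (hypothetical Psi)."""
    m = np.arange(1, M + 1) - (M + 1) / 2
    def transfer(omega, theta):
        h_d = np.exp(-1j * omega * u / c * m * np.cos(theta))
        h_r = np.exp(-1j * omega * tau) * np.exp(-1j * omega * u / c * m * np.cos(np.pi - theta))
        return h_d + alpha * h_r
    return transfer

omegas = 2 * np.pi * np.array([500.0, 1000.0, 2000.0])
thetas = np.radians(np.arange(0, 181, 10))
p_with = mvdr_power(make_transfer(0.8, 0.5e-3), omegas, thetas, np.radians(90.0))
p_without = mvdr_power(make_transfer(0.0, 0.0), omegas, thetas, np.radians(90.0))
```

Calling `mvdr_power` once per candidate, with a transfer characteristic built from that candidate's angle and distance, yields the values p₁,…,p_JK that are compared in the next step.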

目的方向θ_sが一つの場合は式（１２）に拠るが、目的方向が複数である場合は、配置決定部１１０は、式（１３）によるパワー（評価関数）p_nを計算する。複数の目的方向をθ_s1，…，θ_sAとする。ただし、その総数|{θ_s1，…，θ_sA}|=AはPを超えない。この処理は、複数の目的方向について、これらの目的方向のうちいずれかに特化して良好な音声強調を実現するフィルタを設計する観点ではなく、これらの目的方向のうちのどの方向であってもバランス良く良好な音声強調を実現するフィルタを設計する観点によるものである。

Equation (12) applies when there is a single target direction θ_s; when there are multiple target directions, the placement determination unit 110 calculates the power (evaluation function) p_n according to equation (13). Let the target directions be θ_s1,…,θ_sA, where their total number |{θ_s1,…,θ_sA}|=A does not exceed P. This processing is intended not to design a filter that achieves good speech enhancement specialized for one of these target directions, but to design a filter that achieves good speech enhancement in a balanced manner for whichever of these target directions is chosen.

次に、配置決定部１１０は、JK個の候補の対応するパワーp₁，…，p_JKのうち最小のパワーを探索する。例えば、最小のパワーがp_gであれば、そのインデックスｇで特定される「マイクロホンアレーに対する反射板の角度とマイクロホンアレーの中心から反射板までの距離」がマイクロホンアレーに対する最適な反射板の配置条件として決定される。 Next, the placement determination unit 110 searches for the minimum among the powers p₁,…,p_JK corresponding to the JK candidates. For example, if the minimum power is p_g, the "angle of the reflector plate relative to the microphone array and distance from the center of the microphone array to the reflector plate" specified by the index g is determined as the optimal placement condition of the reflector plate for the microphone array.
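
The search for the minimum can be sketched as follows. The function name, the candidate tuples, and the evaluation values are made up for illustration.

```python
def best_placement(candidates, powers):
    """Return (g, candidate) where g is the 1-based index of the smallest evaluation value."""
    if len(candidates) != len(powers):
        raise ValueError("one evaluation value per candidate is required")
    g0 = min(range(len(powers)), key=powers.__getitem__)
    return g0 + 1, candidates[g0]

# Hypothetical evaluation values for J*K = 2*3 candidates (angle in degrees, distance in m).
candidates = [(a, d) for a in (30, 60) for d in (0.1, 0.2, 0.3)]
powers = [4.2, 3.1, 3.8, 2.7, 2.9, 3.5]
g, (angle, distance) = best_placement(candidates, powers)
```

Here the fourth candidate has the smallest value, so g = 4 and the selected placement is an angle of 60 degrees at a distance of 0.1 m.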

<２>SN比最大化規準によるフィルタ設計法
SN比最大化規準によるフィルタ設計法では、目的方向θ_sでのSN比（SNR）を最大化する規準でフィルタW^→(ω,θ_s)を決定する。目的方向θ_sの音声の空間相関行列をR_ss(ω)、目的方向θ_s以外の方向の音声の空間相関行列をR_nn(ω)とする。このとき、評価関数であるSNRは式（１４）で表される。なお、R_ss(ω)は式（１５）、R_nn(ω)は式（１６）で表される。伝達特性a^→(ω,θ_s)＝[a₁(ω,θ_s),…,a_M(ω,θ_s)]^Tは式（９ａ）で表される（正確には、式（９ａ）のθをθ_sとしたものである）。

<2> Filter design method based on the SNR maximization criterion
In the filter design method based on the SNR maximization criterion, the filter W^→(ω,θ_s) is determined by the criterion of maximizing the signal-to-noise ratio (SNR) in the target direction θ_s. Let R_ss(ω) be the spatial correlation matrix of sound from the target direction θ_s, and R_nn(ω) the spatial correlation matrix of sound from directions other than the target direction θ_s. The SNR, which is the evaluation function, is then expressed by equation (14). R_ss(ω) is expressed by equation (15) and R_nn(ω) by equation (16). The transfer characteristic a^→(ω,θ_s)=[a₁(ω,θ_s),…,a_M(ω,θ_s)]^T is expressed by equation (9a) (more precisely, by equation (9a) with θ replaced by θ_s).

式（１４）のSNRを最大にするフィルタW^→(ω,θ_s)は、フィルタW^→(ω,θ_s)に関する勾配をゼロとすること、つまり式（１７）によって求めることができる。

The filter W ^→ (ω, θ _s ) that maximizes the SNR in Expression (14) can be obtained by setting the gradient related to the filter W ^→ (ω, θ _s ) to zero, that is, Expression (17).

これにより、式（１４）のSNRを最大にするフィルタW^→(ω,θ_s)は式（１８）で与えられる。

Thus, the filter W ^→ (ω, θ _s ) that maximizes the SNR in Expression (14) is given by Expression (18).

式（１８）には目的方向θ_s以外の方向の音声の空間相関行列R_nn(ω)の逆行列が含まれているが、R_nn(ω)の逆行列を、目的方向θ_sの音声と目的方向θ_s以外の方向の音声を含む入力全体の空間相関行列R_xx(ω)の逆行列に置換してもよいことが知られている。なお、R_xx(ω)=R_ss(ω)+R_nn(ω)=Q(ω)である。つまり、式（１４）のSNRを最大にするフィルタW^→(ω,θ_s)を式（１９）で求めてもよい。

Equation (18) includes an inverse matrix of the spatial correlation matrix R _nn (ω) of speech in a direction other than the target direction θ _s , and the inverse matrix of R _nn (ω) is used as the speech in the target direction θ _s . It is known that it may be replaced with an inverse matrix of the spatial correlation matrix R _xx (ω) of the entire input including speech in directions other than the target direction θ _s . Note that R _xx (ω) = R _ss (ω) + R _nn (ω) = Q (ω). That is, the filter W ^→ (ω, θ _s ) that maximizes the SNR in equation (14) may be obtained by equation (19).
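
The interchangeability of the two inverse matrices can be checked numerically. The sketch below assumes that equation (18) has the common form W ∝ R_nn⁻¹a and that R_ss is the rank-one matrix a·aᴴ; neither equation is reproduced in this text, and the random matrices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
a_s = rng.standard_normal(M) + 1j * rng.standard_normal(M)       # transfer characteristic toward theta_s
N = rng.standard_normal((M, 6)) + 1j * rng.standard_normal((M, 6))
R_nn = N @ N.conj().T                                            # correlation of non-target directions
R_ss = np.outer(a_s, a_s.conj())                                 # rank-one target correlation a a^H
R_xx = R_ss + R_nn

w_nn = np.linalg.solve(R_nn, a_s)     # R_nn^{-1} a
w_xx = np.linalg.solve(R_xx, a_s)     # R_xx^{-1} a

def snr(w):
    """Ratio of target power to non-target power at the filter output."""
    return float(np.real(w.conj() @ R_ss @ w) / np.real(w.conj() @ R_nn @ w))

# The two filters differ only by a scalar factor, so their SNR is identical.
scale = (w_nn.conj() @ w_xx) / (w_nn.conj() @ w_nn)
```

Under these assumptions the matrix inversion lemma gives R_xx⁻¹a = R_nn⁻¹a/(1 + aᴴR_nn⁻¹a), which is why a scalar rescaling suffices and the SNR is unchanged.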

反射物配置決定装置の記憶部１０１には、マイクロホンアレーに対する反射板の配置関係を表す情報がデータとして記憶されている（後述するように実施形態によっては、「スピーカアレーに対する反射板の配置関係を表す情報」であるが、ここではマイクロホンアレーの場合を代表して説明する）。マイクロホンアレーに対する反射板の配置関係を表す情報の一例は、マイクロホンアレーに対する反射板の配置に関する予め定められた候補の集合であり、この集合をＵとする。集合Ｕに含まれる候補は、例えば、マイクロホンアレーに対する反射板の角度の候補の数をJとし、マイクロホンアレーの中心から反射板までの距離の候補の数をKとすると、角度の候補と距離の候補との組み合わせによって表され、集合Ｕに含まれる候補の総数はJKとなる。 The storage unit 101 of the reflector placement determination apparatus stores, as data, information representing the placement of the reflector plate relative to the microphone array (as described later, in some embodiments this is "information representing the placement of the reflector plate relative to the speaker array", but the microphone array case is described here as representative). One example of such information is a set of predetermined candidates for the placement of the reflector plate relative to the microphone array; this set is denoted U. For example, letting J be the number of candidate angles of the reflector plate relative to the microphone array and K be the number of candidate distances from the center of the microphone array to the reflector plate, each candidate in U is a combination of a candidate angle and a candidate distance, and the total number of candidates in U is JK.

反射物配置決定装置１００の配置決定部１１０は、記憶部１０１から取得したJK個の候補、あるいは記憶部１０１に記憶された関数に従って配置決定部１１０が計算して得たJK個の候補、のそれぞれ（候補インデックスをｎとする）について、式（２０）によるSN比（評価関数）p_nを計算する。Ωは周波数ωの集合である。空間相関行列R_ss ⁽ⁿ⁾(ω)，R_nn ⁽ⁿ⁾(ω)は、候補インデックスｎに対応する反射板の配置関係に基づく空間相関行列であり（式（１５）、式（１６）参照）、フィルタW_n ^→(ω,θ_s)は、候補インデックスｎに対応する反射板の配置関係に基づくフィルタである（式（１８）または式（１９）参照）。

The placement determination unit 110 of the reflector placement determination apparatus 100 calculates the SN ratio (evaluation function) p_n according to equation (20) for each of the JK candidates (with candidate index n), which are either acquired from the storage unit 101 or computed by the placement determination unit 110 from the functions stored in the storage unit 101. Ω is the set of frequencies ω. The spatial correlation matrices R_ss ⁽ⁿ⁾(ω) and R_nn ⁽ⁿ⁾(ω) are the spatial correlation matrices based on the reflector plate placement corresponding to candidate index n (see equations (15) and (16)), and the filter W_n ^→(ω,θ_s) is the filter based on the reflector plate placement corresponding to candidate index n (see equation (18) or equation (19)).

目的方向θ_sが一つの場合は式（２０）に拠るが、目的方向が複数である場合は、配置決定部１１０は、式（２１）によるSN比（評価関数）p_nを計算する。複数の目的方向をθ_s1，…，θ_sAとする。ただし、その総数|{θ_s1，…，θ_sA}|=AはPを超えない。この処理は、複数の目的方向について、これらの目的方向のうちいずれかに特化して良好な音声強調を実現するフィルタを設計する観点ではなく、これらの目的方向のうちのどの方向であってもバランス良く良好な音声強調を実現するフィルタを設計する観点によるものである。

Equation (20) applies when there is a single target direction θ_s; when there are multiple target directions, the placement determination unit 110 calculates the SN ratio (evaluation function) p_n according to equation (21). Let the target directions be θ_s1,…,θ_sA, where their total number |{θ_s1,…,θ_sA}|=A does not exceed P. This processing is intended not to design a filter that achieves good speech enhancement specialized for one of these target directions, but to design a filter that achieves good speech enhancement in a balanced manner for whichever of these target directions is chosen.

次に、配置決定部１１０は、JK個の候補の対応するSN比p₁，…，p_JKのうち最小のSN比を探索する。例えば、最小のSN比がp_gであれば、そのインデックスｇで特定される「マイクロホンアレーに対する反射板の角度とマイクロホンアレーの中心から反射板までの距離」がマイクロホンアレーに対する最適な反射板の配置条件として決定される。 Next, the placement determination unit 110 searches for the minimum among the SN ratios p₁,…,p_JK corresponding to the JK candidates. For example, if the minimum SN ratio is p_g, the "angle of the reflector plate relative to the microphone array and distance from the center of the microphone array to the reflector plate" specified by the index g is determined as the optimal placement condition of the reflector plate for the microphone array.

<３>パワーインバージョンに基づくフィルタ設計法
パワーインバージョンに基づくフィルタ設計法では、一つのマイクロホンに対するフィルタ係数を一定値に固定した状態で出力のパワーを最小化する基準でフィルタW^→(ω,θ_s)を決定する。ここでは、一例として、M個のマイクロホンのうち1番目のマイクロホンに対するフィルタ係数を固定するとして説明する。この設計法では、フィルタW^→(ω,θ_s)は、式（２３）の拘束条件の下、空間相関行列R_xx(ω)を用いて全方向（音声の到来方向として想定される全ての方向）の音声のパワーが最小となるように設計される（式（２２）参照）。伝達特性a^→(ω,θ_s)＝[a₁(ω,θ_s),…,a_M(ω,θ_s)]^Tは式（９ａ）で表される（正確には、式（９ａ）のθをθ_sとしたものである）。なお、R_xx(ω)=Q(ω)である。

<3> Filter design method based on power inversion
In the filter design method based on power inversion, the filter W^→(ω,θ_s) is determined by the criterion of minimizing the output power with the filter coefficient for one microphone fixed to a constant value. Here, as an example, the filter coefficient for the first of the M microphones is fixed. In this design method, the filter W^→(ω,θ_s) is designed, under the constraint of equation (23), so that the power of sound from all directions (all directions assumed as sound arrival directions) is minimized using the spatial correlation matrix R_xx(ω) (see equation (22)). The transfer characteristic a^→(ω,θ_s)=[a₁(ω,θ_s),…,a_M(ω,θ_s)]^T is expressed by equation (9a) (more precisely, by equation (9a) with θ replaced by θ_s). Note that R_xx(ω)=Q(ω).

式（２２）の最適解であるフィルタW^→(ω,θ_s)は式（２４）で与えられることが知られている（参考文献２参照）。

It is known that the filter W ^→ (ω, θ _s ), which is the optimal solution of the equation (22), is given by the equation (24) (see Reference 2).
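
For orientation, a generic closed form of such a constrained minimization is sketched below. Equations (22) to (24) are not reproduced in this text, so the constraint is assumed here to fix the first filter coefficient to a given constant, which leads via a Lagrange multiplier to W = v·R⁻¹G/(GᴴR⁻¹G) with G = [1,0,…,0]ᵀ. The exact constant used in equation (23) may differ, and the names and the random test matrix are hypothetical.

```python
import numpy as np

def power_inversion_filter(R_xx, fixed_value=1.0):
    """Minimise W^H R_xx W subject to the first filter coefficient being fixed.

    Closed form of the constrained problem (Lagrange multiplier):
        W = fixed_value * R^{-1} G / (G^H R^{-1} G),  G = [1, 0, ..., 0]^T
    """
    M = R_xx.shape[0]
    G = np.zeros(M, dtype=complex)
    G[0] = 1.0
    r_inv_g = np.linalg.solve(R_xx, G)
    return fixed_value * r_inv_g / (G.conj() @ r_inv_g)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
R_xx = A @ A.conj().T
W = power_inversion_filter(R_xx)
power = float(np.real(W.conj() @ R_xx @ W))
```

Any other filter with the same first coefficient gives an output power at least as large as `power`.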

反射物配置決定装置１００の配置決定部１１０は、記憶部１０１から取得したJK個の候補、あるいは記憶部１０１に記憶された関数に従って配置決定部１１０が計算して得たJK個の候補、のそれぞれ（候補インデックスをｎとする）について、式（２５）によるパワー（評価関数）p_nを計算する。Ωは周波数ωの集合である。空間相関行列Q_n(ω)=R_xx(ω)は、候補インデックスｎに対応する反射板の配置関係に基づく空間相関行列であり（式（８ａ）または式（８ｂ）参照）、フィルタW_n ^→(ω,θ_s)は、候補インデックスｎに対応する反射板の配置関係に基づくフィルタである（式（２４）参照）。

The placement determination unit 110 of the reflector placement determination apparatus 100 calculates the power (evaluation function) p_n according to equation (25) for each of the JK candidates (with candidate index n), which are either acquired from the storage unit 101 or computed by the placement determination unit 110 from the functions stored in the storage unit 101. Ω is the set of frequencies ω. The spatial correlation matrix Q_n(ω)=R_xx(ω) is the spatial correlation matrix based on the reflector plate placement corresponding to candidate index n (see equation (8a) or equation (8b)), and the filter W_n ^→(ω,θ_s) is the filter based on the reflector plate placement corresponding to candidate index n (see equation (24)).

目的方向θ_sが一つの場合は式（２５）に拠るが、目的方向が複数である場合は、配置決定部１１０は、式（２６）によるパワー（評価関数）p_nを計算する。複数の目的方向をθ_s1，…，θ_sAとする。ただし、その総数|{θ_s1，…，θ_sA}|=AはPを超えない。この処理は、複数の目的方向について、これらの目的方向のうちいずれかに特化して良好な音声強調を実現するフィルタを設計する観点ではなく、これらの目的方向のうちのどの方向であってもバランス良く良好な音声強調を実現するフィルタを設計する観点によるものである。

Equation (25) applies when there is a single target direction θ_s; when there are multiple target directions, the placement determination unit 110 calculates the power (evaluation function) p_n according to equation (26). Let the target directions be θ_s1,…,θ_sA, where their total number |{θ_s1,…,θ_sA}|=A does not exceed P. This processing is intended not to design a filter that achieves good speech enhancement specialized for one of these target directions, but to design a filter that achieves good speech enhancement in a balanced manner for whichever of these target directions is chosen.

＜距離の導入＞
上述の説明では、いずれの設計法においても、目的方向のみを考慮していたが、音源までの距離（後述するようにスピーカアレーによる音声再生の場合では、スポット再生までの距離）も考慮してフィルタを設計することも可能である。この場合、各設計法において、マイクロホンアレーの中心からの距離をDと表す（特に目的方向への距離をD_hと表す）と、上記各式は下記のように修正される。 <Introduction of distance>
In the above description, every design method considered only the target direction, but a filter can also be designed taking into account the distance to the sound source (in the case of sound reproduction with a speaker array, described later, the distance to the reproduction spot). In that case, letting D denote the distance from the center of the microphone array (and in particular D_h the distance in the target direction), the above formulas are modified in each design method as follows.

<１>最小分散無歪応答法によるフィルタ設計法の場合
式（４）：

<1> In the case of the filter design method based on the minimum variance distortionless response method
Formula (4):

式（５）、式（６）：

Formula (5), Formula (6):

式（７）：

Formula (7):

式（８ａ）、式（８ｂ）：雑音の到来距離のインデックスzが属する集合を{1,2,…,Z-1}とする。目的距離D_hのインデックスhは集合{1,2,…,Z-1}に属さないとする。また、集合{1,2,…,Z-1}と集合{h}との和集合をΓとすると、|Γ|=Zである。|Γ|は集合Γの要素数を表す。

Expression (8a), Expression (8b): A set to which the index z of the arrival distance of noise belongs is {1, 2,..., Z-1}. It is assumed that the index h of the target distance D _h does not belong to the set {1, 2,..., Z-1}. Further, if the union of the set {1, 2,..., Z-1} and the set {h} is Γ, | Γ | = Z. | Γ | represents the number of elements of the set Γ.

式（９ａ）、式（９ｂ）：

Formula (9a), Formula (9b):

式（１０ａ）、式（１０ｂ）：ただし、音波が球面波として到来する場合の例である。mは1≦m≦Mを満たす各整数である。cは音速を表す。ｊは虚数単位である。適宜に設定した空間座標系において、v^→ _θ,D ^(d)は位置(θ,D)の位置ベクトルを、u^→ _mはm番目のマイクロホンの位置ベクトルを表す。記号‖・‖はノルムを表す。f(‖v^→ _θ,D ^(d)-u^→ _m‖)は音波の距離減衰を表す関数である。例えばf(‖v^→ _θ,D ^(d)-u^→ _m‖)=1/‖v^→ _θ,D ^(d)-u^→ _m‖である（置換後の式（１０ｂ）参照）。

Formula (10a), Formula (10b): These are examples for the case where the sound wave arrives as a spherical wave. m is each integer satisfying 1≦m≦M, c is the speed of sound, and j is the imaginary unit. In an appropriately chosen spatial coordinate system, v^→ _θ,D ^(d) is the position vector of the position (θ,D) and u^→ _m is the position vector of the m-th microphone. The symbol ‖·‖ denotes the norm. f(‖v^→ _θ,D ^(d)-u^→ _m‖) is a function representing the distance attenuation of the sound wave; for example, f(‖v^→ _θ,D ^(d)-u^→ _m‖)=1/‖v^→ _θ,D ^(d)-u^→ _m‖ (see the replaced formula (10b)).
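
A spherical-wave transfer term of this kind can be sketched as follows. The attenuation f(r) = 1/r is the example named in the text; combining it with a propagation phase exp(-jωr/c), without any further normalization, is an assumption because the replaced formulas are not reproduced here. The function name and the example positions are hypothetical.

```python
import cmath
import math

def spherical_steering_element(omega, source_pos, mic_pos, c=340.0):
    """Transfer term from a point source to one microphone for a spherical wave.

    Uses the distance attenuation f(r) = 1/r named in the text and a propagation
    delay r/c; the exact normalisation of formula (10b) is an assumption here.
    """
    r = math.dist(source_pos, mic_pos)          # ||v - u_m||
    if r == 0.0:
        raise ValueError("source and microphone coincide")
    return (1.0 / r) * cmath.exp(-1j * omega * r / c)

# Source at direction 60 degrees, distance 1.5 m; microphone 6 cm from the array center.
theta, D = math.radians(60.0), 1.5
v = (D * math.cos(theta), D * math.sin(theta))
h = spherical_steering_element(2 * math.pi * 1000.0, v, (0.06, 0.0))
```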

式（１１ｃ）、式（１１ｄ）：ただし、音波が球面波として到来する場合の例である。mは1≦m≦Mを満たす各整数である。cは音速を表す。ｊは虚数単位である。上記空間座標系において、v^→ _θ,D ^(ξ)は位置(θ,D)がξ番目の反射物の反射面で鏡像対象に移された位置の位置ベクトルを、u^→ _mはm番目のマイクロホンの位置ベクトルを表す。記号‖・‖はノルムを表す。f(‖v^→ _θ,D ^(ξ)-u^→ _m‖)は音波の距離減衰を表す関数である。例えばf(‖v^→ _θ,D ^(ξ)-u^→ _m‖)=1/‖v^→ _θ,D ^(ξ)-u^→ _m‖である（置換後の式（１１ｄ）参照）。

Formula (11c), Formula (11d): These are examples for the case where the sound wave arrives as a spherical wave. m is each integer satisfying 1≦m≦M, c is the speed of sound, and j is the imaginary unit. In the above spatial coordinate system, v^→ _θ,D ^(ξ) is the position vector of the position obtained by mirroring the position (θ,D) across the reflecting surface of the ξ-th reflector, and u^→ _m is the position vector of the m-th microphone. The symbol ‖·‖ denotes the norm. f(‖v^→ _θ,D ^(ξ)-u^→ _m‖) is a function representing the distance attenuation of the sound wave; for example, f(‖v^→ _θ,D ^(ξ)-u^→ _m‖)=1/‖v^→ _θ,D ^(ξ)-u^→ _m‖ (see the replaced formula (11d)).
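
The mirror-image position used here is ordinary reflection of a point across a plane. The sketch below shows that operation; the function name and the example reflector surface are hypothetical.

```python
import math

def mirror_image(point, plane_point, plane_normal):
    """Reflect a 2-D or 3-D point across the plane through plane_point with the given normal."""
    norm = math.sqrt(sum(n * n for n in plane_normal))
    if norm == 0.0:
        raise ValueError("plane normal must be non-zero")
    n = [x / norm for x in plane_normal]
    signed_dist = sum((p - q) * ni for p, q, ni in zip(point, plane_point, n))
    return tuple(p - 2.0 * signed_dist * ni for p, ni in zip(point, n))

# Reflector surface: the line y = 0.3 (normal along +y). Source at (1.0, -2.0).
image = mirror_image((1.0, -2.0), (0.0, 0.3), (0.0, 1.0))
```

The image lies at the same distance behind the surface as the source lies in front of it, and reflecting the image again returns the original point.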

式（１２）：

Formula (12):

式（１３）：目的距離が複数である場合は、複数の目的距離をD_h1，…，D_hBとする。その総数|{D_h1，…，D_hB}|=Bとする。ただし、その総数|{D_h1，…，D_hB}|=BはZを超えない。

Formula (13): When there are multiple target distances, let them be D_h1,…,D_hB, and let their total number be |{D_h1,…,D_hB}|=B, where B does not exceed Z.

式（１４）、式（１５）、式（１６）：

Formula (14), Formula (15), Formula (16):

式（１７）：

Formula (17):

式（１８）：

Formula (18):

式（１９）：

Formula (19):

式（２０）：

Formula (20):

式（２１）：

Formula (21):

式（２２）、式（２３）：

Formula (22), Formula (23):

式（２４）：

Formula (24):

式（２５）：

Formula (25):

式（２６）：

Formula (26):

上述の説明では、いずれの設計法においても、マイクロホンアレーによる収音を前提としていたが、スピーカアレーによって音声を再生する場合であっても全く同じ議論が成立する。なお、音声再生の場合の反射音を考慮するため、「双対音」を定義する。（１）スピーカアレーから放射された音声であって、（２）当該音声が反射物で反射して、反射音の進行方向が目的方向となる、という条件を満たす音声を「双対音」と呼ぶ（図１２、図１３参照）。マイクロホンアレーによる収音を前提として上述の説明において、マイクロホンアレーをスピーカアレー、雑音を漏れ音声、反射音を双対音と読み替えればよい。 In the above description, in any of the design methods, it is assumed that sound is collected by a microphone array. However, the same argument holds even when sound is reproduced by a speaker array. Note that “dual sound” is defined in order to consider the reflected sound in the case of audio reproduction. (1) Voice radiated from a speaker array, and (2) voice that satisfies the condition that the voice is reflected by a reflector and the reflected sound travels in the target direction is called “dual sound”. (See FIGS. 12 and 13). In the above description on the assumption that sound is collected by a microphone array, the microphone array may be read as a speaker array, noise as a leaked sound, and reflected sound as a dual sound.

以下、本発明の適用形態を説明する。適用形態の概要は下記のとおりである。
適用形態１：
マイクロホンアレーで収音した音声について所望の方向についての音声を狭指向で強調する。
適用形態２：
マイクロホンアレーで収音した音声について所望の方向および距離の音声を狭指向で強調する。
適用形態３：
スピーカアレーで所望の方向に音声を狭指向で再生する。
適用形態４：
スピーカアレーで所望の方向と距離の場所に音声を狭指向でスポット再生する。 Hereinafter, application modes of the present invention will be described. The outline of the application form is as follows.
Application form 1:
For sound picked up by a microphone array, sound from a desired direction is enhanced with narrow directivity.
Application form 2:
For sound picked up by a microphone array, sound from a desired direction and distance is enhanced with narrow directivity.
Application form 3:
Sound is reproduced with narrow directivity toward a desired direction by a speaker array.
Application form 4:
Sound is spot-reproduced with narrow directivity at a location in a desired direction and at a desired distance by a speaker array.

《適用形態１》
適用形態１の機能構成および処理フローを図２と図３に示す。この適用形態１の音声処理装置１は、ＡＤ変換部２１０、フレーム生成部２２０、周波数領域変換部２３０、フィルタ適用部２４０、時間領域変換部２５０、フィルタ設計部２６０、記憶部２９０を含む。 << Applicable form 1 >>
The functional configuration and processing flow of application form 1 are shown in FIGS. 2 and 3. The speech processing apparatus 1 of application form 1 includes an AD conversion unit 210, a frame generation unit 220, a frequency domain conversion unit 230, a filter application unit 240, a time domain conversion unit 250, a filter design unit 260, and a storage unit 290.

まず、反射板の位置は、上述の本発明の実施形態によって決定される。続いて、下記の処理が続行する（図１０、図１１も参照のこと）。 First, the position of the reflector is determined by the above-described embodiment of the present invention. Subsequently, the following processing continues (see also FIGS. 10 and 11).

ステップＳ１
予め、フィルタ設計部２６０が音声強調の対象となりえる離散的な方向ごとに、周波数ごとのフィルタW^→(ω,θ_i)を計算しておく。音声強調の対象となりえる離散的な方向の総数をI（Iは１以上の予め定められた整数であり、I≦Pを満たす）とすると、W^→(ω,θ₁)，…，W^→(ω,θ_i)，…，W^→(ω,θ_I)（1≦i≦I, ω∈Ω; iは整数、Ωは周波数ωの集合）を事前に計算しておくのである。このためには、伝達特性a^→(ω,θ_i)＝[a₁(ω,θ_i),…,a_M(ω,θ_i)]^T（1≦i≦I, ω∈Ω）を求める必要があるが、これは、マイクロホンアレーにおけるマイクロホンの配置、反射物である例えば反射板のマイクロホンアレーに対する位置関係（これは既に決定されている）、直接音とξ番目（1≦ξ≦Ξ）の反射音との到来時間差、反射物の音の反射率などの環境情報を基に式（９ａ）によって具体的に計算できる（正確には、式（９ａ）のθをθ_iとしたものである）。反射音の数Ξは１≦Ξを満たす整数に設定されるが、上述の実施形態によるとΞ＝１であり、一つの反射板３００をマイクロホンアレーの近傍に設置するので、伝達特性a^→(ω,θ_i)は式（９ｂ）によって具体的に計算できる（正確には、式（９ｂ）のθをθ_iとしたものである）。ステアリングベクトルの計算には、例えば式（１０ａ）、式（１０ｂ）、式（１１ａ）、式（１１ｂ）、式（１１ｃ）、式（１１ｄ）を用いることができる。なお、式（９ａ）や式（９ｂ）に拠らず、例えば実環境下における実測で得られた伝達特性を用いてもよい。そして、伝達特性a^→(ω,θ_i)を用いて、例えば式（７）、式（１８）、式（１９）、式（２４）のいずれかによってW^→(ω,θ_i)（1≦i≦I）を求める。なお、式（７）または式（１９）または式（２４）を用いる場合には空間相関行列Q(ω)（あるいはR_xx(ω)）は式（８ｂ）で計算できる。式（１８）を用いる場合には空間相関行列R_nn(ω)は式（１６）で計算できる。I×|Ω|個のフィルタW^→(ω,θ_i)（1≦i≦I,ω∈Ω）は記憶部２９０に記憶される。|Ω|は集合Ωの要素数を表す。 Step S1
In advance, the filter design unit 260 calculates a filter W^→(ω,θ_i) for each frequency, for each discrete direction that can be a target of speech enhancement. Letting I be the total number of discrete directions that can be targets of speech enhancement (I is a predetermined integer of 1 or more satisfying I≦P), the filters W^→(ω,θ₁),…,W^→(ω,θ_i),…,W^→(ω,θ_I) (1≦i≦I, ω∈Ω; i is an integer and Ω is the set of frequencies ω) are calculated beforehand. This requires the transfer characteristics a^→(ω,θ_i)=[a₁(ω,θ_i),…,a_M(ω,θ_i)]^T (1≦i≦I, ω∈Ω), which can be calculated concretely by equation (9a) (more precisely, equation (9a) with θ replaced by θ_i) from environmental information such as the arrangement of the microphones in the microphone array, the position of the reflector (for example, a reflector plate) relative to the microphone array (already determined), the arrival time difference between the direct sound and the ξ-th (1≦ξ≦Ξ) reflected sound, and the acoustic reflectance of the reflector. The number of reflected sounds Ξ is set to an integer satisfying 1≦Ξ; in the embodiment described above Ξ=1 and a single reflector plate 300 is installed near the microphone array, so the transfer characteristic a^→(ω,θ_i) can be calculated concretely by equation (9b) (more precisely, equation (9b) with θ replaced by θ_i). The steering vectors can be calculated using, for example, equations (10a), (10b), (11a), (11b), (11c), and (11d). Instead of relying on equations (9a) and (9b), transfer characteristics obtained by actual measurement in a real environment, for example, may be used. Then, using the transfer characteristics a^→(ω,θ_i), W^→(ω,θ_i) (1≦i≦I) is obtained by, for example, any of equations (7), (18), (19), and (24). When equation (7), (19), or (24) is used, the spatial correlation matrix Q(ω) (or R_xx(ω)) can be calculated by equation (8b). When equation (18) is used, the spatial correlation matrix R_nn(ω) can be calculated by equation (16). The I×|Ω| filters W^→(ω,θ_i) (1≦i≦I, ω∈Ω) are stored in the storage unit 290. |Ω| denotes the number of elements of the set Ω.

ステップＳ２
マイクロホンアレーを構成するM個のマイクロホン２００−１，…，２００−Ｍを用いて収音する。Mは２以上の整数である。 Step S2
Sound is picked up using M microphones 200-1,..., 200-M constituting the microphone array. M is an integer of 2 or more.

M個のマイクロホンの並べ方に制限は無い。ただし、２次元または３次元的にM個のマイクロホンを配置することによって、音声強調する方向の不確定性がなくなるという利点がある。つまり、M個のマイクロホンを水平方向に直線状に並べたときに例えば正面方向から到来する音声と真上から到来する音声との区別ができなくなるという問題を、マイクロホンを平面的ないし立体的に並べることで防ぐことができる。また、収音方向として設定できる方向を広くとるためには、各マイクロホンの指向性は、収音方向である目的方向θ_sになり得る方向にある程度の音圧で音声を収音可能な指向性を持っていたほうがよい。したがって、無指向性マイクロホンや単一指向性マイクロホンといった指向性が比較的緩やかなマイクロホンが好適である。 There is no restriction on how the M microphones are arranged. However, arranging the M microphones two-dimensionally or three-dimensionally has the advantage of removing ambiguity in the direction of speech enhancement. That is, when M microphones are lined up in a horizontal straight line, sound arriving from the front cannot be distinguished from, for example, sound arriving from directly above; arranging the microphones in a plane or in three dimensions prevents this problem. In addition, so that a wide range of directions can be set as the sound pickup direction, each microphone should have a directivity that lets it pick up sound at a reasonable sound pressure from any direction that can become the target direction θ_s. Microphones with relatively gentle directivity, such as omnidirectional or unidirectional microphones, are therefore suitable.

ステップＳ３
ＡＤ変換部２１０が、M個のマイクロホン２００−１，…，２００−Ｍで収音されたアナログ信号（収音信号）をディジタル信号x^→(t)＝[x₁(t),…,x_M(t)]^Tへ変換する。ｔは離散時間のインデックスを表す。 Step S3
The AD conversion unit 210 converts the analog signals (picked-up signals) collected by the M microphones 200-1,…,200-M into a digital signal x^→(t)=[x₁(t),…,x_M(t)]^T. t is the discrete-time index.

ステップＳ４
フレーム生成部２２０は、ＡＤ変換部２１０が出力したディジタル信号x^→(t)＝[x₁(t),…,x_M(t)]^Tを入力とし、チャネルごとにNサンプルをバッファに貯めてフレーム単位のディジタル信号x^→(k)＝[x^→ ₁(k),…,x^→ _M(k)]^Tを出力する。kはフレーム番号のインデックスである。x^→ _m(k)=[x_m((k-1)N+1),…,x_m(kN)]（1≦m≦M）である。Nはサンプリング周波数にもよるが、16kHzサンプリングの場合には512点あたりが妥当である。 Step S4
The frame generation unit 220 receives the digital signal x ^→ (t) = [x ₁ (t),..., X _M (t)] ^T output from the AD conversion unit 210 and stores N samples in a buffer for each channel. Thus, the digital signal x ^→ (k) = [x ^→ ₁ (k),..., X ^→ _M (k)] ^T is output in frame units. k is an index of the frame number. x ^→ _m (k) = [x _m ((k−1) N + 1),..., x _m (kN)] (1 ≦ m ≦ M). N depends on the sampling frequency, but in the case of 16kHz sampling, around 512 points is reasonable.
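
The framing rule x_m(k)=[x_m((k-1)N+1),…,x_m(kN)] can be sketched as follows. The function name is hypothetical, and dropping a trailing partial frame is an assumption, since the text does not say how leftover samples are handled.

```python
def make_frames(x, N):
    """Split M channels of samples into frames of N samples each.

    x: list of M equal-length channel sample lists.
    Returns frames[k][m] = samples (k*N) .. (k*N + N - 1) of channel m (0-based k),
    which corresponds to x_m((k-1)N+1), ..., x_m(kN) in the 1-based notation of the text.
    A trailing partial frame is dropped.
    """
    if N <= 0:
        raise ValueError("N must be positive")
    n_frames = min(len(ch) for ch in x) // N
    return [[ch[k * N:(k + 1) * N] for ch in x] for k in range(n_frames)]

# Two channels, ten samples each, frame length 4 -> two full frames.
x = [list(range(10)), list(range(100, 110))]
frames = make_frames(x, 4)
```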

ステップＳ５
周波数領域変換部２３０は、各フレームのディジタル信号x^→(k)を周波数領域の信号X^→(ω,k)＝[X₁(ω,k),…,X_M(ω,k)]^Tに変換して出力する。ωは離散周波数のインデックスである。時間領域信号を周波数領域信号に変換する方法の一つに高速離散フーリエ変換があるが、これに限定されず、周波数領域信号に変換する他の方法を用いてもよい。周波数領域信号X^→(ω,k)は、各周波数ω、フレームkごとに出力される。 Step S5
The frequency domain conversion unit 230 converts the digital signal x^→(k) of each frame into a frequency-domain signal X^→(ω,k)=[X₁(ω,k),…,X_M(ω,k)]^T and outputs it. ω is the discrete-frequency index. One method of converting a time-domain signal into a frequency-domain signal is the fast discrete Fourier transform, but the conversion is not limited to this, and other methods of converting to a frequency-domain signal may be used. The frequency-domain signal X^→(ω,k) is output for each frequency ω and each frame k.

Step S6
For each frame k and each frequency ω∈Ω, the filter application unit 240 applies the filter W^→(ω,θ_s) corresponding to the target direction θ_s to be enhanced to the frequency-domain signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T and outputs the output signal Y(ω,k,θ_s) (see Expression (27)). The index s of the target direction θ_s satisfies s∈{1,…,I}, and the filters W^→(ω,θ_s) are stored in the storage unit 290, so the filter application unit 240 may, for example, fetch the filter W^→(ω,θ_s) for the target direction θ_s from the storage unit 290 each time Step S6 is executed. If the index s does not belong to the set {1,…,I}, that is, if the filter W^→(ω,θ_s) for the target direction θ_s was not calculated in Step S1, the filter design unit 260 may calculate W^→(ω,θ_s) on demand, or the filter W^→(ω,θ_s') for a direction θ_s' close to θ_s may be used instead.
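
The lookup-with-fallback logic of Step S6 might look like the sketch below. Expression (27) is not reproduced in this part of the description, so the code assumes the common beamformer form Y = W^H X (conjugated weights summed over channels); that form, the use of degrees with a circular distance, and all function names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence


def circular_distance(a: float, b: float) -> float:
    """Smallest angular difference between two directions in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_direction(theta_s: float, designed: Iterable[float]) -> float:
    """Pick the designed direction theta_s' closest to the requested theta_s."""
    return min(designed, key=lambda th: circular_distance(th, theta_s))


def apply_filter(w: Sequence[complex], x: Sequence[complex]) -> complex:
    """Assumed form of Expression (27): Y = W^H X for one frequency bin."""
    return sum(w_m.conjugate() * x_m for w_m, x_m in zip(w, x))


def enhance_frame(
    x_frame: Sequence[Sequence[complex]],
    filters: Dict[float, List[List[complex]]],
    theta_s: float,
) -> List[complex]:
    """x_frame[w] is the M-channel vector at frequency index w;
    filters[theta][w] is the stored weight vector for that direction and bin."""
    theta = theta_s if theta_s in filters else nearest_direction(theta_s, filters.keys())
    bank = filters[theta]
    return [apply_filter(bank[w], x_frame[w]) for w in range(len(x_frame))]


if __name__ == "__main__":
    stored = {0.0: [[1 + 0j, 1j]], 90.0: [[0j, 0j]]}
    # 10 degrees was never designed, so the filter for 0 degrees is used.
    print(enhance_frame([[2 + 0j, 3 + 0j]], stored, 10.0))
```

The alternative named in the text, having the filter design unit compute the missing filter on demand, would replace the `nearest_direction` branch with a call into the design routine.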

Step S7
The time domain conversion unit 250 converts the output signal Y(ω,k,θ_s) of each frequency ω∈Ω of the k-th frame into the time domain to obtain the frame-wise time-domain signal y(k) of the k-th frame. It then concatenates the frame-wise time-domain signals y(k) in order of frame index and outputs the time-domain signal y(t) in which sound from the target direction θ_s is enhanced. The frequency-domain signal is converted to the time domain by the inverse of the transform used in Step S5, for example the fast discrete inverse Fourier transform.
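
Step S7 can be sketched as an inverse transform per frame followed by concatenation in frame order. As in a direct transform sketch, an O(N²) inverse DFT stands in for the fast inverse transform; taking the real part assumes the frame holds the full conjugate-symmetric spectrum of a real signal, and the function names are illustrative assumptions.

```python
import cmath
from typing import List, Sequence


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Direct inverse DFT of one frame; returns the real part of each sample.

    Assumption: the spectrum is conjugate-symmetric, so the imaginary
    parts are only numerical noise.
    """
    n = len(spectrum)
    return [
        (sum(spectrum[w] * cmath.exp(2j * cmath.pi * w * t / n) for w in range(n)) / n).real
        for t in range(n)
    ]


def frames_to_signal(frame_spectra: Sequence[Sequence[complex]]) -> List[float]:
    """Convert each frame Y(w, k) to y(k) and concatenate in frame-index order."""
    y: List[float] = []
    for spectrum in frame_spectra:
        y.extend(idft(spectrum))
    return y


if __name__ == "__main__":
    # Frame 0 is a constant signal of ones, frame 1 is silence.
    print(frames_to_signal([[4 + 0j, 0j, 0j, 0j], [0j, 0j, 0j, 0j]]))
```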

The embodiment described here calculates the filters W^→(ω,θ_i) in advance in Step S1. Depending on, for example, the computational capability of the narrow-directivity speech enhancement apparatus 1, an embodiment may instead be adopted in which the filter design unit 260 calculates the filter W^→(ω,θ_s) for each frequency after the target direction θ_s has been determined.

<< Application mode 2 >>
The functional configuration and processing flow of application mode 2 are shown in FIGS. 4 and 5. The speech processing apparatus 2 of application mode 2 includes an AD conversion unit 210, a frame generation unit 220, a frequency domain conversion unit 230, a filter application unit 240, a time domain conversion unit 250, a filter design unit 260, and a storage unit 290.

First, the position of the reflector plate is determined by the embodiment of the present invention described above. The following processing is then carried out (see also FIGS. 10 and 11). The expressions cited in application mode 2 are those of the <Introduction of distance> section.

Step S1
The filter design unit 260 calculates in advance a filter W^→(ω,θ_i,D_g) for each frequency and for each discrete position (θ_i,D_g) that can be a target of speech enhancement. Let I be the total number of discrete directions that can be targets of speech enhancement (I is a predetermined integer of 1 or more satisfying I ≤ P), and let G be the total number of discrete distances (G is a predetermined integer of 1 or more satisfying G ≤ Z). Then W^→(ω,θ₁,D₁), …, W^→(ω,θ_i,D₁), …, W^→(ω,θ_I,D₁), W^→(ω,θ₁,D₂), …, W^→(ω,θ_i,D₂), …, W^→(ω,θ_I,D₂), …, W^→(ω,θ₁,D_g), …, W^→(ω,θ_i,D_g), …, W^→(ω,θ_I,D_g), …, W^→(ω,θ₁,D_G), …, W^→(ω,θ_i,D_G), …, W^→(ω,θ_I,D_G) (1 ≤ i ≤ I, 1 ≤ g ≤ G, ω∈Ω; i is an integer, Ω is the set of frequencies ω) are calculated in advance.

This requires the transfer characteristics a^→(ω,θ_i,D_g) = [a₁(ω,θ_i,D_g), …, a_M(ω,θ_i,D_g)]^T (1 ≤ i ≤ I, 1 ≤ g ≤ G, ω∈Ω). They can be calculated concretely by Expression (9a) from environmental information such as the arrangement of the microphones in the microphone array, the position of the reflector (for example, a reflector plate) relative to the microphone array (which has already been determined), the arrival time difference between the direct sound and the ξ-th (1 ≤ ξ ≤ Ξ) reflected sound, and the acoustic reflectance of the reflector (more precisely, by Expression (9a) with θ replaced by θ_i and D replaced by D_g). The number Ξ of reflected sounds is set to an integer satisfying 1 ≤ Ξ; in the embodiment described above, Ξ = 1 and a single reflector plate 300 is placed near the microphone array, so the transfer characteristic a^→(ω,θ_i) can be calculated concretely by Expression (9b) (more precisely, by Expression (9b) with θ replaced by θ_i). The steering vectors can be calculated using, for example, Expressions (10a), (10b), (11a), (11b), (11c), and (11d). Instead of relying on Expression (9a), transfer characteristics obtained by actual measurement in a real environment may be used.

Using the transfer characteristics a^→(ω,θ_i,D_g), the filters W^→(ω,θ_i,D_g) (1 ≤ i ≤ I, 1 ≤ g ≤ G) are then obtained by, for example, any one of Expressions (7), (18), (19), and (24). When Expression (7), (19), or (24) is used, the spatial correlation matrix Q(ω,D_g) (or R_xx(ω,D_g)) can be calculated by Expression (8b) (more precisely, by Expression (8b) with D replaced by D_g). When Expression (18) is used, the spatial correlation matrix R_nn(ω,D_g) can be calculated by Expression (16) (more precisely, by Expression (16) with D replaced by D_g). The I×G×|Ω| filters W^→(ω,θ_i,D_g) (1 ≤ i ≤ I, 1 ≤ g ≤ G, ω∈Ω) are stored in the storage unit 290, where |Ω| denotes the number of elements of the set Ω.
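
The bookkeeping of Step S1, one filter per direction, distance, and frequency for a total of I×G×|Ω| entries, can be sketched as below. The design step itself (Expressions (7), (18), (19), or (24)) is not reproduced in this part of the description, so it is passed in as a hypothetical callable `design`; the table layout and the 0-based index keys are likewise assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Filter = List[complex]
Key = Tuple[int, int, int]  # (direction index i, distance index g, frequency index)


def precompute_filters(
    thetas: Sequence[float],
    distances: Sequence[float],
    omegas: Sequence[float],
    design: Callable[[float, float, float], Filter],
) -> Dict[Key, Filter]:
    """Build the table of filters W(omega, theta_i, D_g) held by the storage unit.

    `design(omega, theta, distance)` is a placeholder for whichever
    filter-design expression is chosen; it is not defined here.
    """
    table: Dict[Key, Filter] = {}
    for g, dist in enumerate(distances):
        for i, theta in enumerate(thetas):
            for w, omega in enumerate(omegas):
                table[(i, g, w)] = design(omega, theta, dist)
    return table


if __name__ == "__main__":
    # Dummy design function, used only to show the table size I * G * |Omega|.
    dummy = lambda omega, theta, dist: [complex(omega, theta + dist)]
    bank = precompute_filters([0.0, 45.0, 90.0], [0.5, 1.0], [100.0, 200.0, 300.0, 400.0], dummy)
    print(len(bank))
```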

Step S6
For each frame k and each frequency ω∈Ω, the filter application unit 240 applies the filter W^→(ω,θ_s,D_h) corresponding to the position (θ_s,D_h) to be enhanced to the frequency-domain signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T and outputs the output signal Y(ω,k,θ_s,D_h) (see Expression (28)). The indices s and h of the position (θ_s,D_h) satisfy s∈{1,…,I} and h∈{1,…,G}, and the filters W^→(ω,θ_s,D_h) are stored in the storage unit 290, so the filter application unit 240 may, for example, fetch the filter W^→(ω,θ_s,D_h) for the position (θ_s,D_h) from the storage unit 290 each time Step S6 is executed. If the index s of the direction θ_s does not belong to the set {1,…,I}, or the index h of the distance D_h does not belong to the set {1,…,G}, that is, if the filter W^→(ω,θ_s,D_h) for the position (θ_s,D_h) was not calculated in Step S1, the filter design unit 260 may calculate W^→(ω,θ_s,D_h) on demand. Alternatively, a filter W^→(ω,θ_s',D_h), W^→(ω,θ_s,D_h'), or W^→(ω,θ_s',D_h') corresponding to a direction θ_s' close to θ_s and/or a distance D_h' close to D_h may be used.
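
The substitution of a nearby designed position can be sketched as a nearest-neighbor search over the direction grid and the distance grid. Choosing the nearest direction and the nearest distance independently is one simple policy that covers all three substitute filters named in the text; that policy, the use of degrees, and the function names are assumptions, and the returned indices are 0-based.

```python
from typing import Sequence, Tuple


def circular_distance(a: float, b: float) -> float:
    """Smallest angular difference between two directions in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_position(
    theta_s: float,
    d_h: float,
    thetas: Sequence[float],
    distances: Sequence[float],
) -> Tuple[int, int]:
    """Return 0-based indices (i, g) of the designed direction and distance
    closest to the requested position (theta_s, D_h)."""
    i = min(range(len(thetas)), key=lambda n: circular_distance(thetas[n], theta_s))
    g = min(range(len(distances)), key=lambda n: abs(distances[n] - d_h))
    return i, g


if __name__ == "__main__":
    # Requested (100 deg, 1.4 m) maps to the designed (90 deg, 1.0 m).
    print(nearest_position(100.0, 1.4, [0.0, 90.0, 180.0], [0.5, 1.0, 2.0]))
```

If the requested direction or distance is already on the grid, the corresponding index is returned unchanged, so the same function also serves the ordinary lookup.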

Step S7
The time domain conversion unit 250 converts the output signal Y(ω,k,θ_s,D_h) of each frequency ω∈Ω of the k-th frame into the time domain to obtain the frame-wise time-domain signal y(k) of the k-th frame. It then concatenates the frame-wise time-domain signals y(k) in order of frame index and outputs the time-domain signal y(t) in which sound from the position (θ_s,D_h) is enhanced. The frequency-domain signal is converted to the time domain by the inverse of the transform used in Step S5, for example the fast discrete inverse Fourier transform.

The embodiment described here calculates the filters W^→(ω,θ_i,D_g) in advance in Step S1. Depending on, for example, the computational capability of the speech processing apparatus 2, an embodiment may instead be adopted in which the filter design unit 260 calculates the filter W^→(ω,θ_s,D_h) for each frequency after the position (θ_s,D_h) has been determined.

<< Application mode 3 >>
The functional configuration and processing flow of application mode 3 are shown in FIGS. 6 and 7. The speech processing apparatus 3 of application mode 3 includes an AD conversion unit 210, a frame generation unit 220, a frequency domain conversion unit 230, a filter application unit 240, a time domain conversion unit 250, a filter design unit 260, and a storage unit 290.

First, the position of the reflector plate is determined by the embodiment of the present invention described above. The following processing is then carried out (see also FIGS. 12 and 13).

Step S1
The filter design unit 260 calculates in advance a filter W^→(ω,θ_i) for each frequency and for each discrete direction that can be a target of audio reproduction. Let I be the total number of discrete directions that can be targets of audio reproduction (I is a predetermined integer of 1 or more satisfying I ≤ P). Then W^→(ω,θ₁), …, W^→(ω,θ_i), …, W^→(ω,θ_I) (1 ≤ i ≤ I, ω∈Ω; i is an integer, Ω is the set of frequencies ω) are calculated in advance.

This requires the transfer characteristics a^→(ω,θ_i) = [a₁(ω,θ_i), …, a_M(ω,θ_i)]^T (1 ≤ i ≤ I, ω∈Ω). They can be calculated concretely by Expression (9a) from environmental information such as the arrangement of the speakers in the speaker array, the position of the reflector (for example, a reflector plate) relative to the speaker array (which has already been determined), the time difference between the direct sound and the ξ-th (1 ≤ ξ ≤ Ξ) dual sound, and the acoustic reflectance of the reflector (more precisely, by Expression (9a) with θ replaced by θ_i). The number Ξ of dual sounds is set to an integer satisfying 1 ≤ Ξ; in the embodiment described above, Ξ = 1 and a single reflector plate 300 is placed near the microphone array, so the transfer characteristic a^→(ω,θ_i) can be calculated concretely by Expression (9b) (more precisely, by Expression (9b) with θ replaced by θ_i). The steering vectors can be calculated using, for example, Expressions (10a), (10b), (11a), (11b), (11c), and (11d). Instead of relying on Expressions (10a) and (10b), transfer characteristics obtained by actual measurement in a real environment may be used.

Using the transfer characteristics a^→(ω,θ_i), the filters W^→(ω,θ_i) (1 ≤ i ≤ I) are then obtained by, for example, any one of Expressions (7), (18), (19), and (24). When Expression (7), (19), or (24) is used, the spatial correlation matrix Q(ω) (or R_xx(ω)) can be calculated by Expression (8b). When Expression (18) is used, the spatial correlation matrix R_nn(ω) can be calculated by Expression (16). The I×|Ω| filters W^→(ω,θ_i) (1 ≤ i ≤ I, ω∈Ω) are stored in the storage unit 290, where |Ω| denotes the number of elements of the set Ω.

Step S2
The sound source 200 outputs a sound source signal ss(t). In this embodiment, the sound source signal ss(t) from the sound source 200 is assumed to be an analog signal. A digital signal may, however, also be used as the sound source signal.

Step S3
The AD conversion unit 210 AD-converts the sound source signal ss(t) into a digital signal s(t), where t is the discrete-time index. When the sound source signal is already a digital signal, Step S3 is unnecessary, and the sound source signal can be regarded as s(t), the output signal of the AD conversion unit 210.

Step S4
The frame generation unit 220 receives the digital signal s(t) output by the AD conversion unit 210, buffers N samples, and outputs a frame-wise digital signal s(k). Here k is the frame index and s(k) = [s((k-1)N+1), …, s(kN)]. A suitable N depends on the sampling frequency; for 16 kHz sampling, about 512 points is reasonable.

Step S5
The frequency domain conversion unit 230 converts the digital signal s(k) of each frame into a frequency-domain signal S(ω,k) and outputs it. Here ω is the discrete-frequency index. The fast discrete Fourier transform is one way to convert a time-domain signal into a frequency-domain signal, but other conversion methods may be used. The frequency-domain signal S(ω,k) is output for each frequency ω and each frame k.

Step S6
For each frame k and each frequency ω∈Ω, the filter application unit 240 applies the filter W^→(ω,θ_s) corresponding to the target direction θ_s of reproduction to the frequency-domain signal S(ω,k) and outputs the reproduction signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T (see Expression (29)). The index s of the target direction θ_s satisfies s∈{1,…,I}, and the filters W^→(ω,θ_s) are stored in the storage unit 290, so the filter application unit 240 may, for example, fetch the filter W^→(ω,θ_s) for the target direction θ_s from the storage unit 290 each time Step S6 is executed. If the index s does not belong to the set {1,…,I}, that is, if the filter W^→(ω,θ_s) for the target direction θ_s was not calculated in Step S1, the filter design unit 260 may calculate W^→(ω,θ_s) on demand, or the filter W^→(ω,θ_s') for a direction θ_s' close to θ_s may be used instead.
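
In the reproduction case the filter maps one source spectrum value to M speaker channels. Expression (29) is not reproduced in this part of the description, so the sketch below assumes the simplest form, X_m(ω,k) = W_m(ω,θ_s)·S(ω,k) for each channel m; whether the weights are conjugated in the actual expression is not confirmed here, and the function names are illustrative.

```python
from typing import List, Sequence


def apply_reproduction_filter(w: Sequence[complex], s: complex) -> List[complex]:
    """Assumed form of Expression (29) for one frequency bin:
    each speaker channel is the source value scaled by its own weight."""
    return [w_m * s for w_m in w]


def reproduce_frame(
    s_frame: Sequence[complex],
    bank: Sequence[Sequence[complex]],
) -> List[List[complex]]:
    """s_frame[w] is S(w, k); bank[w] is the M-channel weight vector for the
    chosen target direction. Returns X[w][m] for every frequency index w."""
    return [apply_reproduction_filter(bank[w], s_frame[w]) for w in range(len(s_frame))]


if __name__ == "__main__":
    # One frequency bin, two speakers.
    print(reproduce_frame([0.5 + 0j], [[1 + 1j, 2 + 0j]]))
```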

Step S7
The time domain conversion unit 250 converts the reproduction signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T of each frequency ω∈Ω of the k-th frame into the time domain to obtain the frame-wise time-domain signal x^→(k) = [x₁(k), …, x_M(k)]^T of the k-th frame. It then concatenates the frame-wise time-domain signals x^→(k) in order of frame index and outputs the time-domain signal x^→(t) = [x₁(t), …, x_M(t)]^T in which sound is enhanced toward the target direction θ_s, which is the reproduction direction. The frequency-domain signal is converted to the time domain by the inverse of the transform used in Step S5, for example the fast discrete inverse Fourier transform.

Step S8
Each of the M-channel time-domain signals x₁(t), …, x_M(t) is reproduced by the speaker corresponding to its channel among the M speakers 280-1, …, 280-M that constitute the speaker array. That is, the time-domain signal x_m(t) of the m-th channel (1 ≤ m ≤ M) is reproduced by the m-th speaker 280-m.

There is no restriction on how the M speakers are arranged. The array may place the speakers in a straight line, as in a linear speaker array, or may place the M speakers two-dimensionally or three-dimensionally. In addition, so that a wide range of directions can be set as the reproduction direction, each speaker should have a directivity that lets it reproduce sound at a reasonable sound pressure toward any direction that may become the target direction θ_s, which is the reproduction direction. Speakers with relatively gentle directivity, such as omnidirectional or unidirectional speakers, are therefore preferable.

The embodiment described here calculates the filters W^→(ω,θ_i) in advance in Step S1. Depending on, for example, the computational capability of the speech processing apparatus 3, an embodiment may instead be adopted in which the filter design unit 260 calculates the filter W^→(ω,θ_s) for each frequency after the target direction θ_s, which is the reproduction direction, has been determined.

<< Application mode 4 >>
The functional configuration and processing flow of application mode 4 are shown in FIGS. 8 and 9. The speech processing apparatus 4 of application mode 4 includes an AD conversion unit 210, a frame generation unit 220, a frequency domain conversion unit 230, a filter application unit 240, a time domain conversion unit 250, a filter design unit 260, and a storage unit 290.

First, the position of the reflector plate is determined by the embodiment of the present invention described above. The following processing is then carried out (see also FIGS. 12 and 13). The expressions cited in application mode 4 are those of the <Introduction of distance> section.

Step S1
The filter design unit 260 calculates in advance a filter W^→(ω,θ_i,D_g) for each frequency and for each discrete position (θ_i,D_g) that can be a target of audio spot reproduction. Let I be the total number of discrete directions that can be targets of audio spot reproduction (I is a predetermined integer of 1 or more satisfying I ≤ P), and let G be the total number of discrete distances (G is a predetermined integer of 1 or more satisfying G ≤ Z). Then W^→(ω,θ₁,D₁), …, W^→(ω,θ_i,D₁), …, W^→(ω,θ_I,D₁), W^→(ω,θ₁,D₂), …, W^→(ω,θ_i,D₂), …, W^→(ω,θ_I,D₂), …, W^→(ω,θ₁,D_g), …, W^→(ω,θ_i,D_g), …, W^→(ω,θ_I,D_g), …, W^→(ω,θ₁,D_G), …, W^→(ω,θ_i,D_G), …, W^→(ω,θ_I,D_G) (1 ≤ i ≤ I, 1 ≤ g ≤ G, ω∈Ω; i is an integer, Ω is the set of frequencies ω) are calculated in advance.

This requires the transfer characteristics a^→(ω,θ_i,D_g) = [a₁(ω,θ_i,D_g), …, a_M(ω,θ_i,D_g)]^T (1 ≤ i ≤ I, 1 ≤ g ≤ G, ω∈Ω). They can be calculated concretely by Expression (9a) from environmental information such as the arrangement of the speakers in the speaker array, the position of the reflector (for example, a reflector plate) relative to the speaker array (which has already been determined), the time difference between the direct sound and the ξ-th (1 ≤ ξ ≤ Ξ) dual sound, and the acoustic reflectance of the reflector (more precisely, by Expression (9a) with θ replaced by θ_i and D replaced by D_g). The number Ξ of dual sounds is set to an integer satisfying 1 ≤ Ξ; in the embodiment described above, Ξ = 1 and a single reflector plate 300 is placed near the microphone array, so the transfer characteristic a^→(ω,θ_i) can be calculated concretely by Expression (9b) (more precisely, by Expression (9b) with θ replaced by θ_i). The steering vectors can be calculated using, for example, Expressions (10a), (10b), (11a), (11b), (11c), and (11d). Instead of relying on Expressions (10a) and (10b), transfer characteristics obtained by actual measurement in a real environment may be used.

Using the transfer characteristics a^→(ω,θ_i,D_g), the filters W^→(ω,θ_i,D_g) (1 ≤ i ≤ I, 1 ≤ g ≤ G) are then obtained by, for example, any one of Expressions (7), (18), (19), and (24). When Expression (7), (19), or (24) is used, the spatial correlation matrix Q(ω,D_g) (or R_xx(ω,D_g)) can be calculated by Expression (8b) (more precisely, by Expression (8b) with D replaced by D_g). When Expression (18) is used, the spatial correlation matrix R_nn(ω,D_g) can be calculated by Expression (16) (more precisely, by Expression (16) with D replaced by D_g). The I×G×|Ω| filters W^→(ω,θ_i,D_g) (1 ≤ i ≤ I, 1 ≤ g ≤ G, ω∈Ω) are stored in the storage unit 290, where |Ω| denotes the number of elements of the set Ω.

Step S6
For each frame k and each frequency ω∈Ω, the filter application unit 240 applies the filter W^→(ω,θ_s,D_h) corresponding to the position (θ_s,D_h) targeted for spot reproduction to the frequency-domain signal S(ω,k) and outputs the reproduction signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T (see Expression (30)). The indices s and h of the position (θ_s,D_h) satisfy s∈{1,…,I} and h∈{1,…,G}, and the filters W^→(ω,θ_s,D_h) are stored in the storage unit 290, so the filter application unit 240 may, for example, fetch the filter W^→(ω,θ_s,D_h) for the position (θ_s,D_h) from the storage unit 290 each time Step S6 is executed. If the index s of the direction θ_s does not belong to the set {1,…,I}, or the index h of the distance D_h does not belong to the set {1,…,G}, that is, if the filter W^→(ω,θ_s,D_h) for the position (θ_s,D_h) was not calculated in Step S1, the filter design unit 260 may calculate W^→(ω,θ_s,D_h) on demand. Alternatively, a filter W^→(ω,θ_s',D_h), W^→(ω,θ_s,D_h'), or W^→(ω,θ_s',D_h') corresponding to a direction θ_s' close to θ_s and/or a distance D_h' close to D_h may be used.

Step S7
The time domain conversion unit 250 converts the reproduction signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T of each frequency ω∈Ω of the k-th frame into the time domain to obtain the frame-wise time-domain signal x^→(k) = [x₁(k), …, x_M(k)]^T of the k-th frame. It then concatenates the frame-wise time-domain signals x^→(k) in order of frame index and outputs the time-domain signal x^→(t) = [x₁(t), …, x_M(t)]^T in which sound is enhanced toward the position (θ_s,D_h) targeted for spot reproduction. The frequency-domain signal is converted to the time domain by the inverse of the transform used in Step S5, for example the fast discrete inverse Fourier transform.

Step S8
Each of the M-channel time-domain signals x₁(t), …, x_M(t) is reproduced by the speaker corresponding to its channel among the M speakers 280-1, …, 280-M that constitute the speaker array. That is, the time-domain signal x_m(t) of the m-th channel (1 ≤ m ≤ M) is reproduced by the m-th speaker 280-m.

There is no restriction on how the M speakers are arranged. The array may place the speakers in a straight line, as in a linear speaker array, or may place the M speakers two-dimensionally or three-dimensionally. In addition, so that a wide range of directions can be set as the sound collection direction, each speaker should have a directivity that lets it reproduce sound at a reasonable sound pressure toward any direction that may become the target direction θ_s, which is the reproduction direction. Speakers with relatively gentle directivity, such as omnidirectional or unidirectional speakers, are therefore preferable.

Although an embodiment in which the filters W^→(ω, θ_i, D_g) are calculated in advance in step S1 has been described here, depending on the computational capacity of the speech processing device 4 and the like, an embodiment may instead be adopted in which the filter design unit 260 calculates the filter W^→(ω, θ_s, D_h) for each frequency after the position (θ_s, D_h) has been determined.

<Hardware configuration example of reflector arrangement determination device>
The reflector arrangement determination device according to the above-described embodiments includes an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a CPU (Central Processing Unit, which may include a cache memory and the like), memories such as a RAM (Random Access Memory) and a ROM (Read Only Memory), an external storage device such as a hard disk, and a bus that connects the input unit, output unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the reflector arrangement determination device may also be provided with a device (drive) that can read from and write to a storage medium such as a CD-ROM. A general-purpose computer is an example of a physical entity having such hardware resources.

The external storage device of the reflector arrangement determination device stores a program for determining the arrangement of the reflector and the data required for the processing of this program. (Storage is not limited to the external storage device; for example, the program may be stored in a ROM, which is a read-only storage device.) Data obtained by the processing of these programs is stored as appropriate in the RAM, the external storage device, or the like. A storage device that stores data, the addresses of its storage areas, and the like is referred to simply as a "storage unit".

The storage unit of the reflector arrangement determination device stores the candidates for the arrangement of the reflector (JK candidates) and a program for determining the arrangement of the reflector based on Equation (12), Equation (13), and the like.

In the reflector arrangement determination device, each program stored in the storage unit and the data required for its processing are read into the RAM as needed and are interpreted, executed, and processed by the CPU. As a result, the CPU realizes a predetermined function (the arrangement determination unit), and the arrangement of the reflector is thereby determined.
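The candidate search performed by the arrangement determination unit can be illustrated with the following sketch. It rests on several assumptions that are not taken from the specification: a two-dimensional plane-wave model, a single flat reflector modelled by mirror-image microphones with a hypothetical reflection gain `alpha`, and an MVDR-style cost 1 / (a_s^H Q^{-1} a_s) summed over frequencies as a stand-in for the evaluation function. Equations (12) and (13) are not reproduced here, and all function and parameter names are hypothetical.

```python
import numpy as np

C = 340.0  # speed of sound in m/s (assumed value)

def transfer(mics, normal, dist, alpha, omega, theta):
    """Direct sound plus one reflected sound (image-microphone model, assumed)."""
    u = np.array([np.cos(theta), np.sin(theta)])   # unit vector toward the source
    n = np.asarray(normal, float) / np.linalg.norm(normal)
    # Mirror each microphone across the reflector line {p : p.n = -dist}.
    images = mics - 2.0 * (mics @ n + dist)[:, None] * n
    k = omega / C
    return np.exp(1j * k * (mics @ u)) + alpha * np.exp(1j * k * (images @ u))

def mvdr_cost(mics, cand, omegas, thetas, theta_s, alpha=0.8, eps=1e-6):
    """Assumed evaluation function: smaller when the target direction is emphasized."""
    normal, dist = cand
    cost = 0.0
    for omega in omegas:
        # Columns are the transfer characteristics of all considered directions.
        A = np.stack([transfer(mics, normal, dist, alpha, omega, t)
                      for t in thetas], axis=1)
        # Spatial correlation matrix with small diagonal loading for stability.
        Q = A @ A.conj().T + eps * np.eye(len(mics))
        a_s = transfer(mics, normal, dist, alpha, omega, theta_s)
        cost += 1.0 / np.real(a_s.conj() @ np.linalg.solve(Q, a_s))
    return cost

def choose_arrangement(mics, candidates, omegas, thetas, theta_s):
    """Evaluate every stored candidate and return the one with the smallest cost."""
    costs = [mvdr_cost(mics, c, omegas, thetas, theta_s) for c in candidates]
    return candidates[int(np.argmin(costs))], costs

# Example: four microphones 2 cm apart and three hypothetical candidates,
# each given as (reflector normal, distance from the origin in metres).
mics = np.array([[0.02 * m, 0.0] for m in range(4)])
candidates = [((0.0, 1.0), 0.05), ((1.0, 0.0), 0.05), ((1.0, 0.0), 0.10)]
omegas = 2 * np.pi * np.array([500.0, 1000.0, 2000.0])
thetas = np.deg2rad(np.arange(10, 171, 20))
best, costs = choose_arrangement(mics, candidates, omegas, thetas, np.deg2rad(90))
```

The exhaustive loop mirrors the structure described in the text: the stored candidates are evaluated one by one and the candidate with the minimum evaluation value is selected. Only the cost function would need to change to follow a different design criterion.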

The speech processing device can have the same hardware configuration. The storage unit of the speech processing device stores a program for obtaining a filter for each frequency using the spatial correlation matrix, a program for performing AD conversion on analog signals, a program for performing frame generation, a program for converting the digital signal of each frame into a frequency-domain signal, a program for obtaining an output signal by applying the filter corresponding to the desired direction (and desired distance) to the frequency-domain signal for each frequency, and a program for converting the output signal into a time-domain signal.

In the speech processing device, each program stored in the storage unit and the data required for its processing are read into the RAM as needed and are interpreted, executed, and processed by the CPU. As a result, the CPU realizes predetermined functions (the filter design unit, AD conversion unit, frame generation unit, frequency-domain transform unit, filter application unit, and time-domain transform unit), and the speech processing described above is thereby realized.

<Supplementary note>
The present invention is not limited to the above-described embodiments and can be modified as appropriate without departing from the spirit of the present invention. The processes described in the above embodiments need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed.

When the processing functions of the hardware entities described in the above embodiments (the reflector arrangement determination device and the speech processing device) are realized by a computer, the processing content of the functions that the hardware entities should have is described by a program. Executing this program on the computer realizes the processing functions of the hardware entities on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disk; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in the storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time a program is transferred to it from the server computer. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to the computer and the processing functions are realized only through execution instructions and result acquisition. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the processing of the computer).

In this embodiment, the hardware entities are configured by executing a predetermined program on a computer, but at least part of this processing content may instead be realized in hardware.

Claims

A reflector arrangement determination method in which
a filter applied to speech-based information is designed based on a predetermined evaluation function using a spatial correlation matrix represented by transfer characteristics in a plurality of directions in space,
each of the transfer characteristics is represented by the sum of the transfer characteristic of a direct sound and the transfer characteristic of each of one or more reflected sounds reflected by a reflector,
the transfer characteristic of the direct sound and the transfer characteristics of the reflected sounds are determined based on information (hereinafter referred to as arrangement information) indicating the arrangement of the reflector relative to a microphone array or a speaker array,
the evaluation function takes a smaller value the more strongly the speech in at least one target direction selected from the plurality of directions is emphasized, and
the arrangement information is stored in a storage unit,
the method comprising an arrangement determination step in which an arrangement determination unit obtains, for each candidate for the arrangement of the reflector based on the arrangement information, the value of the evaluation function using the spatial correlation matrix represented by the transfer characteristics specified by that candidate, and determines the candidate corresponding to the smallest of those values as the arrangement of the reflector.

The reflector arrangement determination method according to claim 1,
wherein the evaluation function is an evaluation function based on the minimum variance distortionless response (MVDR) method.

The reflector arrangement determination method according to claim 1,
wherein the evaluation function is an evaluation function based on the S/N ratio maximization criterion.

The reflector arrangement determination method according to claim 1,
wherein the evaluation function is an evaluation function based on the power inversion method.

A reflector arrangement determination device in which
a filter applied to speech-based information is designed based on a predetermined evaluation function using a spatial correlation matrix represented by transfer characteristics in a plurality of directions in space,
each of the transfer characteristics is represented by the sum of the transfer characteristic of a direct sound and the transfer characteristic of each of one or more reflected sounds reflected by a reflector,
the transfer characteristic of the direct sound and the transfer characteristics of the reflected sounds are determined based on information (hereinafter referred to as arrangement information) indicating the arrangement of the reflector relative to a microphone array or a speaker array, and
the evaluation function takes a smaller value the more strongly the speech in at least one target direction selected from the plurality of directions is emphasized,
the device comprising: a storage unit that stores the arrangement information; and
an arrangement determination unit that obtains, for each candidate for the arrangement of the reflector based on the arrangement information, the value of the evaluation function using the spatial correlation matrix represented by the transfer characteristics specified by that candidate, and determines the candidate corresponding to the smallest of those values as the arrangement of the reflector.

A program for causing a computer to execute the processing of the reflector arrangement determination method according to any one of claims 1 to 4.