JP5683140B2

JP5683140B2 - Noise-robust direct-to-reverberant ratio estimation device, interference noise removal device, near/far determination device, sound source distance measurement device, methods for these devices, and device program

Info

Publication number: JP5683140B2
Application number: JP2010134495A
Authority: JP
Inventors: 日岡 裕輔; 阪内 澄宇; 古家 賢一; 羽田 陽一; 丹羽 健太
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-06-11
Filing date: 2010-06-11
Publication date: 2015-03-11
Anticipated expiration: 2030-06-11
Also published as: JP2011259398A

Description

The present invention relates to a noise-robust direct-to-reverberant ratio estimation device, an interference noise removal device, a near/far determination device, a sound source distance measurement device, methods for these devices, and a program for these devices. It can be applied to, for example, voice calls and hands-free operation of equipment by voice input, and is used to emphasize and pick up only the sound of sources located within a specific distance range from a microphone.

Conventionally, in order to identify the distance to a sound source and to emphasize or suppress only the sound from sources within a specific distance range, it has been proposed to estimate the powers of the direct sound and of the reverberant sound from the signals received by microphones and to obtain the direct-to-reverberant ratio from them (see, for example, Non-Patent Document 1). How a conventional direct-to-reverberant ratio estimation device obtains this ratio is explained below with reference to the drawings.

FIG. 1 illustrates a scene in which a direct-to-reverberant ratio estimation device is used. Assume a meeting in which, for example, four speakers 12 to 14 surround a small microphone array 11. A television 16, a telephone 17, and a loudspeaker 18 for in-house announcements are also located in the meeting room. In such a scene, the goal is to pick up only the speech of the speakers 12 to 14, who are located within a predetermined distance range centered on the small microphone array 11 (inside the circle drawn with a broken line), without picking up the in-house announcements, the sound of the telephone, and so on.

To distinguish the distance from the microphone array to a sound source, attention is paid to the ratio between the direct sound and the indirect sound (reverberant sound) contained in the received sound (hereinafter, the direct-to-reverberant ratio). FIG. 2 shows the propagation paths of sound from a sound source 21 to a microphone 22 when sound is recorded with a microphone placed indoors. The direct sound is the sound wave, drawn with a thick solid line, that travels directly from the sound source 21 to the microphone 22. The reverberant sound is the sound wave, drawn with broken lines, that reaches the microphone 22 after the sound emitted by the sound source 21 has been reflected by walls, the floor, the ceiling, and so on.

FIG. 3 shows the relationship between the direct-to-reverberant ratio and the distance from the microphone to the sound source. The horizontal axis of FIG. 3 is the distance from the microphone to the sound source, and the vertical axis is the direct-to-reverberant ratio. In general, the indirect sound has a constant level that does not depend on the distance from the microphone, whereas the direct sound decreases monotonically as the distance from the microphone increases. The direct-to-reverberant ratio, obtained by dividing the direct sound by the indirect sound, therefore also decreases monotonically with distance, just like the direct sound.
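The monotonic relationship described for FIG. 3 can be illustrated with a small numerical sketch. The text only states that the direct sound decreases with distance and that the reverberant sound is constant; the inverse-square law for the direct sound and the parameter `critical_distance_m` (the distance at which both powers are equal) are assumptions introduced here for illustration.

```python
import math

def drr_db(distance_m, critical_distance_m=1.0):
    """Direct-to-reverberant ratio in dB under two illustrative assumptions:
    direct power falls off as 1/d^2, reverberant power is independent of distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    direct_power = 1.0 / distance_m ** 2
    reverberant_power = 1.0 / critical_distance_m ** 2  # constant level (assumed)
    return 10.0 * math.log10(direct_power / reverberant_power)

# The ratio decreases monotonically as the source moves away from the microphone.
distances = [0.5, 1.0, 2.0, 4.0]
ratios = [drr_db(d) for d in distances]
```

Under these assumptions the ratio drops by about 6 dB for every doubling of distance, which is what makes it usable as a distance cue.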

A conventional direct-to-reverberant ratio estimation device estimates this ratio from the received sound and can thereby estimate the distance of the sound source from the microphone array.

Y. Hioka, K. Niwa, S. Sakauchi, K. Furuya, and Y. Haneda, "Estimating direct-to-reverberant energy ratio based on spatial correlation model segregating direct sound and reverberation," Proceedings of 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010), pages 149-152, 2010.
Y. Hioka, S. Sakauchi, K. Furuya, and Y. Haneda, "Study of distance-selective sound pickup based on the direct-to-reverberant ratio of received signals" (in Japanese), Acoustical Society of Japan 2009 Autumn Meeting, pp. 633-634.

However, in addition to the direct sound and the reverberant sound, noise specific to each microphone is generally superimposed on the signals picked up by microphones. The conventional method described above does not take this noise into account, so the accuracy of the direct-to-reverberant ratio estimate degrades when the noise level is high.

The present invention has been made in view of this problem. Its object is to provide a noise-robust direct-to-reverberant ratio estimation device that prevents noise from degrading the estimation accuracy and estimates the direct-to-reverberant ratio accurately even in the presence of noise, together with an interference noise removal device, a near/far determination device, and a sound source distance measurement device that use it, methods for these devices, and a device program.

The noise-robust direct-to-reverberant ratio estimation device of the present invention comprises a plurality of frequency domain conversion units and a direct-to-reverberant ratio estimation unit. The frequency domain conversion units convert the signals received by a plurality of microphones into frequency domain signals. The direct-to-reverberant ratio estimation unit comprises spatial correlation matrix calculation means, signal power estimation means, and direct-to-reverberant ratio calculation means. The spatial correlation matrix calculation means takes the frequency domain signals output by the frequency domain conversion units as input, arranges them into a vector, and calculates a spatial correlation matrix. The signal power estimation means obtains, from microphone arrangement information given in advance and the spatial correlation matrix calculated from the received sound, a vector composed of the direct sound power, the reverberant sound power, and the noise power, and outputs the direct sound power and the reverberant sound power from among its elements. The direct-to-reverberant ratio calculation means calculates the direct-to-reverberant ratio by dividing the direct sound power by the reverberant sound power.

The interference noise removal device and the other devices of the present invention include the noise-robust direct-to-reverberant ratio estimation device of the present invention and additionally comprise one microphone array, a processing target signal generation unit, a target signal adjustment unit, and an inverse frequency domain conversion unit.

The noise-robust direct-to-reverberant ratio estimation device of the present invention obtains the signal powers by adding a model of the cross-correlation of noise to the inter-microphone cross-correlation information used for direct-to-reverberant ratio estimation. This makes it possible to estimate the powers of the three components (direct sound, reverberant sound, and noise) separately, which improves the estimation accuracy of the direct-to-reverberant ratio.

Furthermore, the interference noise removal device of the present invention can emphasize the sound of sources close to the microphones and remove the sound from distant sources even in a noisy environment.

FIG. 1 is a diagram showing an example of a scene in which a conventional direct-to-reverberant ratio estimation device is used.
FIG. 2 is a diagram showing the propagation paths of sound indoors.
FIG. 3 is a diagram showing the relationship between the direct-to-reverberant ratio and the distance from the microphone.
FIG. 4 is a diagram showing an example of the functional configuration of the noise-robust direct-to-reverberant ratio estimation device 100 of the present invention.
FIG. 5 is a diagram showing the operation flow of the noise-robust direct-to-reverberant ratio estimation device 100.
FIG. 6 is a diagram showing an example of the functional configuration of the interference noise removal device 200 of the present invention.
FIG. 7 is a diagram showing the operation flow of the interference noise removal device 200.
FIG. 8 is a diagram showing an example of the functional configuration of the processing target signal generation unit 43.
FIG. 9 is a diagram showing an example of the functional configuration of the near/far determination device 300 of the present invention.
FIG. 10 is a diagram showing an example of the functional configuration of the sound source distance measurement device 400 of the present invention.
FIG. 11 is a diagram showing an example of the functional configuration of the interference noise removal device 500 of the present invention.
FIG. 12 is a diagram showing an example of an equally spaced microphone array.
FIG. 13 is a diagram outlining how a small array is moved.
FIG. 14 is a diagram showing the conditions of the experiment that confirmed the effect.
FIG. 15 is a diagram showing the direct-to-reverberant ratios obtained by the conventional method and by the method of the present invention at an SNR of 10 dB.
FIG. 16 is a diagram showing the direct-to-reverberant ratios obtained by the conventional method and by the method of the present invention at an SNR of 20 dB.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in different drawings are given the same reference numerals, and their description is not repeated. In the following description, symbols such as "￣" and "^" used in the text should properly be written directly above the preceding character, but because of the limitations of text notation they are written immediately after that character. In the equations, these symbols are written in their proper positions.

FIG. 4 shows an example of the functional configuration of the noise-robust direct-to-reverberant ratio estimation device 100 of the present invention, and FIG. 5 shows its operation flow. The device 100 consists of a plurality of frequency domain conversion units 42_1 to 42_M and a direct-to-reverberant ratio estimation unit 44. Each of the frequency domain conversion units 42_1 to 42_M receives the signal picked up by one of the microphones m_1 to m_M that make up a microphone array 41. The direct-to-reverberant ratio estimation unit 44 comprises spatial correlation matrix calculation means 441, signal power estimation means 442, and direct-to-reverberant ratio calculation means 443. Each unit and means of the device 100 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

The frequency domain conversion units 42_1, ..., 42_M convert the received signals x_m(n) picked up by the microphones m_1, ..., m_M into frequency domain signals (step S42). Each unit samples the received signal x_m(n) at a sampling frequency of, for example, 16 kHz to convert it into a digital signal, takes, for example, 256 samples as one frame, performs a discrete Fourier transform on each frame, and outputs the frequency components X_m(ω, l). Here ω is the frequency and l is the frame number. The A/D converter that converts the received signal x_m(n) into a digital signal is omitted from the drawing.
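Step S42 can be sketched as follows for a single microphone channel, using the 16 kHz sampling frequency and 256-sample frames given as examples in the text. The non-overlapping frames, the absence of a window function, and the function names `split_frames`, `dft`, and `to_frequency_domain` are assumptions made for this illustration; the text does not specify them.

```python
import cmath
import math

FS = 16000        # sampling frequency in Hz (example value from the text)
FRAME_LEN = 256   # samples per frame (example value from the text)

def split_frames(samples, frame_len=FRAME_LEN):
    """Cut the signal into consecutive, non-overlapping frames (assumed framing)."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def dft(frame):
    """Plain discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def to_frequency_domain(samples):
    """Return X[l][k]: the frequency component of frame l at bin k."""
    return [dft(frame) for frame in split_frames(samples)]

# A 1 kHz tone falls exactly on bin 1000 * 256 / 16000 = 16.
tone = [math.cos(2 * math.pi * 1000 * t / FS) for t in range(2 * FRAME_LEN)]
spectra = to_frequency_domain(tone)
peak_bin = max(range(FRAME_LEN // 2), key=lambda k: abs(spectra[0][k]))
```

A practical implementation would use an FFT routine instead of the direct sum; the direct form is kept here only so that the sketch has no dependencies.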

The spatial correlation matrix calculation means 441 takes as input the frequency domain signals X_1(ω, l), ..., X_M(ω, l) output by the frequency domain conversion units 42_1, ..., 42_M, arranges them into a vector, and uses that vector to calculate the spatial correlation matrix R(ω, l) given by equation (1) (step S441).

Here T denotes matrix transposition, H denotes the conjugate transpose, and L denotes the number of frames over which the average is taken. The spatial correlation matrix R(ω, l) is input to the signal power estimation means 442.
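Equation (1) is not reproduced in this text. A form consistent with the symbols defined above is R(ω, l) = (1/L) Σ x(ω, l') x(ω, l')^H, with x(ω, l') = [X_1(ω, l'), ..., X_M(ω, l')]^T and the sum taken over L frames. The sketch below computes that average for one frequency; this form and the function name `spatial_correlation` are assumptions, not a reproduction of the patent's equation.

```python
def spatial_correlation(frames):
    """Average of x x^H over the given frames.

    frames: list of L vectors, each [X_1, ..., X_M] at one frequency.
    Returns an M x M matrix as nested lists of complex numbers.
    """
    n_frames = len(frames)
    n_mics = len(frames[0])
    acc = [[0j] * n_mics for _ in range(n_mics)]
    for x in frames:
        for i in range(n_mics):
            for j in range(n_mics):
                acc[i][j] += x[i] * x[j].conjugate()  # (x x^H)_ij = X_i * conj(X_j)
    return [[acc[i][j] / n_frames for j in range(n_mics)] for i in range(n_mics)]

# Two frames of a two-microphone signal; the off-diagonal terms cancel on average.
R = spatial_correlation([[1 + 0j, 1j], [1 + 0j, -1j]])
```

The diagonal of R holds the average power at each microphone, and the off-diagonal entries hold the inter-microphone cross-correlations that the later power estimation relies on.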

The signal power estimation means 442 uses the components R_ij(ω, l) of the spatial correlation matrix R(ω, l) output by the spatial correlation matrix calculation means 441, together with the matrix A(ω) shown in equation (6) and B(ω) shown in equation (7). A(ω) is built from the components d_ij(ω), r_ij(ω), and n_ij(ω) of the matrix R_d(ω) (equation (3)), the matrix R_r(ω) (equation (4)), and the matrix R_n(ω) (equation (5)), which are determined by the microphone arrangement of the microphone array and the direction of the sound source, both given in advance. The matrix R_n(ω) (equation (5)) is the M × M identity matrix.

Here, D_mn is the distance between the m-th and n-th microphones, and θ is the direction of the sound source as seen from the front of the microphone array. The microphone array is assumed here to be a linear array, and the front of the array means the direction normal to the line along which the microphones are arranged.

The simultaneous equations shown in equation (8) are then set up and solved to obtain the vector P(ω, l) (equation (9)), which is composed of the direct sound power P_d(ω, l), the reverberant sound power P_r(ω, l), and the noise power P_n(ω, l). The direct sound power P_d(ω, l) and the reverberant sound power P_r(ω, l) are output.

When the microphone array has an arrangement other than a straight line, the matrix R_d(ω) can be expressed in the more general form shown in equation (10).

Here, D_mn(θ)￣ denotes the difference in distance between the m-th and n-th microphones as seen from the direction of angle θ. The solution of the simultaneous equations of equation (8) is derived, for example, by multiplying B(ω, l) from the left by the pseudo-inverse A^+(ω) of A(ω) (equation (11)), as shown in equation (12).
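Equations (3) to (12) are not reproduced in this text, so the following is only a sketch of the kind of least-squares power estimation described. It assumes the model R = P_d R_d + P_r R_r + P_n R_n and forms for the model matrices that the text does not confirm: a plane-wave direct sound on a linear array, d_ij = exp(jω(x_i - x_j) sin θ / c), and a diffuse reverberant field, r_ij = sin(ωD_ij/c) / (ωD_ij/c). The identity matrix for the noise term is taken from the text. The sound speed and all function names are hypothetical.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s (assumed)

def model_rows(mic_x, omega, theta):
    """One row [d_ij, r_ij, n_ij] per microphone pair (i, j); together they form A."""
    k = omega / SOUND_SPEED
    rows = []
    for i, xi in enumerate(mic_x):
        for j, xj in enumerate(mic_x):
            d = cmath.exp(1j * k * (xi - xj) * math.sin(theta))  # direct sound (assumed model)
            arg = k * abs(xi - xj)
            r = 1.0 if arg == 0 else math.sin(arg) / arg         # diffuse field (assumed model)
            n = 1.0 if i == j else 0.0                           # identity: uncorrelated noise
            rows.append([d, r, n])
    return rows

def solve_linear(G, h):
    """Gaussian elimination with partial pivoting; G is assumed non-singular."""
    n = len(h)
    aug = [list(G[i]) + [h[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    p = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (aug[r][n] - s) / aug[r][r]
    return p

def estimate_powers(R, mic_x, omega, theta):
    """Least-squares solution of A P = B, returned as [P_d, P_r, P_n]."""
    m = len(mic_x)
    A = model_rows(mic_x, omega, theta)
    B = [R[i][j] for i in range(m) for j in range(m)]  # same pair order as the rows of A
    # Normal equations (A^H A) P = A^H B; for full column rank A this gives the
    # same result as applying the pseudo-inverse of A to B.
    G = [[sum(row[a].conjugate() * row[b] for row in A) for b in range(3)]
         for a in range(3)]
    h = [sum(row[a].conjugate() * b for row, b in zip(A, B)) for a in range(3)]
    return [p.real for p in solve_linear(G, h)]
```

With three microphones there are nine equations for three unknowns, so the system is overdetermined and a least-squares solution is the natural choice. Separating the noise power P_n in this way is what keeps it from leaking into the direct and reverberant estimates.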

The direct-to-reverberant ratio calculation means 443 calculates the direct-to-reverberant ratio E_R(l) from the direct sound power P_d(ω, l) and the reverberant sound power P_r(ω, l) according to equation (13) and outputs it.
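Equation (13) is not reproduced in this text. The description of the interference noise removal device states that equation (14) is equation (13) with the summations in the numerator and denominator removed, which suggests that equation (13) divides the direct power summed over frequency by the reverberant power summed over frequency. The sketch below follows that reading; the function names are hypothetical.

```python
def drr_fullband(p_direct, p_reverb):
    """E_R(l): ratio of the powers summed over all frequency bins of frame l."""
    return sum(p_direct) / sum(p_reverb)

def drr_per_bin(p_direct, p_reverb):
    """E_R(w, l): ratio taken separately in each frequency bin."""
    return [pd / pr for pd, pr in zip(p_direct, p_reverb)]

# Direct and reverberant powers of one frame in two frequency bins.
fullband = drr_fullband([2.0, 4.0], [1.0, 1.0])
per_bin = drr_per_bin([2.0, 4.0], [1.0, 1.0])
```

The full-band value gives one number per frame, which suits distance judgments, while the per-bin values are what a time-frequency filter needs.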

Because the method of the first embodiment obtains the direct-to-reverberant ratio directly with the noise power excluded, the ratio can be estimated accurately. The noise-robust direct-to-reverberant ratio estimation device 100 can be used in an interference noise removal device. FIG. 6 shows an example of the functional configuration of an interference noise removal device 200 that includes the device 100, and FIG. 7 shows its operation flow.

The interference noise removal device 200 comprises one microphone array 41, the noise-robust direct-to-reverberant ratio estimation device 100, a processing target signal generation unit 43, a target signal adjustment unit 45, and an inverse frequency domain conversion unit 46. The device 100 is the same as the one already described with reference to FIG. 4 and includes the frequency domain conversion units 42_1, ..., 42_M and the direct-to-reverberant ratio estimation unit 44. Each functional unit other than the microphone array 41 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

The microphone array 41 consists of a plurality of microphones m_1, ..., m_M. The frequency domain conversion units 42_1, ..., 42_M each receive one of the received signals x_m(n) picked up by the microphones m_1, ..., m_M and convert it into a frequency domain signal (step S42).

The processing target signal generation unit 43 combines the frequency domain signals X_m(ω, l) output by the frequency domain conversion units 42_1, ..., 42_M to generate a processing target signal Y(ω, l) (step S43). The noise-robust direct-to-reverberant ratio estimation device 100 operates as described above and calculates and outputs the direct-to-reverberant ratio E_R(ω, l) (step S44). In the interference noise removal device described here, however, the direct-to-reverberant ratio is given by equation (14), which is equation (13) with the summations in the numerator and denominator removed.

The target signal adjustment unit 45 takes the processing target signal Y(ω, l) and the direct-to-reverberant ratio E_R(ω, l) as input and generates a processed signal Z(ω, l) in which the amplitude of Y(ω, l) has been adjusted according to the value of E_R(ω, l) (step S45).

The inverse frequency domain conversion unit 46 converts the processed signal Z(ω, l) into a time domain signal z(n) (step S46). Steps S42 to S46 are repeated until all of the received signals x_m(n) have been processed.

Here, adjustment according to the value of the direct-to-reverberant ratio E_R(ω, l) includes thresholding of E_R(ω, l), processing that increases the amplitude of the processed signal Z(ω, l) as the value becomes larger, and processing that decreases the amplitude of Z(ω, l) as the value becomes larger. Details are given later.

Through the above operation, a single microphone array performs noise removal in which, for example, only sounds within a specific distance range are emphasized and sounds outside that range are suppressed. The invention is described in more detail below with more specific functional configuration examples of each unit.

[Processing target signal generation unit]
FIG. 8 shows a more specific example of the functional configuration of the processing target signal generation unit 43. The unit 43 comprises a plurality of weight multiplication means 431_1 to 431_M and addition means 432. The weight multiplication means 431_1 to 431_M multiply the frequency components X_1(ω, l), ..., X_M(ω, l) of the received signals x_m(n) picked up by the M microphones by weighting coefficients w_m(ω).

As for the weights used by the weight multiplication means 431_1 to 431_M: when the M microphones are omnidirectional, for example, setting w_m = 1/M takes the average of all the frequency components X_1(ω, l), ..., X_M(ω, l), which stabilizes the processing target signal Y(ω, l). When the M microphones are directional, setting w_1 = 1 and w_m = 0 (m = 2, ..., M) uses only the signal of one particular microphone. It is also possible to form an arbitrary directivity with the microphone array by using beamforming filter coefficients as the weights, for example with a method such as that described in the reference "Ohga, Yamasaki, and Kaneda, Acoustic Systems and Digital Signal Processing, published by the Institute of Electronics, Information and Communication Engineers".

The addition means 432 adds all of the weighted frequency components X_1(ω, l), ..., X_M(ω, l) and outputs the processing target signal Y(ω, l).
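The weighting and addition performed by the processing target signal generation unit 43 amount to a weighted sum over the microphones for each frequency and frame. A minimal sketch follows; the function name `processing_target_signal` and the default of equal weights 1/M are choices made for this illustration.

```python
def processing_target_signal(components, weights=None):
    """Y = sum over m of w_m * X_m for one frequency and one frame.

    components: [X_1, ..., X_M]; weights: [w_1, ..., w_M], defaulting to 1/M each.
    """
    n_mics = len(components)
    if weights is None:
        weights = [1.0 / n_mics] * n_mics
    return sum(w * x for w, x in zip(weights, components))

x = [2 + 0j, 4 + 2j]
y_average = processing_target_signal(x)              # omnidirectional case, w_m = 1/M
y_single = processing_target_signal(x, [1.0, 0.0])   # use only the first microphone
```

Passing beamformer coefficients as `weights` would cover the third case mentioned in the text.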

[Target signal adjustment unit]
The target signal adjustment unit 45 can consist of, for example, filter coefficient calculation means 451 and multiplication means 452 (FIG. 6). The filter coefficient calculation means 451 takes the direct-to-reverberant ratio E_R(ω, l) as input and calculates and outputs a filter coefficient G(ω, l). The filter coefficient G(ω, l) is calculated with, for example, a binary filter that uses a threshold, as shown in equation (15).

The threshold Th can be set to any value between the minimum and maximum values of the direct-to-reverberant ratio E_R(ω, l). Bringing the threshold Th closer to the minimum value (0) improves the sound quality. Conversely, bringing Th closer to the maximum value increases the noise suppression effect, but the distortion of the received signal increases and the sound quality degrades.

The threshold Th thus involves a trade-off between sound quality and noise suppression. It is therefore determined empirically according to the intended use, taking this trade-off into account.

If, when calculating the filter coefficient G(ω, l), the time-frequency bins in which the direct-to-reverberant ratio falls below a threshold Th_2 are emphasized instead, as shown in equation (16), sources farther away than a specific distance range can be emphasized.

Although a binary filter taking the values 0 and 1 was given as an example of the filter coefficient G(ω, l), the values need not be 0 and 1; any two sufficiently different values, such as 0.1 and 0.9, may be used.

The filter coefficient G(ω, l) may also be set to a real number of 1 or more, that is, the processing target signal Y(ω, l) may be amplified. It may also be set to a value of 0.1 or less so that Y(ω, l) is strongly suppressed.

In the multiplication means 452, the processing target signal Y(ω, l) is multiplied by the filter coefficient G(ω, l) obtained in this way to generate the processed signal Z(ω, l) = G(ω, l)·Y(ω, l). The processed signal Z(ω, l) can therefore be composed only of the components of Y(ω, l) that have a large direct-to-reverberant ratio E_R(ω, l). In other words, only the direct sound can be extracted.
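The thresholded filter and the multiplication Z(ω, l) = G(ω, l)·Y(ω, l) can be sketched as below. Equations (15) and (16) are not reproduced in this text, so the handling of a ratio exactly equal to the threshold is an assumption, and the names `filter_coefficient`, `adjust`, `pass_gain`, `stop_gain`, and `emphasize_far` are hypothetical.

```python
def filter_coefficient(drr, threshold, pass_gain=1.0, stop_gain=0.0, emphasize_far=False):
    """Two-valued gain G chosen by comparing the ratio E_R with a threshold.

    emphasize_far=False keeps bins with a ratio at or above the threshold (near sources);
    emphasize_far=True keeps bins with a ratio below the threshold (far sources).
    """
    keep = drr < threshold if emphasize_far else drr >= threshold
    return pass_gain if keep else stop_gain

def adjust(y_bins, drr_bins, threshold, **options):
    """Z = G * Y for every frequency bin of one frame."""
    return [filter_coefficient(e, threshold, **options) * y
            for y, e in zip(y_bins, drr_bins)]

y = [1 + 1j, 2 + 0j]
e_r = [3.0, 0.5]
near_only = adjust(y, e_r, threshold=1.0)
far_only = adjust(y, e_r, threshold=1.0, emphasize_far=True)
softened = adjust(y, e_r, threshold=1.0, pass_gain=0.9, stop_gain=0.1)
```

Using gains such as 0.9 and 0.1 rather than 1 and 0, as the text allows, leaves a little of the suppressed bins in the output, which typically reduces audible artifacts at the cost of weaker suppression.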

As a second embodiment of the present invention, a near/far determination device 300 that determines whether a sound source is near or far using the direct-to-reverberant ratio E_R(l) described in the first embodiment is explained. FIG. 9 shows an example of its functional configuration. The near/far determination device 300 comprises a microphone array 41, the noise-robust direct-to-reverberant ratio estimation device 100, and a near/far determination unit 121. The microphone array 41 and the device 100 are the same as those of the interference noise removal device 200. The near/far determination device 300 is likewise realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

When sound sources at several different distances emit sound at different times, the near/far determination device 300 determines whether the source of the sound received at a given time is far away or nearby. The near/far determination unit 121 of the device 300 comprises accumulation means 1211 and determination means 1212.

The accumulation means 1211 accumulates the direct-to-reverberant ratio E_R for the past L time frames and outputs a reference direct-to-reverberant ratio E^ for comparison. For E^, for example, the mean of the accumulated values, E^ = (1/L) Σ E_R(l) summed over the L accumulated frames, or the average of their minimum and maximum, E^ = (1/2)(max E_R(l) + min E_R(l)), is used.

The determination means 1212 compares the direct-to-reverberant ratio E_R(l) with the reference ratio E^. When E_R(l) > E^, it outputs, for example, 1 as the near/far determination result Y_l, indicating that the source is near; when E_R(l) < E^, it outputs, for example, 0, indicating that the source is far. The near/far determination result Y_l indicates whether the received signal over the most recent L time frames is sound from a relatively near source or from a relatively far source.
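The accumulation and comparison steps can be sketched with a fixed-length history buffer. The text does not say whether the current frame is part of the accumulated history or what is output when E_R(l) equals E^; this sketch includes the current frame and returns 0 on equality, both of which are assumptions. The class name `NearFarJudge` and its parameters are hypothetical.

```python
from collections import deque

class NearFarJudge:
    """Keeps the last L ratios and compares each new ratio with a reference value."""

    def __init__(self, history_len, mode="mean"):
        self.history = deque(maxlen=history_len)  # oldest values drop out automatically
        self.mode = mode                          # "mean" or "midrange"

    def reference(self):
        """Reference ratio: mean of the history, or average of its min and max."""
        if self.mode == "mean":
            return sum(self.history) / len(self.history)
        return 0.5 * (max(self.history) + min(self.history))

    def update(self, drr):
        """Store the new ratio and return 1 (near) or 0 (far)."""
        self.history.append(drr)
        return 1 if drr > self.reference() else 0

judge = NearFarJudge(history_len=4)
decisions = [judge.update(v) for v in [1.0, 5.0, 1.0, 5.0]]
```

Because the reference is derived from recent frames rather than fixed, the decision is relative: it separates the nearer from the farther of the sources that have recently been active.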

By using the near/far determination result Y_l, sequentially input received signals can be separated according to the distance between the microphone and their sound source. In other words, the sounds of a plurality of sound sources can be selected according to their distance from the microphone.

FIG. 10 shows an example of the functional configuration of the sound source distance measurement device 400 of the present invention. The device 400 comprises one microphone array 41, the noise-robust direct-to-reverberant ratio estimation device 100, a distance-to-ratio database (hereinafter, distance-ratio DB) 47, and a distance determination unit 48. The device 100 includes the frequency domain conversion units 42_1, ..., 42_M and the direct-to-reverberant ratio estimation unit 44. Each functional unit other than the microphone array 41 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

The microphone array 41 consists of a plurality of microphones m_1, ..., m_M. The frequency domain conversion units 42_1, ..., 42_M each receive the signal x_m(n) picked up by the corresponding microphone m_1, ..., m_M and convert it into a frequency domain signal. Each frequency domain conversion unit 42_1, ..., 42_M performs a discrete Fourier transform frame by frame and outputs the frequency components X_m(ω, l).
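The frame-by-frame discrete Fourier transform for one microphone channel can be sketched as below. The text does not specify framing details, so this sketch assumes non-overlapping frames and no window function; `frame_dft` is a hypothetical name, and the direct O(N²) sum is used only to keep the example dependency-free.

```python
import cmath
from typing import List, Sequence


def frame_dft(x: Sequence[float], frame_len: int) -> List[List[complex]]:
    """Split x_m(n) into frames and return X_m(k, l) as result[l][k].

    Assumptions: frames do not overlap, no window is applied, and any
    trailing samples that do not fill a whole frame are dropped.
    """
    spectra = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len)
        ]
        spectra.append(spectrum)
    return spectra
```

In practice an FFT routine would replace the inner sum; the output layout (one spectrum per frame index l) is the part that matters for the later stages.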

The direct ratio estimation unit 44 takes the frequency domain signals X_m(ω, l) output by the frequency domain conversion units 42_1, ..., 42_M as input and estimates the direct ratio E_R of the received signal.

The distance-direct ratio DB 47 records the relationship between the direct ratio E_R and the distance between the microphone array and the sound source. The distance determination unit 48 takes the direct ratio as input, refers to the distance-direct ratio DB 47, and estimates the sound source distance estimate d^ corresponding to that direct ratio.
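One way to picture the database lookup is a table of (direct ratio, distance) pairs searched for the closest recorded ratio. The text does not say how the DB is organized or queried, so the table layout, the nearest-entry rule, and the name `estimate_distance` are all assumptions made for illustration.

```python
from typing import List, Tuple


def estimate_distance(e_r: float, db: List[Tuple[float, float]]) -> float:
    """Return the distance d^ of the DB entry whose direct ratio is closest to e_r.

    db holds (direct_ratio_dB, distance_cm) pairs; nearest-entry lookup is an
    assumed strategy, and interpolation would be an equally valid choice.
    """
    _, distance = min(db, key=lambda entry: abs(entry[0] - e_r))
    return distance
```

A table such as `[(10.0, 10.0), (0.0, 50.0), (-10.0, 200.0)]` (made-up values, not measurements from the patent) would map a measured ratio of 8 dB to the 10 cm entry.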

Some received signals have their components concentrated in a specific frequency band. When the direct ratio E_R of such a signal is calculated by the direct ratio calculation means 443, the estimation accuracy of E_R degrades.

The estimation accuracy of the direct ratio can therefore be improved by using the direct ratio calculation means 443′ (FIG. 4), which calculates the direct ratio E over a specific frequency region Ω as shown in equation (17).

Here, the frequency region Ω is determined, for example, by selecting the frequency band in which the signal components are concentrated. For example, taking the output X_m(ω, l) of the frequency domain conversion unit 42_m connected to an arbitrary m-th microphone, Ω is formed either from the frequencies ω at which the absolute value of X_m(ω, l) exceeds a preset threshold P_th, as shown in equation (18), or from the K frequencies ω with the largest absolute values of X_m(ω, l).

Here, P_th is, for example, the average of |X_m(ω, l)| over all frequencies.
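The two ways of choosing Ω described above (threshold and top-K) can be sketched as follows. The function names are hypothetical, frequencies are represented by their bin indices, and the default threshold follows the example given in the text, namely the mean magnitude over all bins.

```python
from typing import List, Optional, Sequence


def select_by_threshold(spectrum: Sequence[complex],
                        p_th: Optional[float] = None) -> List[int]:
    """Return bin indices where |X_m| exceeds p_th.

    When p_th is omitted, the mean magnitude over all bins is used.
    """
    mags = [abs(v) for v in spectrum]
    if p_th is None:
        p_th = sum(mags) / len(mags)
    return [k for k, m in enumerate(mags) if m > p_th]


def select_top_k(spectrum: Sequence[complex], k: int) -> List[int]:
    """Return the indices of the k bins with the largest |X_m|, in ascending order."""
    mags = [abs(v) for v in spectrum]
    order = sorted(range(len(mags)), key=lambda i: mags[i], reverse=True)
    return sorted(order[:k])
```

Either function yields the set of bins over which the direct ratio would then be averaged.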

FIG. 11 shows an example functional configuration of the interference noise removal device 500 of the present invention. The interference noise removal device 500 comprises the noise-tolerant direct ratio estimation device 100 described in Embodiment 1, a processing target signal generation unit 72, a target signal adjustment unit 73, and an inverse frequency domain conversion unit 74.

The processing target signal generation unit 72 takes the frequency domain signals X_m(ω, l) output by the frequency domain conversion units 42_1 to 42_M in the noise-tolerant direct ratio estimation device 100 as input and outputs the processing target signal X(ω, l). The processing target signal X(ω, l) is obtained by combining the frequency domain signals X_m(ω, l), for example with an adding means (not shown). Each frequency domain signal X_m(ω, l) may be multiplied by a weight before the addition.
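The combination step can be sketched as a per-bin weighted sum across channels. The function name and the default of unit weights are assumptions; the text only states that the channels are added and may be weighted first.

```python
from typing import List, Optional, Sequence


def combine_channels(channels: Sequence[Sequence[complex]],
                     weights: Optional[Sequence[float]] = None) -> List[complex]:
    """Sum the M channel spectra X_m(k, l) of one frame into X(k, l).

    channels[m][k] is bin k of microphone m; weights default to 1 per channel.
    """
    if weights is None:
        weights = [1.0] * len(channels)
    n_bins = len(channels[0])
    return [sum(w * ch[k] for w, ch in zip(weights, channels))
            for k in range(n_bins)]
```

Passing equal weights of 1/M turns the sum into a plain average of the microphones.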

The target signal adjustment unit 73 takes as input the direct ratio E(ω) output by the noise-tolerant direct ratio estimation device 100 and the processing target signal X(ω, l) output by the processing target signal generation unit 72, and generates the processed signal Y(ω, l), in which the amplitude of the processing target signal X(ω, l) has been adjusted. The inverse frequency domain conversion unit 74 converts the processed signal Y(ω, l) into the time domain signal y(n).

The target signal adjustment unit 73 comprises, for example, a distance calculation means 731, a filter forming means 732, and a multiplication means 733. The distance calculation means 731 holds a function d = f(E_R(ω, l)) expressing the relationship between the direct ratio E_R(ω, l) and the distance between the microphone array 41 and the sound source, and calculates the sound source distance estimate d^ corresponding to the input direct ratio E.

As shown in equation (19), the filter forming means 732 forms a filter that emphasizes the time-frequency components for which the sound source distance estimate d^ lies between two thresholds of different magnitude, d_f and d_n, so that only sound sources in the band-shaped region bounded by those two distances are emphasized.

Here, regarding l and ω of G(ω, l): within the processing of the direct ratio estimation unit 43 described above, the same G(ω, l) is applied to all L frames averaged in equation (1) by the spatial correlation matrix calculation means 431 and to all frequencies contained in the frequency region Ω (equation (17)) averaged by the direct ratio calculation means 443. Furthermore, the values of G(ω, l) in equation (19) need not be exactly 1 and 0; values that differ sufficiently, for example 0.9 and 0.1, may be used instead.

The multiplication means 733 multiplies the processing target signal X(ω, l) by the filter G(ω, l) to generate the processed signal Y(ω, l). Consequently, in the processed signal Y(ω, l), the sound of sources located between the two distances, that is, within a specific distance range from the microphone array 41, is emphasized or suppressed. This processed signal Y(ω, l) is converted into the time domain signal y(n) by the inverse frequency domain conversion unit 74.
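The filter forming and multiplication steps can be sketched as a distance-gated gain. This is an illustration under stated assumptions rather than equation (19) itself: it assumes d_n is the nearer and d_f the farther threshold, treats both boundaries as exclusive, and uses hypothetical function and parameter names. The pass and stop gains default to 1 and 0 but can be set to softer values such as 0.9 and 0.1, as the text permits.

```python
from typing import List, Sequence


def band_gain(d_hat: float, d_n: float, d_f: float,
              g_pass: float = 1.0, g_stop: float = 0.0) -> float:
    """Return G for one estimate d^: g_pass inside (d_n, d_f), else g_stop.

    Assumption: d_n < d_f and both boundaries are excluded.
    """
    return g_pass if d_n < d_hat < d_f else g_stop


def apply_gain(x_bins: Sequence[complex], gain: float) -> List[complex]:
    """Multiply the processing target signal X by G to obtain Y."""
    return [gain * v for v in x_bins]
```

Swapping `g_pass` and `g_stop` gives the suppressing variant, in which sources inside the band are attenuated instead of emphasized.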

As is clear from equation (1), the spatial correlation matrix R(ω, l) in the embodiments above is based on an average over L frames. The direct ratio therefore cannot be obtained accurately when the sound source moves. A noise-tolerant direct ratio estimation device 600 that can obtain the direct ratio accurately even when the sound source moves is described next.

This embodiment uses, for example, an equally spaced microphone array 130 as shown in FIG. 12. The functional configuration of the direct ratio estimation unit 44 in this embodiment is the same as that shown in FIG. 4.

The signal power estimation means 442′ (FIG. 4) uses the matrix A(ω) shown in equation (20) and B(ω, l) shown in equation (21). A(ω) is constructed from the components d_{i,j}(ω), r_{i,j}(ω), and n_{i,j}(ω) of the matrices R_d(ω) (equation (3)), R_r(ω) (equation (4)), and R_n(ω) (equation (5)), which are given by the predetermined microphone arrangement of the array and the direction of the sound source. B(ω, l) is constructed from the components R′_{i,j}(ω, l) of the small spatial correlation matrix R′(ω, l) output by the spatial correlation matrix calculation means 441′. Here, the small spatial correlation matrix R′(ω, l) is the matrix obtained as the sum of the spatial correlation matrices computed for each small microphone array (equation (22)).

Here, the components of B(ω, l), namely R_11′(ω, l), R_12′(ω, l), R_21′(ω, l), and R_22′(ω, l), are obtained from R′(ω, l) as given by equation (22).

Equations (22) and (23) calculate the small spatial correlation matrix as the sum of the spatial correlation matrices obtained when a small array of two adjacent microphones is shifted along the array, as shown in FIG. 13. That is, the small array formed by grouping two adjacent microphones is shifted (130a → 130b → ... → 130g) and the resulting spatial correlation matrices are summed. When the number of microphones is M′, equation (20) can be written as equation (24), equation (21) as equation (25), and equation (22) as equation (26).
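The idea of summing sub-array correlation matrices can be sketched as follows for a single frequency and frame. Since equations (22) and (23) are not reproduced here, the exact form is an assumption: each sub-array contributes the outer product of its signal vector with its own conjugate, and no normalization or frame averaging is applied. The name `small_spatial_correlation` is hypothetical.

```python
from typing import List, Sequence


def small_spatial_correlation(x: Sequence[complex],
                              sub_len: int = 2) -> List[List[complex]]:
    """Sum x_p x_p^H over all shifted sub-arrays of sub_len adjacent microphones.

    x[m] is X_m(omega, l) for microphone m of an equally spaced array.
    Assumption: a plain, unnormalized sum of outer products.
    """
    r = [[0j] * sub_len for _ in range(sub_len)]
    for p in range(len(x) - sub_len + 1):   # shift the small array by one microphone
        sub = x[p:p + sub_len]
        for i in range(sub_len):
            for j in range(sub_len):
                r[i][j] += sub[i] * sub[j].conjugate()
    return r
```

With an eight-element input and `sub_len=2` the loop visits seven sub-array positions, the same count as 130a through 130g. Because the averaging runs over space rather than over L frames, a single frame already yields several observations.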

The direct ratio calculation means 443 performs exactly the same processing as in Embodiment 1. As described above, by calculating the small spatial correlation matrix as the sum of the spatial correlation matrices obtained for each small array, as the spatial correlation matrix calculation means 441′ does, an accurate direct ratio can be obtained even for a moving sound source.

Although the small array has been described as consisting of two microphones, it may consist of any number. The microphone arrangement is also not limited to a linear array with equal spacing. Any microphone array may be used, such as a rectangular planar array, a triangular planar array, or a rectangular parallelepiped array, provided that its microphones are located at the positions reached by translating a small microphone array consisting of a plurality of microphones arranged in a regular pattern.

〔Experimental results〕
To confirm the effect of the present invention, an experiment was conducted in which the direct ratio was estimated from signals received by the microphone array while white noise was emitted from the sound source 21, and the result was compared with the conventional method.

FIG. 14 shows the simulation conditions. A room with a floor area of 4 × 6 m and a height of 2.7 m was assumed. FIG. 14 is a top view of the room. A microphone array of eight microphones arranged on a circle of radius 6 cm was used, placed at a height of 1.5 m above the floor. The sound source 21 was placed at a height of 1.5 m in a direction 10° from the central axis of the circle. The reverberation time of the room is about 550 ms, the sampling frequency is 16 kHz, and one processing frame is 512 samples long.

FIG. 15 shows the results for an SNR of 10 dB and FIG. 16 the results for an SNR of 20 dB. In each figure, (a) is the conventional method and (b) is the present invention. The horizontal axis is distance (cm) and the vertical axis is the direct ratio (dB). The solid line with open circles (○) shows the estimated direct ratio, and the broken line with filled circles (●) shows the true direct ratio.

Comparing (a) and (b) in FIGS. 15 and 16 shows that the present invention has the smaller error in both cases. For example, at an SNR of 10 dB (FIG. 15), the estimation error of the direct ratio for a sound source at a distance of 10 cm is 10 dB with the conventional technique but about 5 dB with the noise-tolerant direct ratio estimation device of the present invention. Furthermore, comparing FIGS. 15 and 16, whose noise powers differ by a factor of 10, shows almost no difference between FIG. 15(b) and FIG. 16(b). This result indicates that the method of the present invention is hardly affected by the level of the superimposed noise. Thus, the noise-tolerant direct ratio estimation method of the present invention estimates the direct ratio accurately even when uncorrelated noise is superimposed on the microphones.
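The factor of 10 between the two noise powers follows from the decibel definition of SNR. As a brief check, with P_s the signal power and P_n the noise power (symbols introduced here for illustration, and assuming the signal power is the same in both figures):

```latex
\mathrm{SNR} = 10\log_{10}\frac{P_s}{P_n}\ \text{[dB]}
\quad\Longrightarrow\quad
\frac{P_n^{(10\,\mathrm{dB})}}{P_n^{(20\,\mathrm{dB})}}
  = 10^{(20-10)/10} = 10 .
```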

The processes described for the above methods and devices need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device executing them or as needed. An interference noise removal system or the like may also be built using two or more microphone arrays in which the above devices and methods are implemented.

When the processing means of the above devices are realized by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on the computer realizes the processing means of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be realized in hardware.

Claims

A noise-tolerant direct ratio estimation device comprising a plurality of frequency domain conversion units that receive the signals picked up by a plurality of microphones and convert each received signal into a frequency domain signal, and a direct ratio estimation unit that takes the frequency domain signals as input, obtains the direct sound power, the reverberant sound power, and the noise power, and calculates the direct ratio by dividing the direct sound power by the reverberant sound power,
wherein the direct ratio estimation unit comprises:
spatial correlation matrix calculation means for taking the frequency domain signals output by the plurality of frequency domain conversion units as input, vectorizing them, and calculating a spatial correlation matrix;
signal power estimation means for obtaining, from microphone arrangement information given in advance and the spatial correlation matrix, a vector composed of the direct sound power, the reverberant sound power, and the noise power, and outputting the direct sound power and the reverberant sound power from among the elements of that vector; and
direct ratio calculation means for calculating the direct ratio by dividing the direct sound power by the reverberant sound power.

An interference noise removal device including the noise-tolerant direct ratio estimation device according to claim 1, comprising:
a microphone array composed of a plurality of microphones whose received signals are respectively input to the plurality of frequency domain conversion units;
a processing target signal generation unit that generates a processing target signal by combining the frequency domain signals output by the plurality of frequency domain conversion units;
a target signal adjustment unit that takes the processing target signal and the direct ratio as input and generates a processed signal in which the amplitude of the processing target signal is made larger as the direct ratio is larger; and
an inverse frequency domain conversion unit that converts the processed signal into a time domain signal.

An interference noise removal device including the noise-tolerant direct ratio estimation device according to claim 1, comprising:
a microphone array composed of a plurality of microphones whose received signals are respectively input to the plurality of frequency domain conversion units;
a processing target signal generation unit that generates a processing target signal by combining the frequency domain signals output by the plurality of frequency domain conversion units;
a target signal adjustment unit that takes the processing target signal and the direct ratio as input and generates a processed signal in which the amplitude of the processing target signal is made larger as the direct ratio is smaller; and
an inverse frequency domain conversion unit that converts the processed signal into a time domain signal.

A perspective determination device including the noise-tolerant direct ratio estimation device according to claim 1 and a perspective determination unit,
wherein the perspective determination unit comprises:
frequency averaging means for averaging the direct ratio in the frequency direction and outputting a frequency-averaged direct ratio;
accumulating means for accumulating the frequency-averaged direct ratio over the frames of a predetermined period and outputting a comparison target direct ratio; and
determination means for comparing the frequency-averaged direct ratio with the comparison target direct ratio and outputting a perspective determination result.

A sound source distance measuring device including the noise-tolerant direct ratio estimation device according to claim 1, comprising:
a microphone array composed of a plurality of microphones whose received signals are respectively input to the plurality of frequency domain conversion units;
a distance-direct ratio database that records the relationship between the direct ratio and distance; and
a distance determination unit that takes the direct ratio as input, refers to the distance-direct ratio database, and estimates the sound source distance estimate corresponding to that direct ratio.

An interference noise removal device including the noise-tolerant direct ratio estimation device according to claim 1, comprising:
a processing target signal generation unit that takes the frequency domain signals output by the plurality of frequency domain conversion units as input and outputs a processing target signal;
a target signal adjustment unit that takes as input the direct ratio output by the noise-tolerant direct ratio estimation device and the processing target signal, and generates a processed signal in which the sound of a sound source located in a specific distance range from the microphone array composed of the plurality of microphones is emphasized or suppressed; and
an inverse frequency domain conversion unit that converts the processed signal into a time domain signal.

A noise-tolerant direct ratio estimation method comprising:
a frequency domain conversion step in which a plurality of frequency domain conversion units convert the signals received by a plurality of microphones into frequency domain signals;
a spatial correlation matrix calculation step in which a spatial correlation matrix calculation unit takes the frequency domain signals output by the plurality of frequency domain conversion units as input, vectorizes them, and calculates a spatial correlation matrix;
a signal power estimation step in which a signal power estimation unit obtains, from microphone arrangement information given in advance and the spatial correlation matrix, a vector composed of the direct sound power, the reverberant sound power, and the noise power, and outputs the direct sound power and the reverberant sound power from among the elements of that vector; and
a direct ratio calculation step in which a direct ratio calculation unit calculates the direct ratio by dividing the direct sound power by the reverberant sound power.

A device program for causing a computer to function as the noise-tolerant direct ratio estimation device according to claim 1, the interference noise removal device according to claim 2, claim 3, or claim 6, the perspective determination device according to claim 4, or the sound source distance measuring device according to claim 5.