JPH0587903A - Predicting method of direction of sound source - Google Patents

Predicting method of direction of sound source

Info

Publication number
JPH0587903A
JPH0587903A
Authority
JP
Japan
Prior art keywords
sound
sound source
signals
microphones
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP3249411A
Other languages
Japanese (ja)
Other versions
JP2985982B2 (en)
Inventor
Yutaka Kaneda
豊 金田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP3249411A priority Critical patent/JP2985982B2/en
Publication of JPH0587903A publication Critical patent/JPH0587903A/en
Application granted granted Critical
Publication of JP2985982B2 publication Critical patent/JP2985982B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

PURPOSE: To correctly estimate the direction of a sound source even in a sound field where many reflected sounds are present. CONSTITUTION: The outputs of two microphones 21, 21 are each divided into M frequency bands by a band splitting part 22. The power of the signal in each band is obtained in a power operating part 23, its peak value is held by a peak hold part 24, and the result is logarithmically processed by a logarithmic processing part 25 and time-differenced by a time difference processing part 26. A correlation function operating part 27 then computes the cross-correlation function between the signals in corresponding frequency bands of the two microphone outputs processed in the same way. A sound source direction estimating part 28 averages the cross-correlation functions of the frequency bands with weights, takes the lag at which this weighted average is maximum as the time difference with which the direct sound reaches the microphones 21, 21, and computes the sound source direction from it.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a sound source direction estimation method for estimating the direction or position of a sound source based on cross-correlation functions between signals observed by a plurality of microphones.

[0002]

2. Description of the Related Art
Sound source direction estimation is a basic technology required for various purposes, such as estimating the direction in which an abnormal sound occurred with a remote monitoring device and pointing a monitor camera in that direction, pointing a camera toward the speaker in a video conference, and tracking the flight trajectory of an aircraft outdoors.

[0003] The most basic conventional method of sound source direction estimation is based on the time difference between signals observed by a plurality of microphones. FIG. 7A illustrates this. A sound wave with wavefront 3 arrives from incident direction 4 at a first microphone 1 and a second microphone 2; let the output signal 5 of the first microphone 1 be x(t) and the output signal 6 of the second microphone 2 be y(t). In the situation shown, the sound wave is received first by the first microphone 1 and, slightly later, by the second microphone 2. If the distance between the first and second microphones 1, 2 is d and the arrival direction of the sound wave is the angle θ shown in the figure, the delay (time difference) with which the wave reaches the second microphone 2 is the time τ0 the sound wave needs to travel the distance d·sin θ. Denoting the speed of sound by c, it is given by

τ0 = (d·sin θ)/c  (1)

If the sound wave arrives from a single direction, the output signal y(t) of the second microphone 2 can be written in terms of the output signal x(t) of the first microphone 1 as y(t) = x(t−τ0). If τ0 can be determined from the two signals x(t) and y(t), the arrival direction of the sound wave follows from equation (1) as

θ = sin⁻¹(c·τ0/d)  (2)
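As a quick numerical check of equations (1) and (2), here is a minimal Python sketch (an illustration, not part of the patent); the values d = 0.2 m, c = 340 m/s and θ = 30° are assumptions.

```python
import numpy as np

# Illustrative values (assumptions, not from the patent): microphone
# spacing d, speed of sound c, and a true arrival angle theta.
d = 0.2                          # m
c = 340.0                        # m/s
theta = np.deg2rad(30.0)

# Equation (1): time difference with which the wave reaches microphone 2.
tau0 = d * np.sin(theta) / c     # about 2.94e-4 s

# Equation (2): recover the arrival angle from the time difference.
theta_hat = np.arcsin(c * tau0 / d)
print(tau0, np.rad2deg(theta_hat))   # ~0.000294 s, ~30 degrees
```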

[0004] The time difference τ0 of the sound wave can be obtained by computing the cross-correlation function φxy(τ) of the two signals x(t) and y(t) and taking the value of τ at which it is maximum. Here the cross-correlation function of the discretized signals (t an integer) is defined as

φxy(τ) = Σ x(t)·y(t+τ)  (3)

where Σ denotes summation over t (for continuous signals the summation is replaced by an integral). Using the relation y(t) = x(t−τ0),

φxy(τ) = Σ x(t)·x(t+τ−τ0) = φxx(τ−τ0)  (4)

where Σ again denotes summation over t and φxx(τ) is the autocorrelation function of x(t), which, as is well known, takes its maximum at τ = 0. It follows that φxy(τ) takes its maximum at τ = τ0. FIG. 7B shows, for the case where the sound wave is a pulse, the signals x(t) and y(t) and the cross-correlation function φxy(τ) computed from them; φxy(τ) has a clear maximum at τ = τ0.
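To make the conventional procedure of paragraphs [0003]-[0004] concrete, the following Python sketch (an illustration, not part of the patent) computes the cross-correlation of equation (3) by direct summation, takes the lag of its maximum as τ0, and converts it to an angle with equation (2); the pulse test signal, sampling rate and microphone spacing are assumptions.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """phi_xy(tau) = sum over t of x(t) * y(t + tau), as in equation (3)."""
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    phi = np.zeros(len(lags))
    for i, tau in enumerate(lags):
        t = np.arange(max(0, -tau), min(n, n - tau))
        phi[i] = np.sum(x[t] * y[t + tau])
    return lags, phi

# Illustrative setup (assumed values): a pulse at microphone 1 and the
# same pulse delayed by 3 samples at microphone 2, with no reflections.
fs = 8000                     # sampling rate in Hz
d, c = 0.2, 340.0             # microphone spacing (m), speed of sound (m/s)
x = np.zeros(512)
x[100] = 1.0                  # direct sound at microphone 1
y = np.roll(x, 3)             # microphone 2 receives it 3 samples later

lags, phi = cross_correlation(x, y, max_lag=10)
tau0 = lags[np.argmax(phi)] / fs                    # estimated time difference
theta = np.degrees(np.arcsin(c * tau0 / d))         # equation (2)
print(tau0, theta)            # 0.000375 s, about 39.6 degrees
```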

[0005]

[Problems to be Solved by the Invention] This conventional method is known to work well even when sound waves other than those generated by the sound source to be estimated are present, or when reflected sounds are present, as long as their power is small (the maximum is obtained at approximately τ = τ0). However, particularly when sound source direction estimation in a room sound field is considered, the power of the reflected sounds is often large and strongly affects the conventional method. This is explained with reference to FIG. 7C.

[0006] FIG. 7C shows, for the case where the sound wave is a pulse and there is a single reflected sound, the signals x(t), y(t) and the cross-correlation function φxy(τ) computed from them; in addition to the direct sound 7, a reflected sound 8 is received. The direct sound 7 is the sound that reaches the microphone directly from the sound source, and its arrival direction coincides with the sound source direction. The reflected sound 8, on the other hand, is sound emitted by the source that reaches the microphone after being reflected by a wall or the like, so in general its arrival direction differs from the sound source direction. Consequently, only the arrival time difference of the direct sound 7 carries information about the sound source direction.

[0007] In the example of FIG. 7C, the direct sound 7 and the reflected sound 8 are contained in the two microphone output signals x(t) and y(t) with different time differences τ0 and τ1, respectively. Only the time difference τ0 of the direct sound 7 carries the sound source direction information. The cross-correlation function φxy(τ) computed from x(t) and y(t) is then as shown in the same figure; comparison with FIG. 7B shows that local maxima also appear at several points other than τ = τ0, so that the value of τ giving the maximum becomes ambiguous. Moreover, since the number of such local maxima grows in proportion to the square of the number of reflected sounds, it can be understood that in a room sound field containing many high-power reflected sounds it is difficult to determine τ0 and estimate the sound source direction by the conventional method.

[0008] An object of the present invention is to solve the above-described problems of the conventional sound source direction estimation method and to provide a novel sound source direction estimation method that achieves good estimation even in a room sound field with many reflected sounds.

[0009]

[Means for Solving the Problems] In a method of estimating the sound source direction based on the cross-correlation functions between signals observed by a plurality of microphones, the invention of claim 1 is characterized in that peak hold processing is applied to each of the output signals of the plurality of microphones. In this way the influence of the reflected sounds is masked and the time difference of the direct sound is estimated well.

[0010] According to the invention of claim 2, the output of each microphone is divided into a plurality of frequency components, peak hold processing is applied to each of the divided components, a cross-correlation function is computed for each pair of corresponding frequency components of the processed signals, these cross-correlation functions are averaged with weights, and the sound source direction is estimated based on the result. According to the invention of claim 3, in the invention of claim 1 or 2, logarithmic processing is applied to the peak-held signals and the cross-correlation function is computed from the logarithmically processed signals.

[0011]

[Operation] First, the operation and effect of the peak hold processing will be explained. The sound source direction information is contained only in the time difference τ0 of the direct sound. However, as shown in FIG. 7C, when a reflected sound with a time difference τ1 (≠ τ0) is added, as in the signals x(t) and y(t), the cross-correlation function φxy(τ) no longer has a clear maximum at τ0. If peak hold processing (processing that holds and outputs the maximum power of the input signal up to each time point) is applied to the signals x(t) and y(t) of FIG. 7C, the result is the signals x(t) and y(t) shown in FIG. 1A. As FIG. 1A shows, each signal is held at its peak value once the direct sound 7 is received, and the later-arriving reflected sound 8, whose power is smaller than that of the direct sound 7, is masked and can no longer be observed. Applying to these peak-held signals the time difference processing

x(t) ← x(t) − x(t−1)  (5)

(or differentiation) extracts only the portions where the waveform changes (increases), and the result of the differencing is identical to the reflection-free signals x(t) and y(t) shown in FIG. 7B. The cross-correlation function computed from these signals is therefore also identical to the φxy(τ) of FIG. 7B and has a clear maximum at τ0.
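The peak hold and time difference operations described in this paragraph can be sketched as follows (a minimal Python illustration; the toy power envelope is an assumption, not data from the patent):

```python
import numpy as np

def peak_hold(p):
    """Hold and output the maximum of the input power up to each time point."""
    return np.maximum.accumulate(p)

def time_difference(p):
    """Equation (5): x(t) <- x(t) - x(t-1); keeps only the rising parts."""
    out = np.zeros_like(p)
    out[1:] = p[1:] - p[:-1]
    return out

# Illustrative power envelope: a direct sound followed by a weaker reflection.
p = np.zeros(40)
p[10] = 1.0                  # direct sound
p[25] = 0.6                  # later, weaker reflected sound

held = peak_hold(p)          # the reflection stays below the held peak and is masked
diff = time_difference(held) # only the onset of the direct sound remains
print(np.nonzero(diff)[0])   # -> [10]
```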

[0012] Next, the operation and effect of the logarithmic processing will be explained. A reflected sound travels a longer path from the source to the microphone than the direct sound and is absorbed by the wall on reflection, so its power is smaller than that of the direct sound; this is why the peak hold processing described above is effective. In an actual sequence of room reflections, however, several high-power early reflections often arrive at nearly the same time, so that a reflected sound with larger power than the direct sound (more precisely, a superposition of several reflected sounds) is frequently observed. FIG. 1B shows the result of peak hold processing when such a high-power reflected sound arrives. As the figure shows, the influence 9 of such a reflected sound cannot be removed by peak hold processing alone. However, since the power of the reflected sound is at most several times that of the direct sound, this influence is reduced by logarithmic processing. Suppose, for example, that the power of the background noise (stationary noise) is 1, that a direct sound of power 1000 arrives, and that a reflected sound of power 2000 follows. After logarithmic processing, the background noise is 0 dB, the direct sound 30 dB, and the reflected sound 33 dB. Thus, although in linear terms the reflected sound is twice as large as the direct sound, after logarithmic processing the ratio becomes 1.1, and the influence of the reflected sound is numerically reduced. FIG. 1C shows the result of applying logarithmic processing to the signal of FIG. 1B. The influence 9 of the reflected sound in FIG. 1C is smaller than in FIG. 1B, which demonstrates the effectiveness of the logarithmic processing.
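The 1 / 1000 / 2000 power example above can be reproduced directly; the short sketch below (illustrative only) just restates those figures as 10·log10 values in decibels:

```python
import numpy as np

powers = {"background noise": 1.0, "direct sound": 1000.0, "reflected sound": 2000.0}
for name, p in powers.items():
    print(f"{name}: {10 * np.log10(p):.1f} dB")
# background noise: 0.0 dB, direct sound: 30.0 dB, reflected sound: 33.0 dB --
# a 2:1 power ratio shrinks to roughly 33/30 = 1.1:1 after the logarithm.
```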

[0013] Band division processing is effective in that it increases the opportunities to observe the direct sound. FIG. 2A is an example of a contour display of the time-frequency spectrum of a speech signal. The time 11 at which the 1 kHz to 2 kHz frequency components of this speech arise lags slightly behind the onset time 10 of the speech. When a speech signal or the like is observed separately in its frequency components in this way, the onset time is seen to differ from component to component. The onset of the 1 kHz to 2 kHz components also contains direct-sound information, but when the signal is viewed over the full band it is masked by the 0 to 1 kHz components and the like, and the direct sound cannot be observed well. For these reasons, dividing the signal into bands and processing each band individually improves the estimation accuracy of the time difference τ0 by increasing the opportunities to observe the direct sound. The invention of claim 2 makes use of this idea.

[0014] As explained above, the method of this invention is characterized by removing the influence of reflected sounds by peak hold processing and logarithmic processing. As a result, the influence of reflected sounds on the cross-correlation function, which was the problem of the conventional cross-correlation-based sound source direction estimation methods, is greatly reduced.

[0015]

[Embodiment] FIG. 3 shows an embodiment of the present invention. The output signals of two microphones 21 are each divided into M frequency bands by two band splitting parts 22. As the band division method, for example, an FFT (fast Fourier transform) is used. The following processing is then applied to the signal of each band of the two systems. First, the power of the signal is obtained in a power operating part 23. Next, peak hold processing of the signal is performed in a peak hold processing part 24. Next, the signal is converted to a logarithm in a logarithmic processing part 25. Next, the time difference of the signal is obtained in a time difference processing part 26. Then, in a cross-correlation function operating part 27, the cross-correlation function between the signals in the corresponding frequency bands of the two microphone outputs processed in the same way is obtained. Denoting the time-differenced signals for the k-th frequency band by x′(k,t) and y′(k,t), the cross-correlation function φxy′(k,τ) is calculated by the following equation.

[0016]

φxy′(k,τ) = Σ x′(k,t)·y′(k,t+τ)  (6)

where Σ is taken over t. In the sound source direction estimating part 28, the cross-correlation functions obtained for the individual bands are averaged as follows to compute φxy(τ):

φxy(τ) = Σ Wk·φxy′(k,τ)  (7)

where Σ runs from k = 1 to k = M.

[0017] Here Wk is a weighting function given to each band in the averaging; its value is determined by the measurement conditions, for example by making Wk small for bands with a poor signal-to-noise ratio. In the sound source direction estimating part 28, the value of τ at which the cross-correlation function φxy(τ) obtained by the above operations takes its maximum is found and taken as the estimate of the time difference τ0 of the direct sound. The sound source direction is then obtained and output as

θ = sin⁻¹(c·τ0/d)  (8)
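The processing chain of FIG. 3 and equations (5)-(8) can be sketched compactly in Python as below. This is an illustrative reading of the embodiment, not the patented implementation: the frame length, hop, equal weights Wk and the small numerical guard are assumptions.

```python
import numpy as np

def band_power_envelopes(sig, frame=64, hop=1):
    """Band splitting part 22 and power operating part 23: short-time FFT
    band split, returning per-band power envelopes (time x bands)."""
    win = np.hanning(frame)
    frames = np.array([sig[i:i + frame] * win
                       for i in range(0, len(sig) - frame + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def preprocess(env):
    """Peak hold part 24, logarithmic part 25, time difference part 26."""
    held = np.maximum.accumulate(env, axis=0)        # peak hold per band
    logp = 10.0 * np.log10(held + 1e-12)             # logarithmic processing
    return np.diff(logp, axis=0, prepend=logp[:1])   # equation (5)

def band_cross_correlation(a, b, lags):
    """Equation (6): phi'_xy(k, tau) for one band, by direct summation."""
    n = len(a)
    phi = np.zeros(len(lags))
    for i, tau in enumerate(lags):
        t = np.arange(max(0, -tau), min(n, n - tau))
        phi[i] = np.sum(a[t] * b[t + tau])
    return phi

def estimate_direction(x, y, fs, d, c=340.0, max_lag=8, weights=None):
    """Cross-correlation part 27 and direction estimating part 28."""
    ex = preprocess(band_power_envelopes(x))
    ey = preprocess(band_power_envelopes(y))
    n_bands = ex.shape[1]
    w = np.ones(n_bands) if weights is None else weights   # Wk (equal weights assumed)
    lags = np.arange(-max_lag, max_lag + 1)
    phi = np.zeros(len(lags))
    for k in range(n_bands):                               # equation (7)
        phi += w[k] * band_cross_correlation(ex[:, k], ey[:, k], lags)
    tau0 = lags[np.argmax(phi)] / fs                       # estimated direct-sound time difference
    return np.degrees(np.arcsin(np.clip(c * tau0 / d, -1.0, 1.0)))   # equation (8)
```

In practice the band-wise loop would be vectorized and the envelope hop chosen to match the required time resolution, but the structure above mirrors the block diagram of FIG. 3.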

[0018] The above example has been explained with two microphones, but if three or more microphones can be used the estimation accuracy improves. In that case, a plurality of pairs of two microphones are selected from the plurality of microphones, the sound source direction θ is estimated for each pair by the same method as above, and the plurality of obtained sound source directions are averaged to give the final estimate. Averaging a plurality of estimates in this way reduces the influence of estimation errors due to noise and the like.
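A minimal sketch of this pair-averaging idea for three or more microphones, reusing the estimate_direction() helper sketched after paragraph [0017]; the linear-array geometry and the unweighted mean are assumptions made for illustration.

```python
from itertools import combinations
import numpy as np

def estimate_direction_multi(signals, positions, fs, c=340.0):
    """Average the pairwise direction estimates over all microphone pairs.

    signals: list of microphone signals; positions: their 1-D positions in
    metres along a linear array (an assumed geometry).
    estimate_direction() is the sketch given after paragraph [0017].
    """
    estimates = []
    for i, j in combinations(range(len(signals)), 2):
        d = abs(positions[j] - positions[i])
        estimates.append(estimate_direction(signals[i], signals[j], fs, d, c))
    return float(np.mean(estimates))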

[0019] In FIG. 3, the cross-correlation function may also be obtained by peak hold processing and logarithmic processing without band division. In either case the logarithmic processing may be omitted. The acoustic signal generated by the target sound source is often a non-stationary signal, such as speech. If a non-stationary signal occurs intermittently, the opportunities to observe the direct sound increase with each occurrence, so it is effective to give the peak hold characteristic a decay. This is explained with reference to FIGS. 2B(a), (b) and (c). FIG. 2B(a) shows the received microphone signal when pulse sounds occur intermittently: direct sounds 7₁ to 7₃ occur one after another, separated in time, and each of the direct sounds 7₁ to 7₃ is followed by a sequence of reflected sounds 8₁ to 8₃. FIG. 2B(b) shows the result of applying peak hold processing without a decay characteristic to this signal. The peak value of the first direct sound 7₁ remains held, so the reflected sounds are removed, but the second and third direct sounds 7₂ and 7₃ are removed as well. FIG. 2B(c) shows the result of applying peak hold processing with a decay characteristic to the signal of FIG. 2B(a): if the peak hold is given a decay characteristic, the reflected sounds can be removed without removing the second and third direct sounds 7₂ and 7₃. Increasing the opportunities to observe the direct sound in this way and estimating the time difference τ0 from a plurality of direct sounds is effective in that it is less susceptible to computational errors and noise than estimating τ0 from a single direct sound, and a higher estimation accuracy is obtained.
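A peak hold with a decay (forgetting) characteristic, as described in this paragraph, might look like the following sketch; the decay factor and the toy sequence of three pulse sounds with weaker reflections are illustrative assumptions.

```python
import numpy as np

def decaying_peak_hold(p, decay=0.995):
    """Peak hold whose held value decays by 'decay' per sample, so that a
    later direct sound can rise above it again."""
    held = np.zeros_like(p)
    current = 0.0
    for t, v in enumerate(p):
        current = max(v, current * decay)
        held[t] = current
    return held

# Illustrative envelope: three intermittent direct sounds (as in FIG. 2B(a)),
# each followed by a weaker reflection.
p = np.zeros(300)
p[[20, 120, 220]] = 1.0          # direct sounds
p[[35, 135, 235]] = 0.5          # reflections

held = decaying_peak_hold(p)
onsets = np.nonzero(np.diff(held, prepend=0.0) > 0.2)[0]
print(onsets)                    # -> [ 20 120 220]: all three direct sounds survive,
                                 #    while the reflections stay masked
```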

[0020]

[Effects of the Invention] Next, the results of an experiment carried out with the embodiment of FIG. 3 to verify the effectiveness of the method of this invention are described. The experiment was conducted in a room with a volume of 50 m³ and a reverberation time of 0.25 s. The arrangement of the sound sources and microphones in the experiment is shown in FIG. 4, where 31 and 32 denote sound sources and 33 and 34 denote microphones. Speech was used as the signal for sound source direction estimation. The received signals were sampled at 8 kHz and band-divided by a 64-point FFT (that is, into 32 bands), and 2 seconds of received data were used for the correlation computation.

[0021] FIGS. 5(a) and 5(b) show the experimental results when only the sound source 32 is present; the obtained cross-correlation function φxy(τ) is displayed. FIG. 5(a) shows the result obtained by the conventional method without peak hold processing, and FIG. 5(b) the result obtained by the method of this invention. The correct time difference corresponding to the direction of the sound source is indicated by the black arrow in the figures. In the result obtained with the conventional method (FIG. 5(a)) the maximum coincides with the correct answer, but local maxima that do not coincide with the correct answer also occur because of the reflected sounds. In the result obtained with the method of this invention (FIG. 5(b)), a clear maximum is obtained at the correct position.

[0022] FIGS. 6(a) and 6(b) show the results when different speech signals (power ratio 1:2) are emitted from the two sound sources 31 and 32. FIG. 6(a) shows the result obtained by the conventional method and FIG. 6(b) the result obtained by the method of this invention. With the conventional method (FIG. 6(a)) it is difficult to distinguish the two sound sources, whereas with the method of this invention (FIG. 6(b)) the time difference of the direct sound for each source can be estimated clearly.

[0023] As explained above and confirmed by experiment, the present invention is a very effective method for estimating the direction of a sound source in a sound field in which many reflected sounds are present.

[Brief Description of the Drawings]

FIG. 1A is a diagram showing the signal obtained by applying peak hold processing to the signal of FIG. 7C; FIG. 1B is a diagram showing the signal obtained by applying peak hold processing to a signal in which the power of the reflected sound is larger than that of the direct sound; and FIG. 1C is a diagram showing the signal obtained by applying logarithmic processing to the signal of FIG. 1B.

FIG. 2A is a diagram showing an example of the time-frequency spectrum of speech, and FIG. 2B is a diagram explaining the effectiveness of peak hold processing with a decay characteristic.

FIG. 3 is a block diagram showing an embodiment of the present invention.

FIG. 4 is a diagram showing the experimental conditions used to confirm the effectiveness of the present invention.

FIG. 5 is a diagram showing the experimental results when there is one sound source.

FIG. 6 is a diagram showing the experimental results when there are two sound sources.

FIG. 7A is a diagram explaining the relationship between the arrival direction θ of a sound wave and the signals received by two microphones; FIG. 7B shows the output signals of the two microphones when there is no reflected sound and the correlation function computed from them; and FIG. 7C shows the output signals of the two microphones when there is a reflected sound and the correlation function computed from them.

Claims (3)

[Claims]
1. A sound source direction estimation method for estimating a sound source direction based on a cross-correlation function between signals observed by a plurality of microphones, comprising: performing peak hold processing on each of the outputs of the plurality of microphones to generate a plurality of signals; and calculating a cross-correlation function from the plurality of processed signals.
2. A sound source direction estimation method for estimating a sound source direction based on a cross-correlation function between signals observed by a plurality of microphones, comprising: dividing the output of each of the plurality of microphones into a plurality of frequency components; performing peak hold processing on each of the frequency components to generate a plurality of signals; calculating cross-correlation functions between corresponding frequency components of the plurality of processed signals; and estimating the sound source direction based on a cross-correlation function obtained by weighted averaging of the cross-correlation functions of the frequency components.
3. The sound source direction estimation method according to claim 1 or 2, wherein the cross-correlation function is calculated from signals that have been subjected to logarithmic processing after the peak hold processing.
JP3249411A 1991-09-27 1991-09-27 Sound source direction estimation method Expired - Fee Related JP2985982B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3249411A JP2985982B2 (en) 1991-09-27 1991-09-27 Sound source direction estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3249411A JP2985982B2 (en) 1991-09-27 1991-09-27 Sound source direction estimation method

Publications (2)

Publication Number Publication Date
JPH0587903A true JPH0587903A (en) 1993-04-09
JP2985982B2 JP2985982B2 (en) 1999-12-06

Family

ID=17192577

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3249411A Expired - Fee Related JP2985982B2 (en) 1991-09-27 1991-09-27 Sound source direction estimation method

Country Status (1)

Country Link
JP (1) JP2985982B2 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08220209A (en) * 1995-02-16 1996-08-30 Tech Res & Dev Inst Of Japan Def Agency Underwater acoustic signal azimuth calculating device
JP2003514412A (en) * 1999-11-05 2003-04-15 ウェーブメーカーズ・インコーポレーテッド How to determine if a sound source is near or far from a pair of microphones
EP1108994A2 (en) * 1999-12-14 2001-06-20 Matsushita Electric Industrial Co., Ltd. Method and apparatus for concurrently estimating respective directions of a plurality of sound sources and for monitoring individual sound levels of respective moving sound sources
EP1108994A3 (en) * 1999-12-14 2003-12-10 Matsushita Electric Industrial Co., Ltd. Method and apparatus for concurrently estimating respective directions of a plurality of sound sources and for monitoring individual sound levels of respective moving sound sources
US6862541B2 (en) 1999-12-14 2005-03-01 Matsushita Electric Industrial Co., Ltd. Method and apparatus for concurrently estimating respective directions of a plurality of sound sources and for monitoring individual sound levels of respective moving sound sources
JP2007248610A (en) * 2006-03-14 2007-09-27 Mitsubishi Electric Corp Musical piece analyzing method and musical piece analyzing device
US8816226B2 (en) 2011-08-24 2014-08-26 Omron Corporation Switch device
JP2013097273A (en) * 2011-11-02 2013-05-20 Toyota Motor Corp Sound source estimation device, method, and program and moving body
JP2013254267A (en) * 2012-06-05 2013-12-19 Toyota Motor Corp Approaching vehicle detection device and driving support system
US9361576B2 (en) 2012-06-08 2016-06-07 Samsung Electronics Co., Ltd. Neuromorphic signal processing device and method for locating sound source using a plurality of neuron circuits
JP2014035226A (en) * 2012-08-08 2014-02-24 Jvc Kenwood Corp Sound source direction detection device, sound source direction detection method, and sound source direction detection program
JP2016114512A (en) * 2014-12-16 2016-06-23 日本電気株式会社 Oscillation source estimation system, method, and program
US9961460B2 (en) 2014-12-16 2018-05-01 Nec Corporation Vibration source estimation device, vibration source estimation method, and vibration source estimation program
CN110307895A (en) * 2018-03-20 2019-10-08 本田技研工业株式会社 Allophone decision maker and allophone determination method
CN110307895B (en) * 2018-03-20 2021-08-24 本田技研工业株式会社 Abnormal sound determination device and abnormal sound determination method
JPWO2021181517A1 (en) * 2020-03-10 2021-09-16
WO2021181517A1 (en) * 2020-03-10 2021-09-16 日本電気株式会社 Trajectory estimation device, trajectory estimation system, trajectory estimation method, and program recording medium

Also Published As

Publication number Publication date
JP2985982B2 (en) 1999-12-06

Similar Documents

Publication Publication Date Title
US6792118B2 (en) Computation of multi-sensor time delays
CN110770827B (en) Near field detector based on correlation
JP2985982B2 (en) Sound source direction estimation method
JP2006194700A (en) Sound source direction estimation system, sound source direction estimation method and sound source direction estimation program
Di Carlo et al. Mirage: 2d source localization using microphone pair augmentation with echoes
Jensen et al. DOA estimation of audio sources in reverberant environments
Tengan et al. Multi-source direction-of-arrival estimation using group-sparse fitting of steered response power maps
Lebarbenchon et al. Evaluation of an open-source implementation of the SRP-PHAT algorithm within the 2018 LOCATA challenge
Parsayan et al. TDE-ILD-based 2D half plane real time high accuracy sound source localization using only two microphones and source counting
Hadad et al. Multi-speaker direction of arrival estimation using SRP-PHAT algorithm with a weighted histogram
Salvati et al. A real-time system for multiple acoustic sources localization based on ISP comparison
Karimian-Azari et al. Fast joint DOA and pitch estimation using a broadband MVDR beamformer
Pessentheiner et al. Localization and characterization of multiple harmonic sources
Firoozabadi et al. Combination of nested microphone array and subband processing for multiple simultaneous speaker localization
Habib et al. Comparison of SRP-PHAT and multiband-PoPi algorithms for speaker localization using particle filters
Mandel et al. A probability model for interaural phase difference
Nikunen et al. Time-difference of arrival model for spherical microphone arrays and application to direction of arrival estimation
Habib et al. Concurrent speaker localization using multi-band position-pitch (m-popi) algorithm with spectro-temporal pre-processing.
Jacob et al. On the bias of direction of arrival estimation using linear microphone arrays
Ramamurthy et al. Experimental performance analysis of sound source detection with SRP PHAT-β
Aarabi et al. Multi-channel time-frequency data fusion
Karimian-Azari et al. Robust DOA estimation of harmonic signals using constrained filters on phase estimates
Matsuo et al. Estimating DOA of multiple speech signals by improved histogram mapping method
Kowalk et al. Geometry-aware DoA Estimation using a Deep Neural Network with mixed-data input features
El Chami et al. A phase-based dual microphone method to count and locate audio sources in reverberant rooms

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees