JPH0587903A - Predicting method of direction of sound source - Google Patents

Predicting method of direction of sound source

Info

Publication number
JPH0587903A
JPH0587903A
Authority
JP
Japan
Prior art keywords
sound
sound source
signals
microphones
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP3249411A
Other languages
Japanese (ja)
Other versions
JP2985982B2 (en)
Inventor
Yutaka Kaneda
豊 金田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP3249411A priority Critical patent/JP2985982B2/en
Publication of JPH0587903A publication Critical patent/JPH0587903A/en
Application granted granted Critical
Publication of JP2985982B2 publication Critical patent/JP2985982B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

PURPOSE: To correctly estimate the direction of a sound source even in a sound field where many reflected sounds are present. CONSTITUTION: The outputs of two microphones 21, 21 are each divided into M frequency bands by a band splitting part 22. The power of the signal in each band is obtained in a power operating part 23, its peak value is held by a peak hold part 24, and the result is logarithmically processed by a logarithmic processing part 25 and time-differenced by a time difference processing part 26. A correlation function operating part 27 then computes the cross-correlation function between the signals in corresponding frequency bands of the two microphone outputs processed in the same way. A sound source direction estimating part 28 averages the cross-correlation functions of the frequency bands with weights, takes the lag at which this weighted average is maximum as the time difference with which the direct sound reaches the microphones 21, 21, and computes the sound source direction from it.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a sound source direction estimation method for estimating the direction or position of a sound source based on cross-correlation functions between signals observed by a plurality of microphones.

[0002]

2. Description of the Related Art
Sound source direction estimation is a basic technology required for various purposes, such as estimating the direction in which an abnormal sound occurred with a remote monitoring device and pointing a monitor camera in that direction, pointing a camera toward the speaker in a video conference, and tracking the flight trajectory of an aircraft outdoors.

[0003] The most basic conventional method of sound source direction estimation is based on the time difference between signals observed by a plurality of microphones. FIG. 7A illustrates this. A sound wave with wavefront 3 arrives from incident direction 4 at a first microphone 1 and a second microphone 2; let the output signal 5 of the first microphone 1 be x(t) and the output signal 6 of the second microphone 2 be y(t). In the situation shown, the sound wave is received first by the first microphone 1 and, slightly later, by the second microphone 2. If the distance between the first and second microphones 1, 2 is d and the arrival direction of the sound wave is the angle θ shown in the figure, the delay (time difference) with which the wave reaches the second microphone 2 is the time τ0 the sound wave needs to travel the distance d·sin θ. Denoting the speed of sound by c, it is given by

τ0 = (d·sin θ)/c  (1)

If the sound wave arrives from a single direction, the output signal y(t) of the second microphone 2 can be written in terms of the output signal x(t) of the first microphone 1 as y(t) = x(t−τ0). If τ0 can be determined from the two signals x(t) and y(t), the arrival direction of the sound wave follows from equation (1) as

θ = sin⁻¹(c·τ0/d)  (2)
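As a quick numerical check of equations (1) and (2), here is a minimal Python sketch (an illustration, not part of the patent); the values d = 0.2 m, c = 340 m/s and θ = 30° are assumptions.

```python
import numpy as np

# Illustrative values (assumptions, not from the patent): microphone
# spacing d, speed of sound c, and a true arrival angle theta.
d = 0.2                          # m
c = 340.0                        # m/s
theta = np.deg2rad(30.0)

# Equation (1): time difference with which the wave reaches microphone 2.
tau0 = d * np.sin(theta) / c     # about 2.94e-4 s

# Equation (2): recover the arrival angle from the time difference.
theta_hat = np.arcsin(c * tau0 / d)
print(tau0, np.rad2deg(theta_hat))   # ~0.000294 s, ~30 degrees
```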

[0004] The time difference τ0 of the sound wave can be obtained by computing the cross-correlation function φxy(τ) of the two signals x(t) and y(t) and taking the value of τ at which it is maximum. Here the cross-correlation function of the discretized signals (t an integer) is defined as

φxy(τ) = Σ x(t)·y(t+τ)  (3)

where Σ denotes summation over t (for continuous signals the summation is replaced by an integral). Using the relation y(t) = x(t−τ0),

φxy(τ) = Σ x(t)·x(t+τ−τ0) = φxx(τ−τ0)  (4)

where Σ again denotes summation over t and φxx(τ) is the autocorrelation function of x(t), which, as is well known, takes its maximum at τ = 0. It follows that φxy(τ) takes its maximum at τ = τ0. FIG. 7B shows, for the case where the sound wave is a pulse, the signals x(t) and y(t) and the cross-correlation function φxy(τ) computed from them; φxy(τ) has a clear maximum at τ = τ0.
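To make the conventional procedure of paragraphs [0003]-[0004] concrete, the following Python sketch (an illustration, not part of the patent) computes the cross-correlation of equation (3) by direct summation, takes the lag of its maximum as τ0, and converts it to an angle with equation (2); the pulse test signal, sampling rate and microphone spacing are assumptions.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """phi_xy(tau) = sum over t of x(t) * y(t + tau), as in equation (3)."""
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    phi = np.zeros(len(lags))
    for i, tau in enumerate(lags):
        t = np.arange(max(0, -tau), min(n, n - tau))
        phi[i] = np.sum(x[t] * y[t + tau])
    return lags, phi

# Illustrative setup (assumed values): a pulse at microphone 1 and the
# same pulse delayed by 3 samples at microphone 2, with no reflections.
fs = 8000                     # sampling rate in Hz
d, c = 0.2, 340.0             # microphone spacing (m), speed of sound (m/s)
x = np.zeros(512)
x[100] = 1.0                  # direct sound at microphone 1
y = np.roll(x, 3)             # microphone 2 receives it 3 samples later

lags, phi = cross_correlation(x, y, max_lag=10)
tau0 = lags[np.argmax(phi)] / fs                    # estimated time difference
theta = np.degrees(np.arcsin(c * tau0 / d))         # equation (2)
print(tau0, theta)            # 0.000375 s, about 39.6 degrees
```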

[0005]

[Problems to be Solved by the Invention] This conventional method is known to work well even when sound waves other than those generated by the sound source to be estimated are present, or when reflected sounds are present, as long as their power is small (the maximum is obtained at approximately τ = τ0). However, particularly when sound source direction estimation in a room sound field is considered, the power of the reflected sounds is often large and strongly affects the conventional method. This is explained with reference to FIG. 7C.

[0006] FIG. 7C shows, for the case where the sound wave is a pulse and there is a single reflected sound, the signals x(t), y(t) and the cross-correlation function φxy(τ) computed from them; in addition to the direct sound 7, a reflected sound 8 is received. The direct sound 7 is the sound that reaches the microphone directly from the sound source, and its arrival direction coincides with the sound source direction. The reflected sound 8, on the other hand, is sound emitted by the source that reaches the microphone after being reflected by a wall or the like, so in general its arrival direction differs from the sound source direction. Consequently, only the arrival time difference of the direct sound 7 carries information about the sound source direction.

[0007] In the example of FIG. 7C, the direct sound 7 and the reflected sound 8 are contained in the two microphone output signals x(t) and y(t) with different time differences τ0 and τ1, respectively. Only the time difference τ0 of the direct sound 7 carries the sound source direction information. The cross-correlation function φxy(τ) computed from x(t) and y(t) is then as shown in the same figure; comparison with FIG. 7B shows that local maxima also appear at several points other than τ = τ0, so that the value of τ giving the maximum becomes ambiguous. Moreover, since the number of such local maxima grows in proportion to the square of the number of reflected sounds, it can be understood that in a room sound field containing many high-power reflected sounds it is difficult to determine τ0 and estimate the sound source direction by the conventional method.

[0008] An object of the present invention is to solve the above-described problems of the conventional sound source direction estimation method and to provide a novel sound source direction estimation method that achieves good estimation even in a room sound field with many reflected sounds.

[0009]

[Means for Solving the Problems] In a method of estimating the sound source direction based on the cross-correlation functions between signals observed by a plurality of microphones, the invention of claim 1 is characterized in that peak hold processing is applied to each of the output signals of the plurality of microphones. In this way the influence of the reflected sounds is masked and the time difference of the direct sound is estimated well.

[0010] According to the invention of claim 2, the output of each microphone is divided into a plurality of frequency components, peak hold processing is applied to each of the divided components, a cross-correlation function is computed for each pair of corresponding frequency components of the processed signals, these cross-correlation functions are averaged with weights, and the sound source direction is estimated based on the result. According to the invention of claim 3, in the invention of claim 1 or 2, logarithmic processing is applied to the peak-held signals and the cross-correlation function is computed from the logarithmically processed signals.

[0011]

[Operation] First, the operation and effect of the peak hold processing will be explained. The sound source direction information is contained only in the time difference τ0 of the direct sound. However, as shown in FIG. 7C, when a reflected sound with a time difference τ1 (≠ τ0) is added, as in the signals x(t) and y(t), the cross-correlation function φxy(τ) no longer has a clear maximum at τ0. If peak hold processing (processing that holds and outputs the maximum power of the input signal up to each time point) is applied to the signals x(t) and y(t) of FIG. 7C, the result is the signals x(t) and y(t) shown in FIG. 1A. As FIG. 1A shows, each signal is held at its peak value once the direct sound 7 is received, and the later-arriving reflected sound 8, whose power is smaller than that of the direct sound 7, is masked and can no longer be observed. Applying to these peak-held signals the time difference processing

x(t) ← x(t) − x(t−1)  (5)

(or differentiation) extracts only the portions where the waveform changes (increases), and the result of the differencing is identical to the reflection-free signals x(t) and y(t) shown in FIG. 7B. The cross-correlation function computed from these signals is therefore also identical to the φxy(τ) of FIG. 7B and has a clear maximum at τ0.
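The peak hold and time difference operations described in this paragraph can be sketched as follows (a minimal Python illustration; the toy power envelope is an assumption, not data from the patent):

```python
import numpy as np

def peak_hold(p):
    """Hold and output the maximum of the input power up to each time point."""
    return np.maximum.accumulate(p)

def time_difference(p):
    """Equation (5): x(t) <- x(t) - x(t-1); keeps only the rising parts."""
    out = np.zeros_like(p)
    out[1:] = p[1:] - p[:-1]
    return out

# Illustrative power envelope: a direct sound followed by a weaker reflection.
p = np.zeros(40)
p[10] = 1.0                  # direct sound
p[25] = 0.6                  # later, weaker reflected sound

held = peak_hold(p)          # the reflection stays below the held peak and is masked
diff = time_difference(held) # only the onset of the direct sound remains
print(np.nonzero(diff)[0])   # -> [10]
```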

[0012] Next, the operation and effect of the logarithmic processing will be explained. A reflected sound travels a longer path from the source to the microphone than the direct sound and is absorbed by the wall on reflection, so its power is smaller than that of the direct sound; this is why the peak hold processing described above is effective. In an actual sequence of room reflections, however, several high-power early reflections often arrive at nearly the same time, so that a reflected sound with larger power than the direct sound (more precisely, a superposition of several reflected sounds) is frequently observed. FIG. 1B shows the result of peak hold processing when such a high-power reflected sound arrives. As the figure shows, the influence 9 of such a reflected sound cannot be removed by peak hold processing alone. However, since the power of the reflected sound is at most several times that of the direct sound, this influence is reduced by logarithmic processing. Suppose, for example, that the power of the background noise (stationary noise) is 1, that a direct sound of power 1000 arrives, and that a reflected sound of power 2000 follows. After logarithmic processing, the background noise is 0 dB, the direct sound 30 dB, and the reflected sound 33 dB. Thus, although in linear terms the reflected sound is twice as large as the direct sound, after logarithmic processing the ratio becomes 1.1, and the influence of the reflected sound is numerically reduced. FIG. 1C shows the result of applying logarithmic processing to the signal of FIG. 1B. The influence 9 of the reflected sound in FIG. 1C is smaller than in FIG. 1B, which demonstrates the effectiveness of the logarithmic processing.
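The 1 / 1000 / 2000 power example above can be reproduced directly; the short sketch below (illustrative only) just restates those figures as 10·log10 values in decibels:

```python
import numpy as np

powers = {"background noise": 1.0, "direct sound": 1000.0, "reflected sound": 2000.0}
for name, p in powers.items():
    print(f"{name}: {10 * np.log10(p):.1f} dB")
# background noise: 0.0 dB, direct sound: 30.0 dB, reflected sound: 33.0 dB --
# a 2:1 power ratio shrinks to roughly 33/30 = 1.1:1 after the logarithm.
```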

[0013] Band division processing is effective in that it increases the opportunities to observe the direct sound. FIG. 2A is an example of a contour display of the time-frequency spectrum of a speech signal. The time 11 at which the 1 kHz to 2 kHz frequency components of this speech arise lags slightly behind the onset time 10 of the speech. When a speech signal or the like is observed separately in its frequency components in this way, the onset time is seen to differ from component to component. The onset of the 1 kHz to 2 kHz components also contains direct-sound information, but when the signal is viewed over the full band it is masked by the 0 to 1 kHz components and the like, and the direct sound cannot be observed well. For these reasons, dividing the signal into bands and processing each band individually improves the estimation accuracy of the time difference τ0 by increasing the opportunities to observe the direct sound. The invention of claim 2 makes use of this idea.

[0014] As explained above, the method of this invention is characterized by removing the influence of reflected sounds by peak hold processing and logarithmic processing. As a result, the influence of reflected sounds on the cross-correlation function, which was the problem of the conventional cross-correlation-based sound source direction estimation methods, is greatly reduced.

[0015]

[Embodiment] FIG. 3 shows an embodiment of the present invention. The output signals of two microphones 21 are each divided into M frequency bands by two band splitting parts 22. As the band division method, for example, an FFT (fast Fourier transform) is used. The following processing is then applied to the signal of each band of the two systems. First, the power of the signal is obtained in a power operating part 23. Next, peak hold processing of the signal is performed in a peak hold processing part 24. Next, the signal is converted to a logarithm in a logarithmic processing part 25. Next, the time difference of the signal is obtained in a time difference processing part 26. Then, in a cross-correlation function operating part 27, the cross-correlation function between the signals in the corresponding frequency bands of the two microphone outputs processed in the same way is obtained. Denoting the time-differenced signals for the k-th frequency band by x′(k,t) and y′(k,t), the cross-correlation function φxy′(k,τ) is calculated by the following equation.

[0016]

φxy′(k,τ) = Σ x′(k,t)·y′(k,t+τ)  (6)

where Σ is taken over t. In the sound source direction estimating part 28, the cross-correlation functions obtained for the individual bands are averaged as follows to compute φxy(τ):

φxy(τ) = Σ Wk·φxy′(k,τ)  (7)

where Σ runs from k = 1 to k = M.

[0017] Here Wk is a weighting function given to each band in the averaging; its value is determined by the measurement conditions, for example by making Wk small for bands with a poor signal-to-noise ratio. In the sound source direction estimating part 28, the value of τ at which the cross-correlation function φxy(τ) obtained by the above operations takes its maximum is found and taken as the estimate of the time difference τ0 of the direct sound. The sound source direction is then obtained and output as

θ = sin⁻¹(c·τ0/d)  (8)
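The processing chain of FIG. 3 and equations (5)-(8) can be sketched compactly in Python as below. This is an illustrative reading of the embodiment, not the patented implementation: the frame length, hop, equal weights Wk and the small numerical guard are assumptions.

```python
import numpy as np

def band_power_envelopes(sig, frame=64, hop=1):
    """Band splitting part 22 and power operating part 23: short-time FFT
    band split, returning per-band power envelopes (time x bands)."""
    win = np.hanning(frame)
    frames = np.array([sig[i:i + frame] * win
                       for i in range(0, len(sig) - frame + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def preprocess(env):
    """Peak hold part 24, logarithmic part 25, time difference part 26."""
    held = np.maximum.accumulate(env, axis=0)        # peak hold per band
    logp = 10.0 * np.log10(held + 1e-12)             # logarithmic processing
    return np.diff(logp, axis=0, prepend=logp[:1])   # equation (5)

def band_cross_correlation(a, b, lags):
    """Equation (6): phi'_xy(k, tau) for one band, by direct summation."""
    n = len(a)
    phi = np.zeros(len(lags))
    for i, tau in enumerate(lags):
        t = np.arange(max(0, -tau), min(n, n - tau))
        phi[i] = np.sum(a[t] * b[t + tau])
    return phi

def estimate_direction(x, y, fs, d, c=340.0, max_lag=8, weights=None):
    """Cross-correlation part 27 and direction estimating part 28."""
    ex = preprocess(band_power_envelopes(x))
    ey = preprocess(band_power_envelopes(y))
    n_bands = ex.shape[1]
    w = np.ones(n_bands) if weights is None else weights   # Wk (equal weights assumed)
    lags = np.arange(-max_lag, max_lag + 1)
    phi = np.zeros(len(lags))
    for k in range(n_bands):                               # equation (7)
        phi += w[k] * band_cross_correlation(ex[:, k], ey[:, k], lags)
    tau0 = lags[np.argmax(phi)] / fs                       # estimated direct-sound time difference
    return np.degrees(np.arcsin(np.clip(c * tau0 / d, -1.0, 1.0)))   # equation (8)
```

In practice the band-wise loop would be vectorized and the envelope hop chosen to match the required time resolution, but the structure above mirrors the block diagram of FIG. 3.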

[0018] The above example has been explained with two microphones, but if three or more microphones can be used the estimation accuracy improves. In that case, a plurality of pairs of two microphones are selected from the plurality of microphones, the sound source direction θ is estimated for each pair by the same method as above, and the plurality of obtained sound source directions are averaged to give the final estimate. Averaging a plurality of estimates in this way reduces the influence of estimation errors due to noise and the like.
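A minimal sketch of this pair-averaging idea for three or more microphones, reusing the estimate_direction() helper sketched after paragraph [0017]; the linear-array geometry and the unweighted mean are assumptions made for illustration.

```python
from itertools import combinations
import numpy as np

def estimate_direction_multi(signals, positions, fs, c=340.0):
    """Average the pairwise direction estimates over all microphone pairs.

    signals: list of microphone signals; positions: their 1-D positions in
    metres along a linear array (an assumed geometry).
    estimate_direction() is the sketch given after paragraph [0017].
    """
    estimates = []
    for i, j in combinations(range(len(signals)), 2):
        d = abs(positions[j] - positions[i])
        estimates.append(estimate_direction(signals[i], signals[j], fs, d, c))
    return float(np.mean(estimates))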

[0019] In FIG. 3, the cross-correlation function may also be obtained by peak hold processing and logarithmic processing without band division. In either case the logarithmic processing may be omitted. The acoustic signal generated by the target sound source is often a non-stationary signal, such as speech. If a non-stationary signal occurs intermittently, the opportunities to observe the direct sound increase with each occurrence, so it is effective to give the peak hold characteristic a decay. This is explained with reference to FIGS. 2B(a), (b) and (c). FIG. 2B(a) shows the received microphone signal when pulse sounds occur intermittently: direct sounds 7₁ to 7₃ occur one after another, separated in time, and each of the direct sounds 7₁ to 7₃ is followed by a sequence of reflected sounds 8₁ to 8₃. FIG. 2B(b) shows the result of applying peak hold processing without a decay characteristic to this signal. The peak value of the first direct sound 7₁ remains held, so the reflected sounds are removed, but the second and third direct sounds 7₂ and 7₃ are removed as well. FIG. 2B(c) shows the result of applying peak hold processing with a decay characteristic to the signal of FIG. 2B(a): if the peak hold is given a decay characteristic, the reflected sounds can be removed without removing the second and third direct sounds 7₂ and 7₃. Increasing the opportunities to observe the direct sound in this way and estimating the time difference τ0 from a plurality of direct sounds is effective in that it is less susceptible to computational errors and noise than estimating τ0 from a single direct sound, and a higher estimation accuracy is obtained.
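A peak hold with a decay (forgetting) characteristic, as described in this paragraph, might look like the following sketch; the decay factor and the toy sequence of three pulse sounds with weaker reflections are illustrative assumptions.

```python
import numpy as np

def decaying_peak_hold(p, decay=0.995):
    """Peak hold whose held value decays by 'decay' per sample, so that a
    later direct sound can rise above it again."""
    held = np.zeros_like(p)
    current = 0.0
    for t, v in enumerate(p):
        current = max(v, current * decay)
        held[t] = current
    return held

# Illustrative envelope: three intermittent direct sounds (as in FIG. 2B(a)),
# each followed by a weaker reflection.
p = np.zeros(300)
p[[20, 120, 220]] = 1.0          # direct sounds
p[[35, 135, 235]] = 0.5          # reflections

held = decaying_peak_hold(p)
onsets = np.nonzero(np.diff(held, prepend=0.0) > 0.2)[0]
print(onsets)                    # -> [ 20 120 220]: all three direct sounds survive,
                                 #    while the reflections stay masked
```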

[0020]

[Effects of the Invention] Next, the results of an experiment carried out with the embodiment of FIG. 3 to verify the effectiveness of the method of this invention are described. The experiment was conducted in a room with a volume of 50 m³ and a reverberation time of 0.25 s. The arrangement of the sound sources and microphones in the experiment is shown in FIG. 4, where 31 and 32 denote sound sources and 33 and 34 denote microphones. Speech was used as the signal for sound source direction estimation. The received signals were sampled at 8 kHz and band-divided by a 64-point FFT (that is, into 32 bands), and 2 seconds of received data were used for the correlation computation.

[0021] FIGS. 5(a) and 5(b) show the experimental results when only the sound source 32 is present; the obtained cross-correlation function φxy(τ) is displayed. FIG. 5(a) shows the result obtained by the conventional method without peak hold processing, and FIG. 5(b) the result obtained by the method of this invention. The correct time difference corresponding to the direction of the sound source is indicated by the black arrow in the figures. In the result obtained with the conventional method (FIG. 5(a)) the maximum coincides with the correct answer, but local maxima that do not coincide with the correct answer also occur because of the reflected sounds. In the result obtained with the method of this invention (FIG. 5(b)), a clear maximum is obtained at the correct position.

[0022] FIGS. 6(a) and 6(b) show the results when different speech signals (power ratio 1:2) are emitted from the two sound sources 31 and 32. FIG. 6(a) shows the result obtained by the conventional method and FIG. 6(b) the result obtained by the method of this invention. With the conventional method (FIG. 6(a)) it is difficult to distinguish the two sound sources, whereas with the method of this invention (FIG. 6(b)) the time difference of the direct sound for each source can be estimated clearly.

[0023] As explained above and confirmed by experiment, the present invention is a very effective method for estimating the direction of a sound source in a sound field in which many reflected sounds are present.

[Brief Description of the Drawings]

FIG. 1A is a diagram showing the signal obtained by applying peak hold processing to the signal of FIG. 7C; FIG. 1B is a diagram showing the signal obtained by applying peak hold processing to a signal in which the power of the reflected sound is larger than that of the direct sound; and FIG. 1C is a diagram showing the signal obtained by applying logarithmic processing to the signal of FIG. 1B.

FIG. 2A is a diagram showing an example of the time-frequency spectrum of speech, and FIG. 2B is a diagram explaining the effectiveness of peak hold processing with a decay characteristic.

FIG. 3 is a block diagram showing an embodiment of the present invention.

FIG. 4 is a diagram showing the experimental conditions used to confirm the effectiveness of the present invention.

FIG. 5 is a diagram showing the experimental results when there is one sound source.

FIG. 6 is a diagram showing the experimental results when there are two sound sources.

FIG. 7A is a diagram explaining the relationship between the arrival direction θ of a sound wave and the signals received by two microphones; FIG. 7B shows the output signals of the two microphones when there is no reflected sound and the correlation function computed from them; and FIG. 7C shows the output signals of the two microphones when there is a reflected sound and the correlation function computed from them.

Claims (3)

[Claims]
1. A sound source direction estimation method for estimating a sound source direction based on a cross-correlation function between signals observed by a plurality of microphones, comprising: performing peak hold processing on each of the outputs of the plurality of microphones to generate a plurality of signals; and calculating a cross-correlation function from the plurality of processed signals.
2. A sound source direction estimation method for estimating a sound source direction based on a cross-correlation function between signals observed by a plurality of microphones, comprising: dividing the output of each of the plurality of microphones into a plurality of frequency components; performing peak hold processing on each of the frequency components to generate a plurality of signals; calculating cross-correlation functions between corresponding frequency components of the plurality of processed signals; and estimating the sound source direction based on a cross-correlation function obtained by weighted averaging of the cross-correlation functions of the frequency components.
3. The sound source direction estimation method according to claim 1 or 2, wherein the cross-correlation function is calculated from signals that have been subjected to logarithmic processing after the peak hold processing.
JP3249411A 1991-09-27 1991-09-27 Sound source direction estimation method Expired - Fee Related JP2985982B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3249411A JP2985982B2 (en) 1991-09-27 1991-09-27 Sound source direction estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3249411A JP2985982B2 (en) 1991-09-27 1991-09-27 Sound source direction estimation method

Publications (2)

Publication Number Publication Date
JPH0587903A true JPH0587903A (en) 1993-04-09
JP2985982B2 JP2985982B2 (en) 1999-12-06

Family

ID=17192577

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3249411A Expired - Fee Related JP2985982B2 (en) 1991-09-27 1991-09-27 Sound source direction estimation method

Country Status (1)

Country Link
JP (1) JP2985982B2 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08220209A (en) * 1995-02-16 1996-08-30 Tech Res & Dev Inst Of Japan Def Agency Underwater acoustic signal azimuth calculating device
JP2003514412A (en) * 1999-11-05 2003-04-15 ウェーブメーカーズ・インコーポレーテッド How to determine if a sound source is near or far from a pair of microphones
EP1108994A2 (en) * 1999-12-14 2001-06-20 Matsushita Electric Industrial Co., Ltd. Method and apparatus for concurrently estimating respective directions of a plurality of sound sources and for monitoring individual sound levels of respective moving sound sources
EP1108994A3 (en) * 1999-12-14 2003-12-10 Matsushita Electric Industrial Co., Ltd. Method and apparatus for concurrently estimating respective directions of a plurality of sound sources and for monitoring individual sound levels of respective moving sound sources
US6862541B2 (en) 1999-12-14 2005-03-01 Matsushita Electric Industrial Co., Ltd. Method and apparatus for concurrently estimating respective directions of a plurality of sound sources and for monitoring individual sound levels of respective moving sound sources
JP2007248610A (en) * 2006-03-14 2007-09-27 Mitsubishi Electric Corp Musical piece analyzing method and musical piece analyzing device
US8816226B2 (en) 2011-08-24 2014-08-26 Omron Corporation Switch device
JP2013097273A (en) * 2011-11-02 2013-05-20 Toyota Motor Corp Sound source estimation device, method, and program and moving body
JP2013254267A (en) * 2012-06-05 2013-12-19 Toyota Motor Corp Approaching vehicle detection device and driving support system
US9361576B2 (en) 2012-06-08 2016-06-07 Samsung Electronics Co., Ltd. Neuromorphic signal processing device and method for locating sound source using a plurality of neuron circuits
JP2014035226A (en) * 2012-08-08 2014-02-24 Jvc Kenwood Corp Sound source direction detection device, sound source direction detection method, and sound source direction detection program
JP2016114512A (en) * 2014-12-16 2016-06-23 日本電気株式会社 Oscillation source estimation system, method, and program
US9961460B2 (en) 2014-12-16 2018-05-01 Nec Corporation Vibration source estimation device, vibration source estimation method, and vibration source estimation program
CN110307895A (en) * 2018-03-20 2019-10-08 本田技研工业株式会社 Allophone decision maker and allophone determination method
CN110307895B (en) * 2018-03-20 2021-08-24 本田技研工业株式会社 Abnormal sound determination device and abnormal sound determination method
JPWO2021181517A1 (en) * 2020-03-10 2021-09-16
WO2021181517A1 (en) * 2020-03-10 2021-09-16 日本電気株式会社 Trajectory estimation device, trajectory estimation system, trajectory estimation method, and program recording medium

Also Published As

Publication number Publication date
JP2985982B2 (en) 1999-12-06

Similar Documents

Publication Publication Date Title
US6792118B2 (en) Computation of multi-sensor time delays
CN110770827B (en) Near field detector based on correlation
JP2985982B2 (en) Sound source direction estimation method
JP2006194700A (en) Sound source direction estimation system, sound source direction estimation method and sound source direction estimation program
Di Carlo et al. Mirage: 2d source localization using microphone pair augmentation with echoes
Jensen et al. DOA estimation of audio sources in reverberant environments
Tengan et al. Multi-source direction-of-arrival estimation using group-sparse fitting of steered response power maps
Lebarbenchon et al. Evaluation of an open-source implementation of the SRP-PHAT algorithm within the 2018 LOCATA challenge
Parsayan et al. TDE-ILD-based 2D half plane real time high accuracy sound source localization using only two microphones and source counting
Hadad et al. Multi-speaker direction of arrival estimation using SRP-PHAT algorithm with a weighted histogram
Salvati et al. A real-time system for multiple acoustic sources localization based on ISP comparison
Karimian-Azari et al. Fast joint DOA and pitch estimation using a broadband MVDR beamformer
Pessentheiner et al. Localization and characterization of multiple harmonic sources
Firoozabadi et al. Combination of nested microphone array and subband processing for multiple simultaneous speaker localization
Habib et al. Comparison of SRP-PHAT and multiband-PoPi algorithms for speaker localization using particle filters
Mandel et al. A probability model for interaural phase difference
Nikunen et al. Time-difference of arrival model for spherical microphone arrays and application to direction of arrival estimation
Habib et al. Concurrent speaker localization using multi-band position-pitch (m-popi) algorithm with spectro-temporal pre-processing.
Jacob et al. On the bias of direction of arrival estimation using linear microphone arrays
Ramamurthy et al. Experimental performance analysis of sound source detection with SRP PHAT-β
Aarabi et al. Multi-channel time-frequency data fusion
Karimian-Azari et al. Robust DOA estimation of harmonic signals using constrained filters on phase estimates
Matsuo et al. Estimating DOA of multiple speech signals by improved histogram mapping method
Kowalk et al. Geometry-aware DoA Estimation using a Deep Neural Network with mixed-data input features
El Chami et al. A phase-based dual microphone method to count and locate audio sources in reverberant rooms

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees