KR20090128221A

KR20090128221A - Method for sound source localization and system thereof

Info

Publication number: KR20090128221A
Application number: KR1020080054284A
Authority: KR
Inventors: 김현수; 육동석; 조영규; 최우진
Original assignee: 삼성전자주식회사; 고려대학교 산학협력단
Priority date: 2008-06-10
Filing date: 2008-06-10
Publication date: 2009-12-15

Abstract

PURPOSE: A method and system for estimating the location of a sound source are provided that determine a candidate space and apply the SRP-PHAT (Steered Response Power - Phase Transform) algorithm only to that candidate space, thereby reducing the amount of computation. CONSTITUTION: A location reference table is created (S131). The TDOA (Time Difference Of Arrival) is calculated for each combination of microphone pairs (S132). A candidate TDOA estimation function value is calculated (S133). Candidate TDOAs are then selected.

Description

Method for sound source localization and system thereof {Method for Sound Source Localization and System about}

The present invention relates to a method for estimating the position of a sound source using a plurality of microphones, and to a sound source position estimation system using the method.

The position of a sound source can be estimated from the power, time difference, phase, and other properties of the sound arriving at a microphone. Because a single microphone cannot accurately estimate the position or direction from which a sound originated, a plurality of microphones is used.

FIG. 1 shows arrangements of a plurality of microphones according to the prior art. FIG. 1(a) shows microphones arranged linearly, FIG. 1(b) shows microphones arranged in a circle, and FIG. 1(c) shows an arrangement of 100 or more microphones.

Microphone arrays can be used in many applications: sound enhancement, which amplifies only the sound from a specific speaker or location; sound source localization, which tracks a speaker's position as the speaker talks; and source separation, which separates speech mixed from several simultaneous speakers into each speaker's voice.

The most important aspect of sound source localization is estimation performance, which is degraded by factors such as the quality and number of microphones, their placement, the level of noise and reverberation (echo), and the number of simultaneous speakers.

That is, estimation performance improves when the microphones are of high quality or numerous, and falls as noise and reverberation increase. Performance can also be raised by a microphone placement suited to the specific application, but it degrades as the number of simultaneous speakers grows, because ambiguity increases.

Increasing the number of microphones improves localization performance, but a large number of microphones cannot be deployed in every application. To ensure localization performance with a relatively small number of microphones (4 to 8), an appropriate microphone placement and a localization algorithm suited to each application must therefore be chosen.

Techniques for estimating the position of a sound source with a microphone array in three-dimensional space fall broadly into the following three categories.

1. Methods using the time difference of arrival (TDOA)

2. Methods using a steered beamformer

3. Methods using high-resolution spectral estimation

The following documents describe the sound source localization techniques above. They also describe the SRP (Steered Response Power) and SRP-PHAT (Steered Response Power - Phase Transform) algorithms used in carrying out the present invention.

[1] M. Omologo and P. Svaizer, “Use of the crosspower-spectrum phase in acoustic event location”, IEEE Transactions on Speech and Audio Processing, vol. 5, no. 3, pp. 288-292, 1997.

[2] R. Ben, K. Waleed, and S. Claude, “Real time robot audition system incorporating both 3D sound source localisation and voice characterization”, IEEE International Conference on Robotics and Automation, pp. 4733-4738, 2007.

[3] M. Togami, T. Sumiyoshi, and A. Amano, “Stepwise phase difference restoration method for sound source localization using multiple microphone pairs”, International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 117-120, 2007.

[4] J. H. DiBiase, “A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays”, Ph.D. thesis, Brown University, 2000.

[5] R. Schmidt, “Multiple emitter location and signal parameter estimation”, IEEE Transactions on Antennas and Propagation, vol. 34, pp. 276-280, 1986.

[6] A. Johansson, G. Cook, S. Nordholm, “Acoustic direction of arrival estimation a comparison between ROOT-MUSIC and SRP-PHAT”, IEEE Region 10 Conference on Convergent Technologies for the Asia-Pacific, vol. B, pp. 629-632, 2004.

Of the three techniques above, the description focuses on methods 1 and 2, which are relevant to the present invention. The method using the time difference of arrival is described first, with reference to the drawings.

FIG. 2 illustrates the generation of a sound and the time difference of arrival in two-dimensional space.

As shown in FIG. 2, sound generated by a sound source (20) located at some point in space arrives at two microphones (22, 24). Because a two-dimensional space is assumed, the sound is treated as propagating within a plane. In FIG. 2, the sound first reaches microphone 2 (24), which is closer to the sound source (20), and reaches microphone 1 (22) later by the time difference of arrival τ.

In FIG. 2, the quantity of interest is the direction of the sound source, that is, the angle θ between the two microphones (22, 24) and the sound source (20). Relating the distance between the two microphones (d_mic) to the distance the sound travels during the time difference of arrival τ by means of a sine function gives [Equation 1].

[Equation 1]

$$\sin\theta = \frac{c\,\tau}{d_{mic}}$$

Here c is the speed of sound. Rearranging [Equation 1] into [Equation 2] below gives the angle θ between the two microphones and the sound source.

[Equation 2]

$$\theta = \sin^{-1}\!\left(\frac{c\,\tau}{d_{mic}}\right)$$

In [Equation 2], the speed of sound c and the distance d_mic between the two microphones are known, but the time difference of arrival τ is not. One way to compute τ is the GCC (Generalized Cross Correlation) method.

FIG. 3 illustrates estimating the time difference of arrival with the GCC (Generalized Cross Correlation) method. FIG. 3(a) shows the signals received at the two microphones (22, 24), and FIG. 3(b) shows the result of cross-correlating those signals.

The GCC method uses the cross-correlation between the signals received at two microphones. It computes the cross-correlation while shifting the two signals against each other along the time axis, and takes the shift at which the correlation is largest as the estimated time difference of arrival.

Suppose, as in FIG. 3(a), that the signals at the two microphones (22, 24) are offset by a delay of +2 seconds. Computing the cross-correlation while shifting the two signals along the time axis yields the graph of FIG. 3(b). As FIG. 3(b) shows, the correlation peaks at -2, so the time difference of arrival is found to be -2.
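The shift-and-correlate idea can be sketched in a few lines. This is a minimal illustration using plain (unweighted) cross-correlation on integer sample lags, not the full generalized method; the function name `cross_correlation_lag` and the toy signals are assumptions made for the example. It also shows that the sign of the peak lag depends only on which microphone is taken as the reference.

```python
def cross_correlation_lag(x1, x2, max_lag):
    """Return the lag (in samples) that maximizes sum_n x1[n] * x2[n + lag]."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n, sample in enumerate(x1):
            m = n + lag
            if 0 <= m < len(x2):  # ignore samples shifted out of range
                value += sample * x2[m]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Toy data: a short pulse, and a copy of it delayed by two samples.
mic_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

lag_ab = cross_correlation_lag(mic_a, mic_b, max_lag=4)  # mic_a as reference
lag_ba = cross_correlation_lag(mic_b, mic_a, max_lag=4)  # mic_b as reference
```

Swapping the two inputs flips the sign of the result, which is why a +2 offset between the signals can appear as a peak at -2 in the correlation graph.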

Obtaining the time difference of arrival τ by the method illustrated in FIGS. 3(a) and 3(b) and substituting it into [Equation 2] gives an estimate of the angle θ between the microphones (22, 24) and the sound source (20).
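As a worked example of this substitution, the sketch below assumes the sine relation described for FIG. 2, sin θ = c·τ / d_mic, with θ measured from the broadside of the microphone pair. The speed of sound (343 m/s) and the sample values are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def angle_from_tdoa(tau, d_mic, c=SPEED_OF_SOUND):
    """Return the source angle in degrees from sin(theta) = c * tau / d_mic."""
    ratio = c * tau / d_mic
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# Hypothetical measurement: 0.5 ms delay across microphones 0.343 m apart.
theta = angle_from_tdoa(tau=0.0005, d_mic=0.343)  # approximately 30 degrees
```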

FIG. 4 schematically illustrates a method of estimating the direction of a sound source using four microphones.

Let τ be the time difference of arrival measured between two microphones when a sound occurs. The set of all points in three-dimensional space for which the time difference of arrival equals τ forms a hyperboloid. Adding two more microphones, for a total of four, yields two hyperboloids, as shown in FIG. 4. A straight line is formed where the two hyperboloid surfaces meet, and its direction is taken as the direction of the sound source in three-dimensional space.

However, with four microphones as in FIG. 4, only the direction of the sound source can be determined, not its distance. Moreover, because the two hyperboloids produce two straight lines, it cannot be determined whether the sound came from the front or from the back. Resolving these problems requires more microphones, as described with reference to FIG. 5.

FIG. 5 schematically illustrates a method of estimating the position of a sound source using eight microphones.

As shown in FIG. 5, with eight microphones each set of four microphones produces two straight lines. Among these lines there is a point where two lines intersect, or where two lines come closest to each other, and this point is taken as the position of the sound source in three-dimensional space.
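The "point where two lines meet or come closest" can be computed in closed form. The following sketch is one common way to do it, taking the midpoint of the shortest segment between two 3D lines; the function name and the sample lines are assumptions, and the source does not state which construction it uses.

```python
def closest_point_between_lines(p1, d1, p2, d2):
    """Midpoint of the shortest segment between lines p1 + s*d1 and p2 + t*d2."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; no unique closest point")
    s = (b * e - c * d) / denom  # parameter of the closest point on line 1
    t = (a * e - b * d) / denom  # parameter of the closest point on line 2
    point1 = [p + s * q for p, q in zip(p1, d1)]
    point2 = [p + t * q for p, q in zip(p2, d2)]
    return [(u + v) / 2 for u, v in zip(point1, point2)]


# Two direction lines that intersect at (1, 1, 0).
crossing = closest_point_between_lines((0, 0, 0), (1, 1, 0), (2, 0, 0), (-1, 1, 0))
# Two skew lines whose closest approach is centred on (0, 0, 1).
skew = closest_point_between_lines((0, 0, 0), (1, 0, 0), (0, 0, 2), (0, 1, 0))
```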

However, the method of FIG. 5 requires microphones to be placed in two physically separate regions, so it is difficult to implement with a single device. That is, with four microphones forming one set, two sound source position estimation devices each carrying such a set must be installed at spatially separated locations. In addition, localization performance drops sharply when several speakers talk at the same time.

Among the microphone-array localization techniques mentioned above, the method using a steered beamformer works as follows.

FIG. 6 is a schematic diagram of a steered beamformer.

As shown in FIG. 6, localization with a steered beamformer sums the signals received through several microphones when a sound occurs at some point, in order to determine where the sound originated.

Various operations can be applied when summing the signals. Depending on the operation, the approach is classified as delay-and-sum beamforming or weight-and-sum beamforming.

Delay-and-sum beamforming is described first. Suppose a sound originates at an arbitrary point q_s in three-dimensional space. The distance from q_s to each microphone is different. Letting τ_n be the arrival delay of the signal travelling from q_s to the n-th microphone, the beamformer output that takes these arrival delays into account is expressed as follows.

[Equation 3]

Because the signals are time-aligned (synchronized) according to the arrival delay of the sound at each microphone, ambient noise is attenuated and the original signal is reinforced.
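The effect of time alignment can be seen in a toy example. The sketch below assumes integer sample delays and equal-length channels; the function name `delay_and_sum` and the pulse data are illustrative, not taken from the source.

```python
def delay_and_sum(signals, delays):
    """Sum the channels after advancing channel n by delays[n] samples."""
    length = len(signals[0])
    output = [0.0] * length
    for channel, delay in zip(signals, delays):
        for t in range(length):
            src = t + delay  # sample of this channel that lines up with time t
            if 0 <= src < length:
                output[t] += channel[src]
    return output


# The same pulse arrives at three microphones 0, 1 and 3 samples late.
mics = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
aligned = delay_and_sum(mics, [0, 1, 3])     # steered at the true source
misaligned = delay_and_sum(mics, [0, 0, 0])  # no delay compensation
```

With the correct delays the three pulses add up at a single sample, while without compensation they stay spread out, so the peak of the summed signal is three times smaller.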

Weight-and-sum beamforming adds one more operation to the delay-and-sum beamforming described above. Depending on the characteristics of the input devices or on the positions of the microphones relative to the sound source, some microphones carry more reliable information than others.

For example, with eight microphones, if microphone 1 has the best performance or is directly exposed to the sound source, it is likely to carry more important information than the other seven microphones. In that case, giving more weight to the signal from microphone 1 when combining the microphone signals can be expected to improve performance. This is expressed as follows.

[Equation 4]

Compared with the delay-and-sum beamforming introduced above, a weighting filter G_n(ω) has been added. Also, unlike [Equation 3], [Equation 4] is written in capital letters, indicating that it is expressed in the frequency domain. The signal is transformed from the time axis to the frequency axis by a Fourier transform, because signals are easier to manipulate on the frequency axis than on the time axis.

FIG. 7 shows a spatial power graph measured using the SRP (Steered Response Power) method.

One method of locating a sound source from the signals received by a beamformer is SRP (Steered Response Power). SRP computes the beamformer output power for every position in space and takes the position where the output power is largest as the position of the sound source. The beamformer output power is calculated as follows.

In the above equation, Ψ_lk(ω) is a weighting function.
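For orientation, a form of the output power that is consistent with the weight-and-sum beamformer described above is given below. It follows the standard SRP derivation in the literature (for example reference [4]) and is offered as a sketch under that assumption; the symbols Y, P, X_n, and N are not taken from the source, and the source's own notation and sign conventions may differ.

```latex
% Weight-and-sum beamformer output steered to position q (frequency domain)
Y(\omega, q) = \sum_{n=1}^{N} G_n(\omega)\, X_n(\omega)\, e^{j\omega\tau_n(q)}

% Steered response power: energy of the beamformer output
P(q) = \int_{-\infty}^{\infty} \lvert Y(\omega, q) \rvert^{2}\, d\omega
     = \sum_{l=1}^{N} \sum_{k=1}^{N} \int_{-\infty}^{\infty}
       \Psi_{lk}(\omega)\, X_l(\omega)\, X_k^{*}(\omega)\,
       e^{j\omega(\tau_l(q) - \tau_k(q))}\, d\omega

% Pairwise weighting induced by the per-microphone filters
\Psi_{lk}(\omega) = G_l(\omega)\, G_k^{*}(\omega)

% The PHAT choice normalizes each pair to unit magnitude
\Psi_{lk}^{\mathrm{PHAT}}(\omega) = \frac{1}{\lvert X_l(\omega)\, X_k^{*}(\omega) \rvert}
```

Expanding |Y|² term by term produces the double sum over microphone pairs, which is why the weighting appears with two indices, l and k.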

FIG. 7 shows an example in which the power is computed with the above equation for every point in the space where the sound source is to be located. In FIG. 7, the highest peak corresponds to the coordinates at which the computed output power is largest.

Letting q_s' denote the position where the output power is largest, q_s' is obtained from [Equation 6] below.

[Equation 6]
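Selecting the position of maximum output power amounts to an argmax over the searched points. The sketch below illustrates only that selection step: `toy_power` is a stand-in that peaks at a known point and is not an SRP computation, and the grid and names are assumptions.

```python
def argmax_position(candidates, power):
    """Return the candidate position with the largest output power."""
    return max(candidates, key=power)


def toy_power(q, source=(1.0, 2.0, 0.5)):
    # Stand-in for the beamformer output power (NOT real SRP):
    # largest at the assumed source position and falling off with distance.
    return -sum((a - b) ** 2 for a, b in zip(q, source))


# Regular 0.5 m grid over a hypothetical 2 m x 2 m x 1 m volume.
grid = [(x * 0.5, y * 0.5, z * 0.5)
        for x in range(5) for y in range(5) for z in range(3)]
q_hat = argmax_position(grid, toy_power)
```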

FIG. 8 is a graph comparing the performance of sound source localization algorithms.

The algorithms compared in FIG. 8 are SRP (Steered Response Power), GCC-PHAT (Generalized Cross Correlation - PHAse Transform), and SRP-PHAT (Steered Response Power - PHAse Transform).

As described above, SRP computes the beamformer output power for every position in space and takes the position of maximum output power as the position of the sound source. SRP-PHAT is based on SRP and adds an operation such as a weighting filter. GCC-PHAT obtains the time difference of arrival from the cross-correlation between the signals received at the microphones and then applies an additional operation such as a weighting filter.

The horizontal axis (x-axis) of FIG. 8 is the error threshold, and the vertical axis (y-axis) is the localization error rate at each error threshold. In FIG. 8, at an error threshold of 0.5 m (that is, an estimate is counted as correct if it lies within 0.5 m of the true source position), the error rate is 70% for GCC-PHAT, about 24% for SRP, and 0% for SRP-PHAT.
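The error-rate metric on the vertical axis can be stated precisely in code. This is a minimal sketch of the thresholded error rate as described above; the function name and the sample estimates are made up for illustration and are not data from FIG. 8.

```python
import math


def error_rate(estimates, truths, threshold):
    """Fraction of estimates farther than `threshold` metres from the truth."""
    misses = sum(1 for est, true in zip(estimates, truths)
                 if math.dist(est, true) > threshold)
    return misses / len(estimates)


# Illustrative values only: four trials with the source at the origin.
truths = [(0.0, 0.0, 0.0)] * 4
estimates = [(0.1, 0.0, 0.0), (0.4, 0.0, 0.0), (0.6, 0.0, 0.0), (2.0, 0.0, 0.0)]

rate_half_metre = error_rate(estimates, truths, threshold=0.5)
rate_one_metre = error_rate(estimates, truths, threshold=1.0)
```

Raising the threshold can only lower the error rate, which is why curves of this kind fall from left to right.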

As FIG. 8 shows, SRP generally outperforms GCC-PHAT, and SRP-PHAT outperforms SRP.

However, SRP-PHAT, which is known to perform well among conventional localization techniques, requires a grid search: the search space is divided into blocks of a fixed size, the output power is computed for every block, and the block with the largest output power is taken as the position of the sound source. Because this grid search requires a large amount of computation, it is difficult to use in a real-time system. Furthermore, reducing the block size to increase localization precision increases the computation further, and enlarging the search space likewise increases the number of blocks to be evaluated. Despite its accuracy, SRP-PHAT is therefore poorly suited to real-time sound source localization systems.
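The growth in computation is easy to quantify. The sketch below counts grid blocks for a hypothetical 4 m x 5 m x 3 m room (the dimensions and block sizes are assumptions): halving the block edge in three dimensions multiplies the number of power evaluations by eight.

```python
import math


def block_count(room_dims, block_size):
    """Number of cubic blocks of edge `block_size` needed to cover the room."""
    return math.prod(math.ceil(d / block_size) for d in room_dims)


room = (4.0, 5.0, 3.0)            # hypothetical room dimensions in metres
coarse = block_count(room, 0.5)   # 8 * 10 * 6 blocks
fine = block_count(room, 0.25)    # 16 * 20 * 12 blocks
```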

Accordingly, an object of the present invention is to provide a method that speeds up the computation required for sound source position estimation, so that a highly accurate position estimation method can be applied to real-time systems.

Another object of the present invention is to provide a method for speeding up the computation of SRP-PHAT (Steered Response Power - Phase Transform), and thereby a way to apply the SRP-PHAT algorithm to a real-time sound source position estimation system.

Yet another object of the present invention is to provide a sound source position estimation system in which the computation required for position estimation is accelerated, so that real-time estimation is possible while still using the SRP-PHAT method.

To achieve these objects, a sound source position estimation method according to the present invention comprises: determining candidate positions needed for estimating the position of a sound source, using the time difference of arrival with which the sound wave of a sound generated in space reaches each microphone; and performing the SRP-PHAT (Steered Response Power - Phase Transform) algorithm on the determined candidate positions.

Preferably, determining the candidate positions comprises: creating a position reference table indexed by time difference of arrival; computing the time difference of arrival for each combination of microphone pairs, using the sound waves received at a plurality of microphone pairs; computing a candidate TDOA estimation function value for each combination of microphone pairs from the computed time differences of arrival; selecting a predetermined number of candidate TDOAs for each combination of microphone pairs on the basis of the computed function values; and identifying the spatial coordinates corresponding to the selected candidate TDOAs using the position reference table.

A sound source position estimation system according to the present invention comprises: a plurality of microphones that receive the sound waves of a sound generated in space; and a sound source position estimation unit that determines candidate positions needed for estimating the position of the sound source, using the time difference of arrival of the sound waves received through the microphones, and estimates the position of the sound source by performing the SRP-PHAT (Steered Response Power - Phase Transform) algorithm on the determined candidate positions.

Preferably, the sound source position estimation unit comprises: a TDOA determination unit that computes the time difference of arrival of the sound waves reaching the microphones; a candidate TDOA determination unit that selects a predetermined number of candidate TDOAs (time differences of arrival) for each microphone pair using the computed time differences of arrival; a position reference table creation unit that records the spatial coordinates of the selected candidate TDOAs; and an SRP algorithm execution unit that performs the SRP-PHAT algorithm on each candidate TDOA by referring to the spatial coordinates of the candidate TDOAs in the position reference table.

With the sound source estimation method according to the present invention, position estimation using algorithms such as SRP (Steered Response Power) and SRP-PHAT (Steered Response Power - Phase Transform) no longer needs to evaluate every block in space. The execution time of the conventional SRP and SRP-PHAT algorithms is thereby shortened, making real-time sound source tracking possible.

An embodiment of the method proposed by the present invention is outlined as follows. The phase difference (or time difference of arrival, TDOA) of the sound waves received at a microphone pair is computed for each frequency, the candidate positions where the sound source is estimated to have occurred are identified, and the source position is estimated by applying the SRP-PHAT (Steered Response Power - Phase Transform) method only to those candidate positions.

As seen in FIG. 8, SRP-PHAT is the most accurate of the compared methods. However, it requires so much computation that it is difficult to apply to real-time systems. The invention therefore uses TDOA estimation, which is comparatively cheap among sound source estimation techniques, to narrow the search space down to a candidate space, and performs SRP-PHAT only on that candidate space to obtain the final estimate of the source position.

In other words, the method according to the present invention performs a two-stage search. The first stage determines candidate positions by estimating the time difference of arrival (TDOA) for specific times and frequencies using microphone pairs. The second stage applies the SRP-PHAT method to the determined candidate positions to estimate the final source position. Narrowing the search space to a candidate space, and applying the SRP-PHAT algorithm only there, reduces the amount of computation.
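The two-stage structure can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the table is keyed by (microphone pair, TDOA bin), the candidate set is formed as the union of the looked-up points, and `toy_power` is a stand-in for the SRP-PHAT output power. All names and data are hypothetical.

```python
def two_stage_search(candidate_tdoas, lookup_table, power):
    """Stage 1: collect points listed for the candidate TDOAs of each pair.
    Stage 2: evaluate the power function only on those points."""
    candidates = set()
    for pair, tdoa_bins in candidate_tdoas.items():
        for tdoa_bin in tdoa_bins:
            candidates.update(lookup_table.get((pair, tdoa_bin), ()))
    if not candidates:
        return None, 0
    return max(candidates, key=power), len(candidates)


# Hypothetical table: (microphone pair, TDOA bin) -> grid points with that TDOA.
lookup = {
    ((0, 1), -1): [(0, 0, 0), (1, 0, 0)],
    ((0, 1), 0): [(2, 0, 0)],
    ((2, 3), 2): [(1, 0, 0), (1, 1, 0)],
    ((2, 3), 3): [(3, 3, 0)],
}
selected = {(0, 1): [-1], (2, 3): [2]}  # candidate TDOA bins per pair


def toy_power(q, source=(1, 1, 0)):
    # Stand-in for SRP-PHAT power, largest at the assumed source.
    return -sum((a - b) ** 2 for a, b in zip(q, source))


best, evaluated = two_stage_search(selected, lookup, toy_power)
```

Only three of the five grid points in this toy table are evaluated in the second stage, which is the source of the computational saving.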

이하, 본 발명에 따른 음원 추정 방법의 실시 예를 구체적으로 설명하면 다음과 같다.Hereinafter, an embodiment of a sound source estimation method according to the present invention will be described in detail.

우선, 다수의 마이크로폰을 배열한다. 앞서, 도 1 (a), (b), (c)에 도시된 선형 마이크로폰, 원형 마이크로 폰 등이 다수의 마이크로폰이 하나의 장치에 장착되어 있는 예이다.First, a plurality of microphones are arranged. Previously, linear microphones, circular microphones, and the like illustrated in FIGS. 1A, 1B, and 3C are examples in which a plurality of microphones are mounted in one device.

Once the microphones are arranged, a table is created that maps the coordinates of each point in the target space (that is, the space in which the microphones are placed and in which the sound source position is to be estimated) to the corresponding arrival delay times. In the present invention, this table is called the location reference table.
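The table construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the speed of sound (343 m/s), the rounding of delays to whole samples, and the choice to key the table by the tuple of per-pair delays are all assumptions made for the example.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text


def pair_tdoa_samples(point, mic_a, mic_b, sample_rate):
    """Arrival delay between two microphones for a source at `point`, in whole samples."""
    delay = (math.dist(point, mic_a) - math.dist(point, mic_b)) / SPEED_OF_SOUND
    return round(delay * sample_rate)


def build_location_table(mics, grid_points, sample_rate):
    """Map each tuple of per-pair delays (hypothetical key format) to the grid points producing it."""
    pairs = list(itertools.combinations(range(len(mics)), 2))
    table = {}
    for point in grid_points:
        key = tuple(
            pair_tdoa_samples(point, mics[i], mics[j], sample_rate) for i, j in pairs
        )
        table.setdefault(key, []).append(point)
    return pairs, table


# Four microphones on a square with 20 cm sides and a coarse 1 m grid (illustrative values).
mics = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.2, 0.2, 0.0), (0.0, 0.2, 0.0)]
grid = [(x + 0.1, y + 0.1, 1.0) for x in range(3) for y in range(3)]
pairs, table = build_location_table(mics, grid, sample_rate=16000)
```

Several grid points can share one key, which is why each table entry holds a list of coordinates.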

Thereafter, when a sound actually occurs in the space and its sound wave reaches each microphone, the arrival delay time value τ_ij(t, f) is obtained for each microphone pair while varying the frequency estimate of the sound wave. Here, i and j denote the i-th and j-th microphones, and τ_ij denotes the arrival delay time between the sound waves input to the i-th and j-th microphones. In addition, t denotes time and f denotes frequency.

Even for sound generated at the same location, the arrival delay time can vary with the sound wave. If the frequency of the sound wave is known, the arrival delay time between the sound waves reaching the microphones can also be computed from the phase difference between them. Accordingly, the phase difference between the sound waves reaching the i-th and j-th microphones can be calculated by the following function.

[Equation 7]

In the above equation, arg(X) is a function that outputs the phase of X, and X_i and X_j denote the phase of the sound wave input to the i-th microphone and the phase of the sound wave input to the j-th microphone, respectively.
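The body of [Equation 7] is not reproduced in this text, so the following is only a sketch of one common way to turn a phase difference into a delay. It assumes that X_i and X_j are complex spectral values of the two microphone signals at frequency f, and that the delay is the phase of X_i multiplied by the conjugate of X_j, divided by 2πf. The function name and the sign convention are assumptions.

```python
import cmath
import math


def phase_tdoa(x_i, x_j, freq_hz):
    """Delay in seconds implied by the phase difference of two complex spectral values.

    Positive when the signal at microphone j lags the signal at microphone i
    (assumed sign convention).
    """
    phase_diff = cmath.phase(x_i * x_j.conjugate())  # wrapped to (-pi, pi]
    return phase_diff / (2.0 * math.pi * freq_hz)


# A 500 Hz component that reaches microphone j 0.2 ms after microphone i.
x_i = 1.0 + 0.0j
x_j = x_i * cmath.exp(-2j * math.pi * 500.0 * 0.0002)
tau = phase_tdoa(x_i, x_j, 500.0)
```

Because the phase is wrapped to (-π, π], a single frequency only identifies the delay unambiguously while |2πf·τ| stays below π. This is one reason the delay is evaluated over many frequencies.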

Thereafter, the calculation for identifying candidate TDOAs is performed using the following equation.

[Equation 8]

σ(X) is a function that outputs 1 if X is true and 0 if X is false, and τ^k is the k-th TDOA in the set of TDOAs that can be calculated using the two microphones.

Using [Equation 8], a histogram can be drawn with τ^k on the x-axis and Φ(k) on the y-axis. The n values of τ^k with the largest Φ(k) in this histogram are called candidate TDOAs. Candidate TDOAs for all microphone pairs can then be obtained as follows.

τ_ij^n denotes the n-th of the candidate TDOAs calculated using the i-th and j-th microphones. The candidate TDOAs obtained for each microphone pair are recombined as follows.

After the recombination, it is necessary to know which coordinates in three-dimensional space the candidate TDOAs correspond to. The coordinates for each TDOA are therefore looked up in the location reference table created earlier.
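The recombination and lookup can be sketched as below. The sketch assumes that recombination means taking one candidate TDOA from every microphone pair in all possible ways (a Cartesian product), and that the location reference table is a dictionary keyed by such tuples. Both the key format and the function name are illustrative assumptions.

```python
import itertools


def candidate_positions(per_pair_candidates, location_table):
    """Look up coordinates for every combination of one candidate TDOA per microphone pair."""
    positions = []
    for combo in itertools.product(*per_pair_candidates):
        # Combinations that no point in the target space can produce are skipped.
        positions.extend(location_table.get(combo, []))
    return positions


# Toy example with two microphone pairs and two candidate delays (in samples) per pair.
toy_table = {(0, 2): [(1.0, 1.0, 1.0)], (1, 3): [(2.0, 2.0, 2.0)]}
found = candidate_positions([[0, 1], [2, 3]], toy_table)
```

With six pairs and three candidates per pair the product has 3^6 = 729 combinations, so the lookup stays small regardless of how finely the space is gridded.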

Once the coordinates of the candidate TDOAs are identified, sound source estimation is performed on those coordinates using the SRP-PHAT method. The SRP-PHAT method was described in the background section above and is prior art described in detail in the cited references, so a detailed description is omitted.

The present invention is described in more detail below with reference to the drawings.

FIG. 9 is a diagram showing a sound source position estimation system according to an embodiment of the present invention.

As shown in FIG. 9, the sound source position estimation system according to the present invention comprises a plurality of microphones (90) and a sound source position estimation unit (92). The sound source position estimation unit (92) comprises an arrival delay time determination unit (94), a location reference table creation unit (95), a candidate TDOA determination unit (96), and an SRP algorithm execution unit (97).

The plurality of microphones (90) receives sound generated in the space. Because the present invention estimates the position of the sound source from the time difference between the sound waves arriving at the microphones, at least two microphones are required.

The arrival delay time determination unit (94) calculates the arrival delay time τ of the sound waves arriving at a pair of microphones among the plurality of microphones (90). Because there are two or more microphones, the microphones are grouped two at a time into pairs, and the arrival delay time is calculated for every microphone pair. The frequency of the sound wave is taken into account in calculating the arrival delay time.

The location reference table creation unit (95) creates the location reference table. After the Φ(k) values of [Equation 8] have been calculated, this table is used to look up the spatial coordinates of the candidate TDOAs, that is, the predetermined number of τ^k values selected for each microphone pair.

The candidate TDOA determination unit (96) calculates the Φ(k) values using the arrival delay times determined for each microphone pair and [Equation 8], and selects a predetermined number of candidate TDOAs from the Φ(k) values calculated for each microphone pair. The selection by Φ(k) value is also performed separately for each microphone pair.

The SRP algorithm execution unit (97) identifies, based on the location reference table, the spatial coordinates of the candidate TDOAs selected by the candidate TDOA determination unit (96), and performs the SRP algorithm on them. The algorithm performed is not limited to the SRP algorithm; the SRP-PHAT algorithm may also be performed.

FIG. 10 is a diagram showing an example of the arrival delay time (Time Difference Of Arrival: TDOA) used in the present invention.

As shown in FIG. 10, ten microphones are arranged in a circle, and the arrival delay time τ is indicated for the microphone pair consisting of microphones 2 and 4.

FIG. 10 uses the pair of microphones 2 and 4 as an example, but many other microphone pairs are possible, such as 1 and 2, 1 and 3, ... 2 and 3, 2 and 4, ... 9 and 10.
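The pairs listed above are the unordered pairs of distinct microphones. A short sketch of enumerating them for the ten-microphone array of FIG. 10 is given below; the function name is illustrative.

```python
import itertools


def microphone_pairs(num_mics):
    """All unordered pairs (i, j) with i < j, numbering microphones from 1."""
    return list(itertools.combinations(range(1, num_mics + 1), 2))


pairs = microphone_pairs(10)
print(len(pairs), pairs[0], pairs[-1])
```

In general, N microphones give N(N-1)/2 pairs.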

FIG. 11 is a diagram showing a surface connecting points that have the same arrival delay time.

In three-dimensional space, many points can share the same arrival delay time between the sounds reaching a given pair of microphones. FIG. 11 schematically shows the surface connecting the points at which sounds reaching microphones 2 and 4 have the same arrival delay time. Sounds generated at different positions can produce the same arrival delay time at the microphones, and even a sound generated at the same position can yield different arrival delay times depending on its frequency. The surface shown in FIG. 11 can therefore take other forms.

In addition, if the frequency of the sound wave is known, the arrival delay time between the sound waves reaching the microphones can also be computed from the phase difference between them. The phase difference between the sound waves reaching the i-th and j-th microphones can therefore be calculated using [Equation 7] given above.

FIG. 12 is a diagram showing Φ(k) values calculated to identify candidate TDOAs according to an embodiment of the present invention.

As already described in connection with [Equation 8], the Φ(k) values are calculated in order to select n values of τ^k for each microphone pair.

The horizontal axis of FIG. 12 shows τ^k values, which denote arrival delay times between the sound waves input to microphones 2 and 4. Because the present invention relates to a method of estimating the position of a sound source using arrival delay times, these τ^k values are not actual measurements but the arrival delay times that can occur in the target space of the sound source position estimation. That is, because the arrival delay time varies with the point in space at which the sound is generated, the τ^k values are the set of possible arrival delay times, obtained according to frequency.

When a sound actually occurs in the space, it spreads and arrives at each of the microphones. Using the arrival delay time τ of the sound waves arriving at the microphones and [Equation 8], Φ(k) values such as those in the example of FIG. 12 are obtained for each microphone pair. FIG. 12 shows Φ(k) values obtained from the sound waves arriving at microphones 2 and 4, but the Φ(k) calculation is performed for every microphone pair, such as 1 and 2, 1 and 3, ... 2 and 3, 2 and 4, ... 9 and 10.

After the Φ(k) calculation has been completed for each microphone pair, the arrival delay times with large Φ(k) values are selected. The selection is made separately for each microphone pair. For example, if three values are selected per pair, the three arrival delay times τ_12 with the largest Φ(k) values are selected for the pair of microphones 1 and 2, the three arrival delay times τ_13 with the largest Φ(k) values are selected for the pair of microphones 1 and 3, and so on for every microphone pair. With ten microphones there are 45 microphone pairs, and the three τ values with the largest Φ(k) are selected for each pair.
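The per-pair selection can be sketched as a vote count followed by a top-n pick. The body of [Equation 8] is not reproduced in this text, so the sketch assumes that Φ(k) counts how many observed τ_ij(t, f) values match the k-th possible delay τ^k. Matching an observation to the nearest possible delay, and the function name, are assumptions made for the example.

```python
from collections import Counter


def candidate_tdoas(observed_delays, possible_delays, n):
    """Return the n possible delays that receive the most votes from the observations."""
    votes = Counter()
    for tau in observed_delays:
        # Assumed matching rule: each observation votes for the nearest possible delay.
        nearest = min(possible_delays, key=lambda tau_k: abs(tau_k - tau))
        votes[nearest] += 1
    return [tau_k for tau_k, _ in votes.most_common(n)]


# Delays in samples for one microphone pair; values are illustrative only.
observed = [0.9, 1.1, 1.0, 3.2, 3.0, -2.0]
possible = [-3, -2, -1, 0, 1, 2, 3]
top_two = candidate_tdoas(observed, possible, n=2)
```

Running this once per microphone pair yields the per-pair candidate lists that are later recombined.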

FIG. 13 is a flowchart showing a sound source position estimation method according to an embodiment of the present invention.

As shown in FIG. 13, a plurality of microphones is placed in the space in which the sound source position is to be estimated (S130). There is no limit on the number of microphones, although the more microphones are placed, the more accurate the position estimate becomes. Placing many microphones throughout the space would be the best way to estimate the source position accurately, but in practice several microphones are usually bundled and mounted as one module. For example, when sound source position estimation is applied to a robot that turns toward the direction of a sound, the microphones should be installed on the robot body, because placing them in the surrounding space instead of on the body would reduce the robot's mobility.

Once the microphones are placed (S130), a location reference table is created for the target space in which sound source position estimation is to be performed (S131). Thereafter, when a sound actually occurs in the space, the arrival delay time of the sound waves reaching each microphone is determined. In doing so, τ_ij(t, f), the arrival delay time value that can occur at each frequency, is calculated (S132). The τ_ij(t, f) values may also be entered in a setup step before the microphone placement step (S130); that is, the set of possible τ_ij(t, f) values may be input in advance.

Using the arrival delay time estimated from the sound that actually occurred and [Equation 8], the calculation by the candidate arrival delay time grasping function Φ(k) is performed (S133).

After the calculation by the candidate arrival delay time grasping function Φ(k) (S133), a predetermined number of candidate TDOAs is selected for each microphone pair (S134). The predetermined number is adjustable. A larger number yields more candidate TDOAs, which increases the computation needed for sound source position estimation but can improve its accuracy. A smaller number yields fewer candidate TDOAs, so the computation runs faster, but the accuracy of the position estimate may be somewhat lower.

For example, if the predetermined number is 3, the combinations of candidate TDOAs are as follows.

The combinations of candidate TDOAs formed above are recombined as follows.

This recombination is needed because the location reference table was created in step S131 for each possible arrival delay time.

Once the candidate TDOAs have been selected (S134), their spatial coordinates are identified using the location reference table created in step S131 (S135). Performing the SRP-PHAT process at the coordinates of the candidate TDOAs (S136) yields a spatial power graph. The SRP-PHAT process itself is prior art, is described in the background section of this specification, and is covered by the cited references, so a detailed description is omitted.

FIG. 13 shows the SRP-PHAT method being performed on the candidate TDOAs whose positions have been identified, but the SRP method may be performed instead. The SRP-PHAT method is shown as the embodiment because, as discussed with reference to FIG. 8, it is more accurate in sound source position estimation.

When the point at which the output power is maximized is found by performing SRP-PHAT on the candidate TDOAs (S137), that point is taken as the estimated position of the sound source (S138).
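Steps S136 to S138 can be sketched as scoring each candidate position and keeping the one with the largest output power. The text treats SRP-PHAT as prior art and does not spell it out, so the sketch below uses a standard frequency-domain formulation (phase-normalized cross-spectra of all microphone pairs, steered to the candidate point). The naive DFT, the speed of sound, and all function names are assumptions made for the example.

```python
import cmath
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def dft(frame):
    """Naive O(n^2) discrete Fourier transform, adequate for short illustrative frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def srp_phat_power(spectra, mics, point, sample_rate):
    """Sum of PHAT-weighted cross-spectra over all microphone pairs, steered to `point`."""
    n = len(spectra[0])
    power = 0.0
    for i, j in itertools.combinations(range(len(mics)), 2):
        tau = (math.dist(point, mics[i]) - math.dist(point, mics[j])) / SPEED_OF_SOUND
        for k in range(1, n // 2):
            cross = spectra[i][k] * spectra[j][k].conjugate()
            if abs(cross) == 0.0:
                continue
            freq = k * sample_rate / n
            # PHAT weighting keeps only the phase; steering undoes the expected delay.
            power += (cross / abs(cross) * cmath.exp(2j * math.pi * freq * tau)).real
    return power


def locate(frames, mics, candidates, sample_rate):
    """Return the candidate position with the maximum steered response power."""
    spectra = [dft(frame) for frame in frames]
    return max(candidates, key=lambda p: srp_phat_power(spectra, mics, p, sample_rate))
```

Only the candidate positions are scored, so the cost of this step grows with the number of candidates rather than with the number of blocks in the room.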

Simulation results obtained with the sound source position estimation technique according to the present invention are as follows.

The search space for sound source localization is a rectangular room measuring 7 m x 9 m x 3 m, and four microphones are placed at the vertices of a square with 20 cm sides. The sound input to the microphones is assumed to be sampled at 16 kHz. The space in which the position is estimated is divided into small blocks, and the source position is estimated to the nearest block. A search over blocks with 5 cm, 10 cm, and 20 cm edges is compared with the sound source estimation method according to the present invention. The resulting numbers of search candidates for sound source position estimation are shown in [Table 1] below.

[Table 1]

Search method | Number of search candidate positions
20 cm blocks | 19,867
10 cm blocks | 167,553
5 cm blocks | 1,424,713
Method according to the invention | 729

In performing the sound source position estimation method according to the present invention, three Φ(k) values were assumed to be taken for each microphone pair. Four microphones form six microphone pairs, and taking three Φ(k) values per pair yields 3^6 = 729 possible combinations of TDOA candidates.

As shown in [Table 1], compared with dividing the space into blocks of 5 cm, 10 cm, and 20 cm and performing the SRP-PHAT method on every block, the sound source position estimation method proposed by the present invention reduces the number of search candidates by more than 99.9%, 99.5%, and 96.3%, respectively.
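The reductions can be recomputed directly from the counts in [Table 1]. The short script below only restates that arithmetic; the variable names are illustrative.

```python
# Search candidate counts for the full block search, taken from [Table 1].
grid_counts = {"20 cm": 19_867, "10 cm": 167_553, "5 cm": 1_424_713}

# Three candidate TDOAs for each of the six microphone pairs.
proposed = 3 ** 6

reductions = {
    size: 100.0 * (1.0 - proposed / count) for size, count in grid_counts.items()
}
for size, pct in reductions.items():
    print(f"{size} blocks: {pct:.2f}% fewer search candidates")
```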

Of course, the present invention requires the location reference table to be created in advance and additionally requires the calculation using the candidate position grasping function Φ(k). However, because the source position can be estimated with far less computation than performing the SRP-PHAT method on every block in the space, the invention can be applied to a real-time sound source position estimation system.

FIG. 1 is a diagram showing arrangements of a plurality of microphones according to the prior art.

FIG. 2 is a diagram showing sound source generation and arrival delay time in a two-dimensional space.

FIG. 3 is a diagram showing a method of estimating arrival delay time using the Generalized Cross Correlation (GCC) method.

FIG. 4 is a diagram schematically showing a sound source direction estimation method using four microphones.

FIG. 5 is a diagram schematically showing a sound source position estimation method using eight microphones.

FIG. 6 is a schematic diagram of a steered beamformer.

FIG. 7 is a diagram showing a spatial power graph measured using the Steered Response Power (SRP) method.

FIG. 8 is a graph comparing the performance of sound source position estimation algorithms.

FIG. 9 is a diagram showing a sound source position estimation system according to an embodiment of the present invention.

FIG. 10 is a diagram showing an example of the arrival delay time (Time Difference Of Arrival: TDOA) used in the present invention.

FIG. 11 is a diagram showing a surface connecting points that have the same arrival delay time.

FIG. 12 is a diagram showing Φ(k) values calculated to identify candidate TDOAs according to an embodiment of the present invention.

FIG. 13 is a flowchart showing a sound source position estimation method according to an embodiment of the present invention.

Claims

Determining, by using the time difference of arrival (TDOA) that arises when the sound wave of a sound generated in space reaches each microphone, a candidate position required for estimating the position of the sound source; and

Performing a steered response power-phase transform (SRP-PHAT) algorithm on the determined candidate position.

The method of claim 1, wherein the arrival delay time of the sound wave

is a value calculated by varying a frequency estimation value of the sound wave reaching the microphone.

The method of claim 1, wherein determining the candidate position comprises:

Creating a location reference table for each arrival delay time that may occur;

Calculating an arrival delay time for each microphone pair using the sound waves input to the plurality of microphone pairs;

Calculating a candidate arrival delay time grasping function value for each microphone pair using the calculated arrival delay time;

Selecting a predetermined number of candidate arrival delay times for each microphone pair based on the calculated candidate arrival delay time grasping function value; and

Determining the spatial coordinates corresponding to the selected candidate arrival delay times using the location reference table.

The method of claim 3, wherein the candidate arrival delay time grasping function Φ(k) is

where σ(X) is a function that outputs 1 when X is true and 0 when X is false, τ_ij(t, f) is the arrival delay time at time t and frequency f, and τ^k is the k-th arrival delay time in the set of arrival delay times τ measurable by the two microphones.

The method of claim 3, wherein the predetermined number in the step of selecting the candidate arrival delay times

is variable.

And performing a steered response power (SRP) algorithm on the determined candidate position.

A plurality of microphones for receiving sound waves of sounds generated in space;

A sound source position estimator that determines a candidate position of a sound source using the time difference of arrival (TDOA) that arises when the sound wave reaches the plurality of microphones, and that estimates the position of the sound source by performing a steered response power-phase transform (SRP-PHAT) algorithm on the determined candidate position.

The system of claim 7, wherein the sound source position estimator comprises:

A location reference table creation unit for recording spatial coordinates according to the arrival delay times that may occur in the target space in which sound source position estimation is to be performed;

An arrival delay time determination unit for calculating the arrival delay time between the sound waves reaching the plurality of microphones;

A candidate TDOA determination unit for selecting a predetermined number of candidate arrival delay times (TDOAs) for each microphone pair using the calculated arrival delay time; and

An SRP algorithm execution unit for performing an SRP-PHAT algorithm on each candidate TDOA by referring to the spatial coordinates of the candidate TDOAs in the location reference table.

The system of claim 8, wherein the arrival delay time determination unit

groups two microphones among the plurality of microphones into one microphone pair and calculates the arrival delay time of the sound wave for each pair.

The system of claim 8, wherein the arrival delay time determination unit,

in calculating the arrival delay time for each pair formed by grouping two microphones among the plurality of microphones, calculates the arrival delay time while varying the frequency estimation value of the sound wave.