KR20100001726A

KR20100001726A - Method for determining the representative point of cluster and system for sound source localization

Info

Publication number: KR20100001726A
Application number: KR1020080061749A
Authority: KR
Inventors: 김현수; 육동석; 조영규; 최우진
Original assignee: 삼성전자주식회사; 고려대학교 산학협력단
Priority date: 2008-06-27
Filing date: 2008-06-27
Publication date: 2010-01-06
Also published as: KR101483271B1

Abstract

PURPOSE: A method for determining the representative point of a cluster, and a sound source localization system using the method, are provided to improve the accuracy of sound source localization by resolving a positional mismatch. CONSTITUTION: A plurality of coordinates is selected within a cluster region (S131). Taking a specific coordinate as a reference, the sum of its distances to the other coordinates in the same cluster is computed, and the specific coordinate for which this sum is minimal is determined as the representative point of the cluster. In detail, a specific coordinate is selected from among the selected coordinates (S132), and the sum of its distances to the other coordinates in the same cluster is calculated (S133). The calculation is repeated while varying the specific coordinate, and the coordinate whose distance sum is minimal is determined as the representative point of the cluster (S134).

Description

Representative point selection method for sound source position estimation and sound source position estimation system using the method {Method for Determining the Representative Point of Cluster and System for Sound Source Localization}

The present invention relates to a method for estimating the position of a sound source using a plurality of microphones, and to a sound source position estimation system using the method.

The position of a sound source can be estimated from the power, time difference, phase, and other properties of the sound arriving at a microphone. Because a single microphone cannot accurately determine where a sound originated or from which direction it came, a plurality of microphones is used.

FIG. 1 shows arrangements of multiple microphones according to the prior art. FIG. 1(a) shows microphones arranged linearly, FIG. 1(b) shows microphones arranged in a circle, and FIG. 1(c) shows an arrangement of 100 or more microphones.

Microphone arrays can be applied to many applications: sound enhancement, which amplifies only the sound produced by a specific speaker or at a specific location; sound source localization, which tracks the position of a speaker as the speaker talks; and source separation, which separates speech mixed from several simultaneous speakers into the speech of each speaker.

The most important aspect of sound source position estimation is estimation performance, which degrades depending on factors such as the quality and number of microphones, the microphone placement, the level of noise and echo, and the number of simultaneous speakers.

That is, estimation performance improves when the microphones are of high quality or numerous, and degrades as noise and echo increase. Performance can also be raised by a microphone placement suited to the application, but it degrades as the number of simultaneous speakers grows, because ambiguity increases.

Increasing the number of microphones improves estimation performance, but a large number of microphones cannot be deployed in every application. Therefore, to ensure estimation performance even with a relatively small number of microphones (4 to 8), an appropriate microphone placement and a sound source position estimation algorithm suited to each application must be selected.

Techniques for estimating the position of a sound source in three-dimensional space with a microphone array fall broadly into the following three categories.

1. Methods using the time difference of arrival (TDOA)

2. Methods using a steered beamformer

3. Methods using high-resolution spectral estimation

The following documents describe these sound source position estimation techniques. They also describe the SRP (Steered Response Power) and SRP-PHAT (Steered Response Power - Phase Transform) algorithms used in carrying out the present invention.

[1] M. Omologo and P. Svaizer, “Use of the crosspower-spectrum phase in acoustic event location”, IEEE Transactions on Speech and Audio Processing, vol. 5, no. 3, pp. 288-292, 1997.

[2] R. Ben, K. Waleed, and S. Claude, “Real time robot audition system incorporating both 3D sound source localisation and voice characterization”, IEEE International Conference on Robotics and Automation, pp. 4733-4738, 2007.

[3] M. Togami, T. Sumiyoshi, and A. Amano, “Stepwise phase difference restoration method for sound source localization using multiple microphone pairs”, International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 117-120, 2007.

[4] J. H. DiBiase, “A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays”, Ph.D. thesis, Brown University, 2000.

[5] R. Schmidt, “Multiple emitter location and signal parameter estimation”, IEEE Transactions on Antennas and Propagation, vol. 34, pp. 276-280, 1986.

[6] A. Johansson, G. Cook, S. Nordholm, “Acoustic direction of arrival estimation a comparison between ROOT-MUSIC and SRP-PHAT”, IEEE Region 10 Conference on Convergent Technologies for the Asia-Pacific, vol. B, pp. 629-632, 2004.

Of the three techniques above, the description focuses on methods 1 and 2, which are relevant to the present invention. The method using the arrival delay time is described first, with reference to the drawings.

FIG. 2 shows the generation of sound from a source and the arrival delay time in a two-dimensional space.

As shown in FIG. 2, sound generated by a sound source 20 located at a particular point in space arrives at two microphones 22 and 24. Because a two-dimensional space is assumed, the sound arrives within a plane. In FIG. 2, the sound first reaches microphone 2 (24), which is closer to the sound source 20, and reaches microphone 1 (22) later by the arrival delay time τ.

What we want to find in FIG. 2 is the direction of the sound source, that is, the angle θ between the two microphones 22, 24 and the sound source 20. Expressing the relation between the distance d_mic between the two microphones and the distance the sound travels during the arrival delay time τ as a sine function yields Equation 1.

Here c is the speed of sound. Rearranging Equation 1 as Equation 2 gives the angle θ between the two microphones and the sound source.
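Equations 1 and 2 do not survive in this text. Reconstructed from the description above (a sine relation between the microphone spacing d_mic, the arrival delay time τ, and the speed of sound c), they would presumably read as follows; the exact notation of the original is an assumption.

```latex
% Equation 1: the extra path length c*tau is the projection of d_mic
\sin\theta = \frac{c\,\tau}{d_{mic}}

% Equation 2: Equation 1 solved for the angle
\theta = \sin^{-1}\!\left(\frac{c\,\tau}{d_{mic}}\right)
```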

In Equation 2, the speed of sound c and the distance d_mic between the two microphones are known, but the arrival delay time τ is not. One method of calculating τ is GCC (Generalized Cross Correlation).

FIG. 3 shows how the arrival delay time is estimated using the GCC method. FIG. 3(a) shows the signals input to the two microphones 22 and 24, and FIG. 3(b) shows the result of the cross-correlation of the two signals.

The GCC method uses the cross-correlation between the signals received by two microphones. The cross-correlation is computed while shifting the two signals relative to each other along the time axis, and the shift at which its value is greatest is taken as the estimated arrival delay time.

Suppose, as in FIG. 3(a), that the signals at the two microphones 22 and 24 are offset by a delay of +2 seconds. Computing the cross-correlation while shifting the two signals along the time axis yields a graph such as FIG. 3(b). As FIG. 3(b) shows, the cross-correlation peaks at -2, so the arrival delay time is -2.

Obtaining the arrival delay time τ by the method illustrated in FIGS. 3(a) and 3(b) and substituting it into Equation 2 gives an estimate of the angle θ between the microphones 22, 24 and the sound source 20.
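The two steps above (find the shift that maximizes the cross-correlation, then substitute it into Equation 2) can be sketched as follows. This is a minimal illustration using plain, unweighted cross-correlation on short hypothetical signals; the function names, the 16 kHz sample rate, the 0.2 m spacing, and the 343 m/s speed of sound are all assumptions, not values from the patent. The sign of the delay depends on which microphone is taken as the reference.

```python
import math

def cross_correlation(x, y, max_lag):
    """Return {lag: sum_t x[t] * y[t + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t, xv in enumerate(x):
            if 0 <= t + lag < len(y):
                total += xv * y[t + lag]
        result[lag] = total
    return result

def estimate_delay(x, y, max_lag):
    """Lag (in samples) at which the cross-correlation of x and y peaks."""
    corr = cross_correlation(x, y, max_lag)
    return max(corr, key=corr.get)

def arrival_angle(delay_samples, sample_rate, mic_distance, speed_of_sound=343.0):
    """Equation 2: theta = asin(c * tau / d_mic), returned in radians."""
    tau = delay_samples / sample_rate
    # Clamp so that measurement noise cannot push the ratio outside [-1, 1].
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_distance))
    return math.asin(ratio)

# Hypothetical signals: the same pulse reaches microphone 1 two samples
# after it reaches microphone 2.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0]
mic2 = pulse + [0.0] * 5
mic1 = [0.0] * 2 + pulse + [0.0] * 3

delay = estimate_delay(mic2, mic1, max_lag=4)
angle = arrival_angle(delay, sample_rate=16000, mic_distance=0.2)
print(delay, math.degrees(angle))
```

GCC variants such as GCC-PHAT apply a frequency-domain weighting before the peak search; that weighting is omitted here to keep the sketch short.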

FIG. 4 schematically shows a method of estimating the direction of a sound source using four microphones.

Let τ be the arrival delay time measured between two microphones when a sound occurs. The set of all points in three-dimensional space for which the arrival delay time equals τ forms a hyperboloid. Adding two more microphones, for a total of four, yields two hyperboloids, as shown in FIG. 4. A straight line is formed where the two hyperboloid surfaces meet, and the direction of this line is taken as the estimated direction of the sound source in three-dimensional space.

However, with four microphones as in FIG. 4, only the direction of the sound source can be determined, not its distance. In addition, because the two hyperboloids produce two straight lines, it cannot be determined whether the sound came from the front or from the back. Solving these problems requires more microphones, as described with reference to FIG. 5.

FIG. 5 schematically shows a method of estimating the position of a sound source using eight microphones.

As shown in FIG. 5, with eight microphones each set of four microphones produces two straight lines. Among these lines there is a point where two of them intersect or come closest to each other, and this point is taken as the estimated position of the sound source in three-dimensional space.
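The "point where two lines intersect or come closest" can be computed in closed form. The sketch below is one possible way to do it, assuming each direction line is given by a point and a direction vector; it returns the midpoint of the shortest segment joining the two lines. The function name and the sample lines are hypothetical, and the patent does not state how this point is actually obtained.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def closest_point_between_lines(p1, d1, p2, d2):
    """Midpoint of the shortest segment joining lines p1 + s*d1 and p2 + t*d2."""
    w = [u - v for u, v in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; no unique closest point")
    s = (b * e - c * d) / denom   # parameter of the closest point on line 1
    t = (a * e - b * d) / denom   # parameter of the closest point on line 2
    q1 = [p + s * v for p, v in zip(p1, d1)]
    q2 = [p + t * v for p, v in zip(p2, d2)]
    return [(u + v) / 2 for u, v in zip(q1, q2)]

# Two skew lines: the x-axis, and a line parallel to the y-axis at x = 1, z = 2.
point = closest_point_between_lines(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
    (1.0, -1.0, 2.0), (0.0, 1.0, 0.0),
)
print(point)
```

When the two lines actually intersect, the two closest points coincide and the midpoint is the intersection itself.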

However, the method of FIG. 5 requires microphones to be placed in two physically separate regions, so it is difficult to implement in a single device. That is, if four microphones form one set, two sound source position estimation devices each equipped with such a set must be installed, spatially separated from each other. A further drawback is that estimation performance drops sharply when several speakers talk at the same time.

Among the microphone-array techniques mentioned above, the method using a steered beamformer is as follows.

FIG. 6 is a schematic diagram of a steered beamformer.

As shown in FIG. 6, sound source position estimation using a steered beamformer determines where a sound originated by summing the signals received through multiple microphones when a sound occurs at some point.

Various operations can be applied when summing the signals. Depending on the operation, the approach is classified as delay-and-sum beamforming or weight-and-sum beamforming.

Delay-and-sum beamforming is described first. Suppose a sound occurs at an arbitrary point q_s in three-dimensional space. The distance from q_s to each microphone is different. Let τ_n be the arrival delay time of the signal traveling from q_s to the n-th microphone. Taking the arrival delay times into account, the beamformer output is expressed as follows.

Because the signals are time-synchronized by compensating for the arrival delay of the sound at each microphone, ambient noise is attenuated and the original signal is reinforced.
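The time-alignment idea can be illustrated with a small delay-and-sum sketch. It assumes integer sample delays and averages the aligned channels; the function name and the three hypothetical channels are illustrative only.

```python
def delay_and_sum(signals, delays):
    """Align each channel by its arrival delay and average the channels.

    signals: list of equal-length sample lists, one per microphone.
    delays:  non-negative arrival delay of each channel, in samples (tau_n).
    """
    length = len(signals[0]) - max(delays)
    output = []
    for t in range(length):
        total = sum(sig[t + d] for sig, d in zip(signals, delays))
        output.append(total / len(signals))
    return output

# The same pulse arrives 0, 1, and 2 samples late at three microphones.
channels = [
    [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
]

aligned = delay_and_sum(channels, [0, 1, 2])    # steered at the true source
unaligned = delay_and_sum(channels, [0, 0, 0])  # no delay compensation
print(max(aligned), max(unaligned))
```

With the correct delays the pulse adds coherently and keeps its full peak; without compensation the channels smear across time and the peak is lower, which is the effect the paragraph above describes.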

Weight-and-sum beamforming adds one more operation to delay-and-sum beamforming. Owing to the characteristics of the input devices or the positions of the microphones relative to the sound source, some microphones carry more reliable information than others.

For example, with eight microphones, if microphone 1 has the best performance or is directly exposed to the sound source, it is likely to carry more important information than the other seven. In that case, performance can be expected to improve by applying a weight to the signal from microphone 1 when combining the microphone signals. This is expressed as follows.

Compared with delay-and-sum beamforming, a weight filter G_n(ω) has been added. In addition, unlike Equation 3, Equation 4 is written with capital letters, indicating that it is expressed in the frequency domain. The signal is converted from the time domain to the frequency domain by a Fourier transform because signals are easier to manipulate in the frequency domain than in the time domain.
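Equations 3 and 4 do not survive in this text. A form consistent with the description, and with the standard formulation in DiBiase [4], would be the following. The symbols x_n(t) (signal of the n-th microphone), X_n(ω) (its Fourier transform), and N (number of microphones) are assumptions, as is the sign convention of the delay.

```latex
% Equation 3: delay-and-sum beamformer, time domain
y(t, q_s) = \sum_{n=1}^{N} x_n(t + \tau_n)

% Equation 4: weight-and-sum beamformer, frequency domain
Y(\omega, q_s) = \sum_{n=1}^{N} G_n(\omega)\, X_n(\omega)\, e^{j\omega\tau_n}
```

Setting G_n(ω) = 1 for every microphone reduces Equation 4 to the Fourier transform of Equation 3, since a time shift by τ_n corresponds to multiplication by e^{jωτ_n}.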

FIG. 7 shows a spatial power graph measured using the SRP (Steered Response Power) method.

SRP (Steered Response Power) is one of the methods used to track the position of a sound source from the signals received by a beamformer. SRP computes the output power of the beamformer for every position in space and takes the position of maximum output power as the estimated position of the sound source. The output power of the beamformer is calculated as follows.

In the above equation, Ψ_lk(ω) is a weighting function, defined as follows.
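Equation 5 and the definition of the weighting function do not survive in this text. Assuming the beamformer output is Y(ω, q) = Σ_n G_n(ω) X_n(ω) e^{jωτ_n}, with X_n(ω) the spectrum of the n-th of N microphone signals, the output power P(q) = ∫ |Y(ω, q)|² dω expands to the form below, which matches the standard formulation in DiBiase [4] up to the ordering of the indices. The notation is an assumption.

```latex
% Equation 5: steered response power at a candidate position q
P(q) = \sum_{l=1}^{N} \sum_{k=1}^{N} \int_{-\infty}^{\infty}
       \Psi_{lk}(\omega)\, X_l(\omega)\, X_k^{*}(\omega)\,
       e^{j\omega(\tau_l - \tau_k)}\, d\omega

% Weighting function in general, and the PHAT choice used by SRP-PHAT
\Psi_{lk}(\omega) = G_l(\omega)\, G_k^{*}(\omega), \qquad
\Psi_{lk}^{\mathrm{PHAT}}(\omega) = \frac{1}{\left| X_l(\omega)\, X_k^{*}(\omega) \right|}
```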

FIG. 7 shows an example of the power obtained with the above equation at every point in the space in which the sound source is to be located. In FIG. 7, the highest peak corresponds to the coordinates at which the computed output power is greatest.

If the position of maximum output power is denoted q_s', then q_s' is obtained from Equation 6.
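Equation 6 is not legible here; from the description it is presumably q_s' = argmax_q P(q). The search can be sketched as below, using an unweighted time-domain delay-and-sum power in place of the frequency-domain expression. The microphone layout, candidate positions, sample rate, and function names are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def delays_in_samples(point, mics, sample_rate):
    """Delay from `point` to each microphone in whole samples,
    relative to the earliest arrival."""
    raw = [math.dist(point, m) / SPEED_OF_SOUND * sample_rate for m in mics]
    first = min(raw)
    return [round(r - first) for r in raw]

def steered_power(signals, delays):
    """Power of the delay-and-sum output steered with the given delays."""
    length = len(signals[0]) - max(delays)
    power = 0.0
    for t in range(length):
        s = sum(sig[t + d] for sig, d in zip(signals, delays))
        power += s * s
    return power

def srp_localize(signals, mics, candidates, sample_rate):
    """Return the candidate position with the maximum steered power."""
    return max(
        candidates,
        key=lambda q: steered_power(signals, delays_in_samples(q, mics, sample_rate)),
    )

mics = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
candidates = [(2.0, 0.25), (0.25, 2.0), (-2.0, 0.25), (0.25, -2.0)]
source = (0.25, 2.0)
rate = 16000

# Simulate a short pulse arriving at each microphone with the true delay.
signals = []
for d in delays_in_samples(source, mics, rate):
    channel = [0.0] * 64
    channel[10 + d:13 + d] = [1.0, 2.0, 1.0]
    signals.append(channel)

estimate = srp_localize(signals, mics, candidates, rate)
print(estimate)
```

The cost grows with the number of candidate positions, which is why evaluating every point in the search space is expensive and why the text goes on to reduce the candidates to one representative point per cluster.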

FIG. 8 is a graph comparing the performance of sound source position estimation algorithms.

The algorithms compared in FIG. 8 are SRP (Steered Response Power), GCC-PHAT (Generalized Cross Correlation - PHAse Transform), and SRP-PHAT (Steered Response Power - PHAse Transform).

As described above, the SRP method computes the output power of the beamformer for every position in space and takes the position of maximum output power as the estimated position of the sound source. The SRP-PHAT method applies a phase transform (PHAT) filter to the SRP algorithm: the search space is divided into small blocks of a fixed size, the output power is computed at the position of each block, and the sound is estimated to have originated at the block position with the greatest output power.

Because the SRP and SRP-PHAT methods must compute the output power at every block in space, they require so much computation that they are difficult to use in real-time systems. Ways of reducing the computation for real-time use have therefore been sought, one of which is to cluster the space into a number of blocks.

The GCC-PHAT method obtains the arrival delay time from the cross-correlation between the signals received by the microphones and then performs an additional operation such as a weight filter.

In FIG. 8, the horizontal axis (x-axis) is the error threshold and the vertical axis (y-axis) is the error rate of sound source position estimation at each threshold. At an error threshold of 0.5 meters (that is, an estimate is counted as correct if it lies within 0.5 meters of the actual source position), the error rate is 70% for GCC-PHAT, about 24% for SRP, and 0% for SRP-PHAT. As FIG. 8 shows, SRP generally outperforms GCC-PHAT, and SRP-PHAT outperforms SRP.
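The error-rate metric on the vertical axis of FIG. 8 can be sketched as follows. The estimates below are made-up numbers for illustration only, not data from the figure, and the function name is hypothetical.

```python
import math

def error_rate(estimates, truths, threshold):
    """Fraction of estimates farther than `threshold` metres from the truth."""
    misses = sum(
        1 for est, true in zip(estimates, truths)
        if math.dist(est, true) > threshold
    )
    return misses / len(estimates)

# Four hypothetical trials with the true source at the origin.
truths = [(0.0, 0.0, 0.0)] * 4
estimates = [(0.1, 0.0, 0.0), (0.4, 0.0, 0.0), (0.6, 0.0, 0.0), (1.0, 0.0, 0.0)]

rate_at_half_metre = error_rate(estimates, truths, threshold=0.5)
print(rate_at_half_metre)
```

Sweeping `threshold` over a range of values and plotting the result gives a curve of the kind the figure description implies: the larger the tolerated error, the lower the error rate.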

FIG. 9 shows a method of selecting the representative point of a cluster according to the prior art.

When the SRP or SRP-PHAT method, which has a lower error rate than GCC-PHAT, is used, a clustering operation divides the space in which the sound source is to be located into a number of block spaces. Various criteria can be used for the division, but a commonly used clustering method groups points having the same arrival delay time into one block space.

To apply the SRP or SRP-PHAT method at a specific position in space, the coordinates of each clustered block must be recorded, and the recorded coordinates serve as the representative point of that block. The most common way of selecting the representative point is to use the center of gravity.

FIG. 9 shows a triangle as an example of a clustered block. Because the sound source is located in space, a clustered block is actually a three-dimensional solid, but for convenience of explanation it is drawn as a two-dimensional triangle in FIG. 9.

As shown in FIG. 9, the center of gravity of the triangle is the point where the line segments joining the midpoint of each side to the opposite vertex intersect. The coordinates of the center of gravity are then used as the representative point of the clustered block.
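For a triangle, the intersection of the three medians is simply the mean of the vertex coordinates. A minimal sketch, with a hypothetical function name and triangle:

```python
def triangle_centroid(a, b, c):
    """Center of gravity of a triangle: the mean of its three vertices."""
    return tuple((x + y + z) / 3 for x, y, z in zip(a, b, c))

def point_on_median(vertex, p, q, fraction):
    """Point at `fraction` of the way from `vertex` to the midpoint of side pq."""
    mid = tuple((u + v) / 2 for u, v in zip(p, q))
    return tuple(s + fraction * (m - s) for s, m in zip(vertex, mid))

a, b, c = (0.0, 0.0), (6.0, 0.0), (0.0, 3.0)
centroid = triangle_centroid(a, b, c)
print(centroid)
# The centroid lies two thirds of the way along each median.
print(point_on_median(a, b, c, 2 / 3))
```

For a convex block such as a triangle the centroid always falls inside the block, so it is a reasonable representative point in this case.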

FIG. 10 shows a problem with the prior-art method of selecting the representative point.

Spatial blocks within which the arrival delay time for a given microphone pair is the same can take a variety of shapes. If a clustered block such as that in FIG. 10 is formed, its center of gravity lies outside the block. FIG. 10 uses a wedge-shaped block as an example, but many other block shapes also have their center of gravity outside the block.

When the representative point lies outside the block, it represents the position of the block poorly. When the output power of FIG. 7 is computed for each block of the clustered space using the SRP or SRP-PHAT method, the point of maximum output power is likely to differ from the position of the block as represented by its center of gravity.
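The problem is easy to reproduce with any non-convex block. The sketch below uses a hypothetical L-shaped block made of unit grid cells (not the wedge of FIG. 10) and checks whether the centroid of the cell centers falls inside any cell of the block.

```python
def centroid(points):
    """Mean of a list of equal-dimension points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def inside_block(point, cells, half_width=0.5):
    """True if `point` lies within the square cell centered on any grid
    point of the block."""
    return any(
        all(abs(p - c) <= half_width for p, c in zip(point, cell))
        for cell in cells
    )

# L-shaped block: five cells along the x-axis and four more up the y-axis.
block = [(x, 0) for x in range(5)] + [(0, y) for y in range(1, 5)]

center = centroid(block)
print(center, inside_block(center, block))
```

The centroid lands in the empty corner enclosed by the two arms of the L, so a lookup by that coordinate would refer to a location that is not part of the block at all.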

FIG. 11 is another drawing showing a problem with the prior-art method of selecting the representative point.

FIG. 11 shows four microphones and clusters a, b, and c. Microphones 1 and 2 are shaded black because the space has been clustered according to the arrival delay time of the pair formed by microphones 1 and 2, out of microphones 1 to 4.

When clusters shaped as in FIG. 11 are formed in space and the centers of gravity of clusters a, b, and c are taken as representative points a, b, and c, each representative point lies outside the region of its own cluster. Representative point a of cluster a lies within the region of cluster b, and representative point b of cluster b lies within the region of cluster c.

With clusters shaped as in FIG. 11 and representative points selected in this way, the sound source position estimation process proceeds as follows.

First, the center of gravity of each cluster is stored in a position reference table as the representative point of that cluster. When a sound then occurs at some point in space, the arrival delay time (TDOA) between the sound waves reaching the microphones is measured for each microphone pair. Using the measured arrival delay times, the representative points of the corresponding clusters are looked up in the position reference table, the SRP or SRP-PHAT method is performed at each of these representative points, and the representative point with the highest output power is selected. The position of the selected representative point is taken as the estimated position of the sound.

Suppose that in FIG. 11 representative point b has the highest output power and is selected as the position of the sound source. The sound actually originated inside cluster b, but representative point b lies inside cluster c, so the position finally estimated by the algorithm falls in the region of cluster c. In other words, taking the center of gravity as the representative point of a cluster introduces a discrepancy from the actual position of the sound source and lowers the accuracy of sound source position estimation.
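The table-lookup procedure can be sketched as follows. The table layout (a dictionary keyed by the measured arrival delay), the function name, and the stand-in power function are all assumptions made for illustration; the patent does not specify a data structure.

```python
def localize(measured_tdoa, reference_table, output_power):
    """Pick the representative point with the highest output power.

    reference_table: dict mapping an arrival delay to the representative
                     points of the clusters that share that delay.
    output_power:    callable returning the SRP or SRP-PHAT power at a point.
    """
    candidates = reference_table.get(measured_tdoa, [])
    if not candidates:
        return None
    return max(candidates, key=output_power)

# Hypothetical table and a stand-in power function peaked near (3, 1).
table = {2: [(1.0, 2.0), (3.0, 0.5)], -1: [(0.0, 4.0)]}

def fake_power(p):
    return -((p[0] - 3.0) ** 2 + (p[1] - 1.0) ** 2)

best = localize(2, table, fake_power)
print(best)
```

Only the points stored in the table are ever evaluated, so the estimate can be no better than the stored representative points. If a stored point lies outside its own cluster, the returned position inherits that error.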

따라서, 본 발명의 목적은 음원의 위치 추정시 SRP 알고리즘이나 SRP-PHAT 알고리즘 등을 수행함에 있어, 연산을 수행하는 대표 점과 클러스터를 이루는 도형 간의 위치 불일치를 해소함으로써 좀 더 정확한 출력 파워 연산이 이루어지도록 하고, 음원 위치 추정의 정확도를 향상시키는 방법을 제공함에 있다. Accordingly, an object of the present invention is to perform a more accurate output power calculation by solving the position mismatch between the representative point performing the operation and the figure forming the cluster in performing the SRP algorithm or the SRP-PHAT algorithm when the position of the sound source is estimated. The present invention provides a method for improving the accuracy of sound source position estimation.

본 발명의 다른 목적은, SRP 또는 SRP-PHAT 방법을 사용하여 음원의 위치를 추정함에 있어, 클러스터의 모양이 어떠한지에 관계없이 클러스터의 위치를 정확하게 나타내는 대표 점을 선정하는 위 방식을 이용하는 음원 위치 추정 시스템을 제공함에 있다.Another object of the present invention, in estimating the position of a sound source using the SRP or SRP-PHAT method, the sound source position estimation using the above method of selecting a representative point that accurately represents the position of the cluster regardless of the shape of the cluster In providing a system.

위와 같은 목적을 달성하기 위한 본 발명에 따른 대표 점 선정 방법은, 클러스터 영역 내에 다수의 좌표를 선정하는 단계 및 선정된 다수의 좌표 중 특정 좌표를 기준으로 동일 클러스터 내의 타 좌표와의 거리의 합을 연산하여 값이 최소가 되는 특정 좌표를 상기 클러스터의 대표 점으로 결정하는 단계를 포함하여 이루어진다. Representative point selection method according to the present invention for achieving the above object, the step of selecting a plurality of coordinates in the cluster region and the sum of the distance to the other coordinates in the same cluster based on a specific coordinate of the selected plurality of coordinates And calculating specific coordinates of which the value is the minimum as a representative point of the cluster.

Preferably, calculating the sum of the distances to the other coordinates comprises: selecting a specific coordinate from among the selected coordinates; calculating the sum of the distances from the selected specific coordinate to the other coordinates in the same cluster; changing the reference coordinate and calculating the sum of the distances to the other coordinates again; and repeating the calculation of the sum of distances while changing the reference coordinate.

In addition, a sound source position estimation system using the representative point selection method according to the present invention comprises: a plurality of microphones that receive the sound waves of sounds generated in a space; and a sound source position estimating unit that divides the space into a plurality of clusters, selects a plurality of coordinates within the region of each cluster, determines as the representative point of the cluster the specific coordinate for which the sum of the distances to the other coordinates in the same cluster is smallest, and uses that representative point as the coordinate on which the algorithm for estimating the position of the sound source is performed.

Preferably, the sound source position estimating unit comprises: a position reference table creating unit that divides the target space for sound source position estimation into a plurality of clusters, selects a plurality of coordinates within the region of each cluster, determines as the representative point of the cluster the specific coordinate for which the sum of the distances to the other coordinates in the same cluster is smallest, and records it in a table; an arrival delay time determining unit that determines the arrival delay time (time difference of arrival) of the sound waves reaching the plurality of microphones; and an SRP algorithm performing unit that, when a sound occurs, performs the Steered Response Power (SRP) algorithm or the Steered Response Power with Phase Transform (SRP-PHAT) algorithm on the clusters having the arrival delay time measured by the arrival delay time determining unit, referring to the position reference table to obtain the coordinates of those clusters.

In sound source position estimation, when a space is divided into clusters using the arrival delay times of the sound waves reaching a plurality of microphones, clusters of countless shapes can arise depending on the number of microphones. With the conventional approach of using the center of gravity as the representative point, accurate estimation of the sound source position becomes difficult whenever the shape of a cluster places its center of gravity outside the cluster. With the method proposed by the present invention, a plurality of coordinates is selected within the cluster and the point for which the sum of the distances to the other selected points is smallest is chosen as the representative point of the cluster, which allows more accurate position estimation than the conventional method.

The representative point selection method for sound source position estimation according to the present invention, and a sound source position estimation system using the method, are described below with reference to the drawings.

FIG. 12 is a diagram illustrating a sound source position estimation system according to an embodiment of the present invention.

As shown in FIG. 12, the sound source position estimation system according to the present invention comprises a plurality of microphones 120 and a sound source position estimating unit 121. The sound source position estimating unit 121 comprises a position reference table creating unit 122, an arrival delay time determining unit 125, and an SRP algorithm performing unit 126. The position reference table creating unit 122 in turn comprises a clustering unit 123 and a representative point calculating unit 124.

The plurality of microphones 120 receive the sound generated in the space. Because the present invention estimates the position of the sound source from the time differences of the sound waves reaching the microphones, at least two microphones are required.

The position reference table creating unit 122 calculates a representative point for each spatial cluster, where a cluster is a region over which the arrival delay time of the sound waves is the same.

The clustering unit 123 divides the space into regions in which the arrival delay time of the sound waves at a specific frequency is the same (clustering). The arrival delay time can vary with the frequency of the sound wave and the arrangement of the microphones. Therefore, once the microphones have been placed, it is preferable to also take into account the frequencies at which sound waves may occur when clustering the space by equal arrival delay time. That is, the spatial clusters of equal arrival delay time for an expected frequency a will not be the same as those for an expected frequency b.
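The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the candidate space is a 2D grid, the arrival delay time is computed geometrically and quantized to whole samples at an assumed sampling rate (standing in for the frequency dependence mentioned above), and the names `tdoa_key`, `cluster_by_tdoa`, `SPEED_OF_SOUND` and `SAMPLE_RATE` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 16000     # Hz, assumed; sets how finely delays are quantized

def tdoa_key(point, mics):
    """Tuple of arrival delay times (in whole samples) for every microphone pair."""
    return tuple(
        round((math.dist(point, a) - math.dist(point, b)) / SPEED_OF_SOUND * SAMPLE_RATE)
        for a, b in combinations(mics, 2)
    )

def cluster_by_tdoa(grid, mics):
    """Group grid points that share the same quantized delay for all microphone pairs."""
    clusters = defaultdict(list)
    for p in grid:
        clusters[tdoa_key(p, mics)].append(p)
    return dict(clusters)

# Hypothetical layout: three microphones and a 10 cm grid in front of them.
mics = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3)]
grid = [(x * 0.1, y * 0.1) for x in range(-20, 21) for y in range(1, 21)]
clusters = cluster_by_tdoa(grid, mics)
print(len(grid), "grid points grouped into", len(clusters), "clusters")
```

Each dictionary key is one combination of pairwise delays, and its value is the list of grid points belonging to that cluster. Changing the microphone layout or the quantization changes the shapes of the resulting clusters.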

The representative point calculating unit 124 calculates the representative point of each clustered region. The representative point is the coordinate within the cluster for which the sum of the distances to all other coordinates in the same cluster is smallest. The calculation is described in more detail later with reference to the drawings.

The arrival delay time determining unit 125 calculates the arrival delay time τ of the sound waves arriving at a pair of microphones among the plurality of microphones 120. Since there are two or more microphones, the microphones are grouped two at a time and the arrival delay time is calculated for every combination of microphone pairs. The frequency of the sound wave is taken into account in this calculation.
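Measuring the delay for every microphone pair can be sketched as below. The sketch uses plain time-domain cross-correlation, which is a simplification chosen here for brevity rather than the estimator used in the patent. The function names and the synthetic three-channel signal are hypothetical.

```python
from itertools import combinations

def estimate_delay(x, y, max_lag):
    """Lag in samples at which y best matches x; positive means y arrives later."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of x with y shifted by `lag`, over the overlapping samples.
        score = sum(
            x[n] * y[n + lag]
            for n in range(len(x))
            if 0 <= n + lag < len(y)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def pairwise_tdoa(signals, max_lag):
    """Estimated delay for every combination of two channels."""
    return {
        (i, j): estimate_delay(signals[i], signals[j], max_lag)
        for i, j in combinations(range(len(signals)), 2)
    }

# Hypothetical recording: the same short pulse reaches three microphones
# after 0, 2 and 5 samples respectively.
pulse = [1.0, 3.0, -2.0, 0.5]

def delayed(delay, length=16):
    s = [0.0] * length
    for k, v in enumerate(pulse):
        s[delay + k] = v
    return s

signals = [delayed(0), delayed(2), delayed(5)]
tdoas = pairwise_tdoa(signals, max_lag=6)
print(tdoas)
```

With M microphones there are M(M-1)/2 pairs, so the resulting set of delays grows quadratically with the number of microphones.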

The SRP algorithm performing unit 126 performs the SRP algorithm for each cluster on the basis of the position reference table. The algorithm is not limited to SRP; other algorithms such as SRP-PHAT may also be used.

FIG. 13 is a diagram illustrating a representative point selection method according to an embodiment of the present invention.

As shown in FIG. 13, clusters having the same TDOA are first specified (S130). Because the present invention is based on dividing the space by arrival delay time and performing the SRP algorithm on the resulting clusters to estimate the position of the sound source, the clusters must be specified first.

Next, a plurality of coordinates is selected within the specified cluster region (S131), and one specific coordinate is chosen from among them (S132). For example, after regions having the same TDOA are grouped and specified as clusters A, B, C, D, ..., coordinates B-1, B-2, B-3, ... are selected within the region of cluster B, and B-1 is chosen first. The number of coordinates selected within a cluster region may be set freely by the user carrying out the method. Selecting more coordinates increases the computation needed to determine each cluster's representative point but yields a point that represents the cluster more accurately; selecting fewer coordinates has the opposite effect.

The sum of the distances between the chosen specific coordinate and the other coordinates is then calculated (S133). In the example above, if the chosen coordinate is B-1, the distance between B-1 and B-2, the distance between B-1 and B-3, the distance between B-1 and B-4, and so on are all added together. This summation is carried out for every coordinate in the same cluster. That is, the sum of the distances between B-2 and the other points (B-1, B-3, B-4, ...), the sum of the distances between B-3 and the other points (B-1, B-2, B-4, ...), and so on are calculated with each of the points selected inside cluster B (B-1, B-2, B-3, ...) as the reference.

After the sum of distances has been calculated with every coordinate in the cluster as the reference (S133), the reference point with the smallest sum is determined as the representative point of the cluster (S134). For example, if the sum of the distances calculated with B-2 as the reference is the smallest, B-2 becomes the representative point of cluster B.
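Steps S132 to S134 can be sketched as a short function. This is a minimal illustration assuming Euclidean distance and 2D coordinates. The name `representative_point` and the arc-shaped sample cluster are hypothetical.

```python
import math

def representative_point(points):
    """Return the selected coordinate whose summed distance to the others is smallest."""
    best_point, best_sum = None, float("inf")
    for i, ref in enumerate(points):            # S132: choose a reference coordinate
        total = sum(math.dist(ref, other)       # S133: sum distances to the others
                    for j, other in enumerate(points) if j != i)
        if total < best_sum:                    # keep the smallest sum seen so far
            best_point, best_sum = ref, total
    return best_point                           # S134: representative point

# Hypothetical cluster: coordinates selected every 10 degrees along a thin arc (S131).
arc = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in range(0, 181, 10)]
rep = representative_point(arc)
print("representative point:", rep)
```

Because the result is always one of the selected coordinates, it cannot fall outside the cluster. For N selected coordinates the loop evaluates N(N-1) distances, which is the computation cost that grows as more coordinates are selected.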

Once the representative point has been selected, it is recorded in the position reference table as the representative point of the cluster (S135). It is then determined whether the position reference table has been completed for all clusters in the space (S136), and the representative point selection process is repeated until every cluster has an entry.
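Building the position reference table (S135, S136) amounts to repeating the selection for every cluster. In the sketch below the table is a plain dictionary keyed by a tuple of pairwise delays. That layout, the function names and the sample clusters are assumptions made for illustration.

```python
import math

def representative_point(points):
    """Selected coordinate with the smallest summed distance to the others."""
    # The distance from a point to itself is 0, so including it does not change the result.
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def build_reference_table(clusters):
    """Record one representative point per cluster until all clusters are covered."""
    return {key: representative_point(pts) for key, pts in clusters.items()}

# Hypothetical clusters, keyed by their quantized delays for two microphone pairs.
clusters = {
    (-1, 0): [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    (0, 1): [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0), (5.0, 2.0)],
}
table = build_reference_table(clusters)
print(table)
```

The table is computed once, before any sound is observed, so the cost of the pairwise distance sums is not paid during localization.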

FIG. 14 is a diagram showing a representative point selected according to an embodiment of the present invention.

As shown in FIG. 14, one black point lies at the center of the wedge-shaped figure, together with a number of lightly shaded points.

In FIG. 14, the black point is the representative point selected by the method of the present invention. FIG. 10 showed the problem of a representative point lying outside its cluster, whereas in FIG. 14 the representative point lies inside the cluster.

The lightly shaded points in FIG. 14 are the points selected within the cluster; they illustrate the coordinates selected in step S131 of FIG. 13.

FIG. 15 is a diagram illustrating an embodiment of a method of performing sound source position estimation using the representative point selection method of the present invention.

As shown in FIG. 15, clusters having the same arrival delay time (TDOA) are specified in the space in which the position of the sound source is to be estimated (S150). A plurality of coordinates is selected within each specified cluster, the sum of the distances to the other coordinates is calculated with each selected coordinate as the reference, and the coordinate with the smallest sum is determined as the representative point (S151). The process of determining the representative point is as described above for steps S131 to S134 of FIG. 13.

The representative point determined for each cluster is recorded in the position reference table, and this is repeated (S152) until the representative points of all clusters in the space have been recorded (S153).

Thereafter, when a sound occurs at some location in the space (S154), the TDOA is measured for each combination of microphone pairs (S155). To apply the position estimation algorithm to the clusters having the measured TDOA, the position reference table, which holds the coordinates recorded for each cluster, is consulted. The SRP or SRP-PHAT algorithm is performed on each of the clusters in the space (S156), the cluster with the highest output power is selected (S157), and the selected cluster is taken as the estimated position of the sound source (S158).
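The final selection among representative points can be sketched with a simple delay-and-sum form of steered response power. This is an illustration under stated assumptions rather than the patented procedure: no PHAT weighting is applied, delays are rounded to whole samples, the recording is synthetic, and every name, constant and table entry below is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 16000     # Hz, assumed

def sample_shifts(point, mics):
    """Per-microphone propagation delay in samples, relative to the closest microphone."""
    delays = [round(math.dist(point, m) / SPEED_OF_SOUND * SAMPLE_RATE) for m in mics]
    base = min(delays)
    return [d - base for d in delays]

def steered_power(signals, mics, point):
    """Output power of a delay-and-sum beamformer steered to `point`."""
    shifts = sample_shifts(point, mics)
    length = min(len(s) - k for s, k in zip(signals, shifts))
    power = 0.0
    for n in range(length):
        total = sum(s[n + k] for s, k in zip(signals, shifts))  # align, then add channels
        power += total * total
    return power

def localize(signals, mics, table):
    """Cluster label and representative point with the highest steered power."""
    return max(table.items(), key=lambda kv: steered_power(signals, mics, kv[1]))

def simulate(source, mics, pulse, length=80):
    """Synthetic recording: the same pulse arrives at each microphone with its own delay."""
    signals = []
    for shift in sample_shifts(source, mics):
        s = [0.0] * length
        for k, v in enumerate(pulse):
            s[shift + k] = v
        signals.append(s)
    return signals

mics = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
table = {"A": (-2.0, 1.0), "B": (2.0, 1.0), "C": (1.0, 3.0)}  # label -> representative point
signals = simulate((2.0, 1.0), mics, pulse=[1.0, 3.0, -2.0, 0.5])
best_label, best_point = localize(signals, mics, table)
print(best_label, best_point)
```

Only the representative points in the table are evaluated, so the number of power computations equals the number of clusters rather than the number of points in the space.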

Although the present invention has been described in detail through representative embodiments, those of ordinary skill in the art to which the invention pertains will understand that various modifications can be made to the embodiments without departing from the scope of the invention. The scope of the invention should therefore not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

FIG. 1 is a view showing the arrangement of a plurality of microphones according to the prior art.

FIG. 2 is a diagram illustrating sound source generation and arrival delay time in a two-dimensional space.

FIG. 3 is a diagram illustrating a method of estimating arrival delay time using the Generalized Cross Correlation (GCC) method.

FIG. 4 is a diagram schematically illustrating a sound source direction estimation method using four microphones.

FIG. 5 is a diagram schematically illustrating a sound source position estimation method using eight microphones.

FIG. 6 is a schematic representation of a steered beamformer.

FIG. 7 is a diagram illustrating a spatial power graph measured using the Steered Response Power (SRP) method.

FIG. 8 is a graph comparing the performance of sound source position estimation algorithms.

FIG. 9 is a diagram illustrating a method of selecting the representative point of a cluster according to the prior art.

FIG. 10 is a diagram showing a problem of the representative point selection method according to the prior art.

FIG. 11 is another diagram showing a problem of the representative point selection method according to the prior art.

FIG. 12 is a diagram illustrating a sound source position estimation system according to an embodiment of the present invention.

FIG. 13 is a diagram illustrating a representative point selection method according to an embodiment of the present invention.

FIG. 14 is a diagram showing a representative point selected according to an embodiment of the present invention.

FIG. 15 is a diagram illustrating an embodiment of a method of performing sound source position estimation using the representative point selection method of the present invention.

Claims

A method of selecting a representative point of a cluster for sound source position estimation, the method comprising:

selecting a plurality of coordinates within a cluster region; and

calculating, with a specific coordinate among the selected plurality of coordinates as a reference, a sum of distances to the other coordinates in the same cluster, and determining the specific coordinate for which the sum is smallest as the representative point of the cluster.

The method of claim 1, wherein the determining of the representative point comprises:

selecting a specific coordinate from the selected plurality of coordinates;

calculating a sum of distances to the other coordinates in the same cluster with the selected specific coordinate as a reference;

calculating a sum of distances to the other coordinates after changing the specific coordinate used as the reference;

repeating the calculation of the sum of distances while changing the specific coordinate used as the reference; and

determining, as a result of the repeated calculation, the specific coordinate having the smallest sum of distances as the representative point of the cluster.

The method of claim 1, wherein the cluster

is a region obtained by dividing the space on the basis of the arrival delay time that occurs when a sound wave reaches a microphone pair.

The method of claim 1, wherein the selected plurality of coordinates

are selected from among points for which the arrival delay time that occurs when a sound wave reaches a microphone pair is the same.

The method of claim 1, further comprising:

recording the representative point determined for each cluster in the form of a table.

A sound source position estimation system comprising:

a plurality of microphones for receiving sound waves of sounds generated in a space; and

a sound source position estimating unit configured to divide a target space for sound source position estimation into a plurality of clusters, select a plurality of coordinates within the region of each cluster, determine as the representative point of the cluster the specific coordinate for which the sum of distances to the other coordinates in the same cluster is smallest, and perform a sound source position estimation algorithm using the determined representative point.

The system of claim 6, wherein the sound source position estimating unit comprises:

a position reference table creating unit that divides the target space for sound source position estimation into a plurality of clusters, selects a plurality of coordinates within the region of each cluster, determines as the representative point of the cluster the specific coordinate for which the sum of distances to the other coordinates in the same cluster is smallest, and records it in a table;

an arrival delay time determining unit that determines, when a sound is generated, the time difference of arrival of the sound waves reaching the plurality of microphones; and

an SRP algorithm performing unit that performs a steered response power (SRP) algorithm or a steered response power with phase transform (SRP-PHAT) algorithm on the cluster having the arrival delay time measured by the arrival delay time determining unit, referring to the position reference table for the coordinates of the cluster.

The system of claim 7, wherein the position reference table creating unit comprises:

a clustering unit that divides the space into spatial regions having the same arrival delay time of sound waves; and

a representative point calculating unit that, in calculating the representative point of each cluster divided by the clustering unit, selects a plurality of coordinates within the cluster, calculates a sum of distances to the other coordinates in the same cluster with a specific coordinate among the plurality of coordinates as a reference, and determines the reference coordinate for which the sum is smallest as the representative point of the cluster.

The system of claim 6, wherein the plurality of coordinates selected within each cluster region

are selected from among points for which the arrival delay time that occurs when a sound wave reaches a microphone pair is the same.