KR20110010193A

KR20110010193A - Apparatus and method for estimating the size and the location of sound source

Info

Publication number: KR20110010193A
Application number: KR1020090067625A
Authority: KR
Inventors: 박영진; 현 조; 권병호
Original assignee: 한국과학기술원
Priority date: 2009-07-24
Filing date: 2009-07-24
Publication date: 2011-02-01
Also published as: KR101046683B1

Abstract

PURPOSE: An apparatus and a method for estimating the size of a sound source are provided, enabling the position and size of a sound source to be estimated using a small number of microphones. CONSTITUTION: The apparatus comprises a cross-correlation calculation unit (110), a mapping function application unit (120), and an estimation unit (130). The cross-correlation calculation unit calculates a generalized cross-correlation (GCC) sequence for each microphone pair. The mapping function application unit maps each GCC sequence from a time coordinate system to a reference spatial coordinate system. The estimation unit estimates the size of the sound source based on the mapping results of the GCC sequences. The mapping function application unit comprises an individual mapping unit and a coordinate transformation unit.

Description

Apparatus and method for estimating the size of a sound source {APPARATUS AND METHOD FOR ESTIMATING THE SIZE AND THE LOCATION OF SOUND SOURCE}

The disclosed technology relates to an apparatus and method for estimating the size of a sound source.

Sound source localization is a technology that uses acoustic sensors such as a microphone array to determine the position of a sound source or speaker. It is used in a variety of applications, including robot systems (e.g., a system including a humanoid robot or mobile robot that localizes a user as a sound source and approaches the user), closed-circuit surveillance systems (e.g., a system that treats a sound source as the subject to be filmed, localizes it, and films it), and stereophonic sound.

Among sound source localization methods, the microphone array method estimates the time delay of arrival (TDOA) of the two signals received at each microphone pair, and then estimates the position of the sound source using the geometric relationship between the microphone pairs and the estimated TDOA values.

Meanwhile, to characterize a sound source more accurately, a technique is needed that estimates the size of the sound source as well as its position.

The technical problem addressed by the disclosed technology is to provide an apparatus and method for estimating the size of a sound source.

To solve the above technical problem, a first aspect of the disclosed technology provides an apparatus for estimating the size of a sound source, comprising: a cross-correlation calculation unit that, for n microphone pairs each formed by combining two different microphones among a plurality of microphones, calculates an m-th (m = 1 to n) generalized cross-correlation (GCC) sequence for the m-th microphone pair; a mapping function application unit that maps the m-th GCC sequence from a time coordinate system to a reference spatial coordinate system using an m-th mapping function predetermined according to the platform on which the plurality of microphones are installed; and an estimation unit that estimates the size of the sound source based on the mapping results of the first to n-th GCC sequences.

To solve the above technical problem, a second aspect of the disclosed technology provides a method for estimating the size of a sound source, comprising: (a) for n microphone pairs each formed by combining two different microphones among a plurality of microphones, calculating an m-th (m = 1 to n) GCC sequence for the m-th microphone pair; (b) mapping the m-th GCC sequence from a time coordinate system to a reference spatial coordinate system using an m-th mapping function predetermined according to the platform on which the plurality of microphones are installed; and (c) estimating the size of the sound source based on the mapping results of the first to n-th GCC sequences.

To solve the above technical problem, a third aspect of the disclosed technology provides a computer-readable recording medium storing a program for executing the above-described sound source position estimation method on a computer.

Embodiments of the disclosed technology may provide effects including the following advantages. However, this does not mean that every embodiment must include all of them, so the scope of rights of the disclosed technology should not be understood as being limited thereby.

According to an embodiment of the disclosed technology, the size of a sound source can be estimated as well as its position. In addition, the disclosed technology makes it possible to estimate the position and size of a sound source using a small number of microphones, and it is robust against front-back confusion.

The description of the disclosed technology is merely a set of embodiments for structural or functional explanation, so the scope of rights of the disclosed technology should not be construed as being limited by the embodiments described herein. That is, because the embodiments can be modified in various ways and can take various forms, the scope of rights of the disclosed technology should be understood to include equivalents capable of realizing the technical idea.

Meanwhile, the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is referred to as being "connected" to another component, it may be directly connected to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "directly between", or "neighboring" and "directly neighboring", should be interpreted in the same way.

Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should not be understood to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The steps may occur in an order different from the stated order unless the context clearly specifies a particular order. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the field to which the disclosed technology belongs. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.

FIG. 1 is a block diagram illustrating an apparatus for estimating the size of a sound source according to an embodiment of the disclosed technology.

Referring to FIG. 1, the sound source size estimation apparatus (100) includes a cross-correlation calculation unit (110), a mapping function application unit (120), and an estimation unit (130). The size of a sound source refers to the range over which the sound source is distributed, and may be expressed as an angle.

The cross-correlation calculation unit (110) calculates, for n microphone pairs each formed by combining two different microphones among a plurality of microphones, an m-th (m = 1 to n) generalized cross-correlation (GCC) sequence for the m-th microphone pair. This is described below using the example of FIG. 2.

FIG. 2 is a diagram illustrating a case in which three microphones are used according to an embodiment.

Referring to FIG. 2, the three microphones (m1, m2, m3) of the array are arranged at 0, 120, and 240 degrees in the counterclockwise direction. For the three microphones (m1, m2, m3), there may be a first microphone pair of m1 and m2, a second microphone pair of m2 and m3, and a third microphone pair of m3 and m1. The microphone array may be installed on a particular platform.

In this specification, the platform denotes the device, object, or structure on which the microphone array is installed. In the case of a humanoid robot, the microphone array is generally installed on the robot's head, so the head of the humanoid robot corresponds to the platform, although the platform is not necessarily limited to this.

The cross-correlation calculation unit (110) calculates the m-th (m = 1 to n) GCC sequence for the m-th microphone pair.

According to an embodiment, the first GCC sequence for the first microphone pair may be obtained from the values R12 calculated as in Equation 1.

In Equation 1, s₁[n] and s₂[n] denote the n-th digital samples of the signals received at m1 and m2, respectively, and T_s denotes the time interval between samples, which is determined by the sampling frequency used when sampling the received signal (i.e., converting it from an analog signal to a digital signal). N denotes the correlation window size.

In Equation 1, m denotes the correlation offset (correlation lag), not the microphone pair index used elsewhere in this description, and M is a value specifying the range of correlation offsets.

Equation 1 is merely an expression given in the time and digital domain to explain the concept of GCC easily. Those working in this field will readily understand that GCC values may also be estimated in the analog signal domain or in the frequency domain. Moreover, the present invention is not restricted to any particular GCC estimation algorithm.
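The quantities defined above can be illustrated with a minimal Python sketch. Equation 1 itself is not reproduced in this text, so the form used here, R12[m] = Σ s₁[n]·s₂[n+m] over a window of N samples for lags m from -M to M, is an assumption (a plain, unweighted cross-correlation), as are the function names and the sign convention of the lag.

```python
def cross_correlation_sequence(s1, s2, N, M):
    """Return {lag m: R12[m]} for m in [-M, M] over a window of N samples.

    Assumed form (not taken from the patent's Equation 1):
        R12[m] = sum over n of s1[n] * s2[n + m]
    Samples that fall outside the recorded signals are treated as zero.
    """
    seq = {}
    for m in range(-M, M + 1):
        total = 0.0
        for n in range(N):
            k = n + m
            if 0 <= n < len(s1) and 0 <= k < len(s2):
                total += s1[n] * s2[k]
        seq[m] = total
    return seq


def lag_to_seconds(m, T_s):
    """Convert a correlation offset in samples to a delay in seconds."""
    return m * T_s


# Toy signals: s2 is s1 delayed by 3 samples.
s1 = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
s2 = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
R12 = cross_correlation_sequence(s1, s2, N=10, M=5)
best_lag = max(R12, key=R12.get)  # lag with the largest correlation value
```

With these toy signals the largest correlation value falls at the lag equal to the 3-sample delay, and `lag_to_seconds` converts that lag into a TDOA candidate using T_s.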

The cross-correlation calculation unit (110) provides the calculated first to n-th GCC sequences to the mapping function application unit (120).

The mapping function application unit (120) maps the m-th GCC sequence from the time coordinate system to the reference spatial coordinate system using the m-th (m = 1 to n) mapping function, which is predetermined according to the platform on which the plurality of microphones (m1, m2, m3) are installed.

Here, the time coordinate system is a coordinate system whose x-axis is in units of time (e.g., seconds), and a spatial coordinate system is a coordinate system whose x-axis is in units of angle (degrees). The y-axis represents the GCC value.

According to one embodiment, the m-th mapping function may be a function representing, with respect to the m-th microphone pair, the correspondence between a TDOA in the time coordinate system and an angle in the m-th (m = 1 to n) spatial coordinate system.

According to another embodiment, the m-th mapping function may be a function representing, with respect to the m-th microphone pair, the correspondence between a TDOA in the time coordinate system and an angle in the reference spatial coordinate system.

First, the case in which the m-th mapping function represents the correspondence between a TDOA in the time coordinate system and an angle in the m-th spatial coordinate system is described with reference to FIG. 3.

FIG. 3 is a block diagram illustrating in detail the mapping function application unit of FIG. 1 according to an embodiment. Referring to FIG. 3, the mapping function application unit (120) includes an individual mapping unit (310) and a coordinate transformation unit (320).

The individual mapping unit (310) maps the m-th GCC sequence from the time coordinate system to the m-th spatial coordinate system using the m-th mapping function. The m-th spatial coordinate system is a coordinate system defined with respect to the m-th microphone pair and comprises a horizontal angle and an elevation angle.

FIGS. 4A and 4B are graphs for explaining the mapping function used in an embodiment of the present invention.

The same description applies to the second and third microphone pairs and the second and third mapping functions, so the following description is given with respect to the first microphone pair.

In FIG. 4A, the coordinate system of the first microphone pair, i.e., the first spatial coordinate system, takes the plane formed by the first to third microphones (m1, m2, m3) as the plane in which the horizontal (azimuth) angle is measured, and takes the midpoint between the first microphone (m1) and the second microphone (m2) as its origin.

In open space, as shown in FIG. 4A, the positions of a sound source that produce a given TDOA value at the first microphone pair can be approximated by the rightmost purple circle. That is, any sound source located on the purple circle produces the same TDOA value. The coordinates of the green star on the purple circle can be expressed in the first spatial coordinate system as a horizontal angle and an elevation angle. Here, the horizontal angle ranges from 0 to 360 degrees, and the elevation angle ranges from -90 to 90 degrees.

That is, the first mapping function represents the correspondence between each TDOA value and the source positions (horizontal angle and elevation angle) that produce that value. This correspondence may be obtained experimentally for the platform, or the first mapping function may be obtained by modeling the platform. The first mapping function holds this correspondence for every TDOA value in the range of interest, and the correlation lag of each GCC value corresponds to a TDOA value, so the mapping function application unit (120) can map each GCC value of the first GCC sequence to coordinates (horizontal angle, elevation angle) in the first spatial coordinate system.
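As a rough illustration of such a correspondence, the following Python sketch models only the open-space case. It assumes a far-field source, a speed of sound of 343 m/s, and a horizontal angle measured from the axis through the two microphones; the names `free_field_tdoa` and `map_gcc_to_angles` are hypothetical. A platform-specific mapping function, as the text describes, would instead come from measurement or from a model of the platform.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def free_field_tdoa(azimuth_deg, elevation_deg, mic_distance):
    """Far-field TDOA (seconds) of one microphone pair in open space.

    Assumption: azimuth is measured from the axis through the two
    microphones, within the array plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return mic_distance * math.cos(az) * math.cos(el) / SPEED_OF_SOUND


def map_gcc_to_angles(gcc, T_s, mic_distance, azimuths, elevations):
    """Assign to each (azimuth, elevation) the GCC value of the nearest lag.

    gcc maps an integer lag (in samples) to a GCC value; lags that are
    missing from gcc contribute 0.0.
    """
    mapped = {}
    for az in azimuths:
        for el in elevations:
            tau = free_field_tdoa(az, el, mic_distance)
            lag = round(tau / T_s)
            mapped[(az, el)] = gcc.get(lag, 0.0)
    return mapped


# Toy GCC sequence with values at three lags (made-up numbers).
gcc = {0: 1.0, 14: 0.4, 28: 0.5}
mapped = map_gcc_to_angles(
    gcc, T_s=1 / 48000, mic_distance=0.2,
    azimuths=range(0, 360, 30), elevations=[0],
)
```

In this model two directions mirrored about the microphone axis (for example 60 and 300 degrees) receive the same GCC value, which is the single-pair ambiguity that combining several pairs is meant to resolve.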

Meanwhile, if the platform on which the first to third microphones (m1, m2, m3) are installed is vertically asymmetric with respect to the plane formed by the microphones, the source positions that produce the same TDOA may not form a circle, unlike the purple circle of FIG. 4A. Here, an asymmetric structure means a structure in which the signal reception characteristics of the part of the platform above the microphone array plane differ from those of the part below it, owing to differences in surface shape, material, and the like.

In an asymmetric structure, the path along which the sound signal travels differs depending on whether the sound source is above or below the plane. Therefore, even for sound sources located at the same elevation angle above and below the plane, the TDOA values may differ.

FIG. 4B illustrates the correspondence given by the mapping function for a spherical platform that is vertically asymmetric with respect to the microphone array plane.

In FIG. 4B, the y-axis represents the TDOA value, the x-axis represents the horizontal angle, and the color of each curve corresponds to an elevation angle. That is, the mapping function represents the correspondence between TDOA values and (horizontal angle, elevation angle) coordinates. FIG. 4B shows that elevation angles of equal magnitude and opposite sign do not map to exactly the same TDOA; in most cases they map to slightly different TDOA values. This property arises from the asymmetric structure of the platform.

The mapping function illustrated in FIG. 4B can be obtained by modeling that takes into account the surface structure of the platform, the material of the platform, and so on, and it can also be obtained experimentally.

FIG. 5 is a diagram illustrating the process by which the individual mapping unit applies a mapping function according to an embodiment.

The same description applies to the second and third microphone pairs and the second and third mapping functions, so the following description is given with respect to the first microphone pair.

In FIG. 5, the left side shows the first spatial coordinate system, and the right side shows the first GCC sequence in the time coordinate system. Each GCC value of the first GCC sequence has a corresponding correlation offset τ, and this offset corresponds, through the first mapping function, to coordinates (horizontal angle, elevation angle) in the first spatial coordinate system. Accordingly, as shown in FIG. 5, the individual mapping unit (310) can assign each GCC value of the first GCC sequence in the time coordinate system to the corresponding coordinates of the first spatial coordinate system.

The coordinate transformation unit (320) transforms the result of the mapping into the m-th spatial coordinate system into the reference spatial coordinate system.

The reference spatial coordinate system is the common coordinate system that serves as the reference for the sound source size estimation apparatus (100); for a robot, it is usually set with respect to the robot's front. Based on the relationship between the reference spatial coordinate system and the m-th spatial coordinate system, the coordinate transformation unit (320) transforms the coordinates of the assignment result of the m-th GCC sequence, thereby mapping the corresponding GCC values of the m-th (m = 1 to n) GCC sequence onto the coordinates of the reference spatial coordinate system.

For convenience, consider the case in which the microphones of FIG. 2 form an equilateral triangle and the reference spatial coordinate system is the coordinate system of the first microphone pair, i.e., the first spatial coordinate system. In this case, the assignment result of the first GCC sequence needs no coordinate transformation, while the mapping results of the second and third GCC sequences must be transformed. With respect to the first spatial coordinate system, the second and third spatial coordinate systems have horizontal angle offsets of -120° and 120°, respectively. Therefore, a GCC value assigned to the coordinates (horizontal angle, elevation angle) of the second spatial coordinate system is assigned to the coordinates (horizontal angle - 120°, elevation angle) of the reference spatial coordinate system, i.e., the first spatial coordinate system. The third spatial coordinate system is handled on the same principle. A simple equilateral-triangle structure has been described for ease of understanding, but those working in this field will readily understand that the present invention can also be applied to microphone arrays that do not form an equilateral triangle.
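The horizontal angle offset described above amounts to shifting each azimuth and wrapping it into the range 0 to 360 degrees. The Python sketch below shows this for a mapping result stored as a dictionary keyed by (horizontal angle, elevation angle); the dictionary layout and the name `to_reference_frame` are assumptions made for illustration.

```python
def to_reference_frame(mapped, azimuth_offset_deg):
    """Shift every horizontal angle by the pair's offset, wrapping to [0, 360).

    mapped: dict {(azimuth_deg, elevation_deg): gcc_value} expressed in the
    m-th spatial coordinate system. The elevation angle is left unchanged.
    """
    return {
        ((az + azimuth_offset_deg) % 360, el): value
        for (az, el), value in mapped.items()
    }


# Second pair in the equilateral example: offset of -120 degrees.
second_pair = {(130, 0): 0.8, (10, 0): 0.3}
in_reference = to_reference_frame(second_pair, -120)
```

Here the value at 130 degrees in the second coordinate system lands at 10 degrees in the reference system, and the value at 10 degrees wraps around to 250 degrees.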

FIG. 6 is a diagram illustrating the output of the mapping function application unit.

In FIG. 6, panel (1) on the left shows the GCC sequences produced by the cross-correlation calculation unit (110) (from the top: the first, second, and third GCC sequences), and panel (2) shows the result of the mapping function application unit (120) mapping the GCC sequences from the time coordinate system to the spatial coordinate system (from the top: the first, second, and third GCC sequences). For ease of understanding, only the mapping result for the horizontal angle is shown, but the same kind of result can be obtained for the elevation angle.

Next, according to another embodiment, the case in which the m-th mapping function represents the correspondence between a TDOA in the time coordinate system and an angle in the reference spatial coordinate system is described.

The mapping function application unit (120) assigns each GCC value of the m-th (m = 1 to n) GCC sequence to the corresponding coordinates of the reference spatial coordinate system according to the predetermined m-th mapping function. That is, the first to n-th mapping functions of this embodiment are mapping functions in which the coordinate transformation performed by the coordinate transformation unit (320) of FIG. 3 has been incorporated in advance.

The estimation unit (130) estimates the size of the sound source based on the mapping results of the first to n-th GCC sequences.

FIG. 7 is a diagram illustrating the estimation unit in detail according to an embodiment, and FIG. 8 is a diagram illustrating a single GCC sequence integrated by the estimation unit according to an embodiment.

Referring to FIGS. 1 and 7, the estimation unit (130) includes a cross-correlation integration unit (710), a maximum value detection unit (720), and a sound source size detection unit (730).

The cross-correlation integration unit (710) integrates the first to n-th GCC sequences mapped to the reference spatial coordinate system into a single GCC sequence.

In one embodiment, the cross-correlation integration unit (710) may integrate the first to n-th GCC sequences mapped to the reference spatial coordinate system into a single GCC sequence by adding them all together. With reference to FIG. 6, for each coordinate of the reference spatial coordinate system, the cross-correlation integration unit (710) adds the GCC value at that position in the top plot of panel (2) on the right of FIG. 6, the GCC value at that position in the middle plot, and the GCC value at that position in the bottom plot, producing a result such as that of FIG. 8.

A simple summation is given here as the integration method for convenience. However, those working in this field will readily understand that, when a priori information is available (for example, information that the signal received at a particular microphone is strong, or an approximate range for the sound source), integration by weighted summation (assigning a weight to each GCC sequence before adding) is also possible instead of simple summation.

In another embodiment, the cross-correlation integration unit (710) may integrate the first to n-th GCC sequences mapped to the reference spatial coordinate system into a single GCC sequence by multiplying them all together.
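The integration options described above (simple sum, weighted sum, and product) can be sketched in Python as follows. The sketch assumes every mapped GCC sequence is a dictionary over the same set of reference coordinates; the function name `integrate_gcc` and the sample values are made up for illustration.

```python
import math


def integrate_gcc(sequences, method="sum", weights=None):
    """Combine GCC sequences defined over the same reference coordinates.

    sequences: list of dicts {coordinate: gcc_value}.
    method: "sum" (optionally weighted) or "product".
    weights: one weight per sequence; defaults to 1.0 each (simple sum).
    """
    if weights is None:
        weights = [1.0] * len(sequences)
    integrated = {}
    for coord in sequences[0]:
        values = [seq[coord] for seq in sequences]
        if method == "sum":
            integrated[coord] = sum(w * v for w, v in zip(weights, values))
        elif method == "product":
            integrated[coord] = math.prod(values)
        else:
            raise ValueError(f"unknown method: {method}")
    return integrated


# Three pairs agree at 0 degrees; each has its own spurious peak elsewhere.
seq1 = {0: 0.6, 60: 0.5, 180: 0.1, 300: 0.1}
seq2 = {0: 0.6, 60: 0.1, 180: 0.5, 300: 0.1}
seq3 = {0: 0.5, 60: 0.1, 180: 0.1, 300: 0.5}
summed = integrate_gcc([seq1, seq2, seq3], method="sum")
multiplied = integrate_gcc([seq1, seq2, seq3], method="product")
```

In this toy data the peaks that all three pairs share reinforce each other at 0 degrees, while each pair's individual spurious peak is diluted; the product makes that contrast sharper than the sum.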

The maximum value detection unit (720) detects the angle at which the GCC sequence integrated by the cross-correlation integration unit (710) has its maximum value.

Taking FIG. 8 as an example, the coordinate at a horizontal angle of 360 degrees (0 degrees) on the x-axis has the maximum GCC value (about 1.7), so the maximum value detection unit (720) can analyze the integrated GCC sequence and detect the maximum GCC value of 1.7 and the corresponding horizontal angle of 360 degrees (0 degrees).
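The detection step can be written as a simple arg-max over the integrated sequence. In the Python sketch below, the peak value of 1.7 at 0 degrees follows the FIG. 8 example, while the remaining values and the name `detect_maximum` are made up for illustration.

```python
def detect_maximum(integrated):
    """Return (angle, value) of the largest entry in an integrated GCC sequence.

    integrated: dict {horizontal_angle_deg: gcc_value}.
    """
    angle = max(integrated, key=integrated.get)
    return angle, integrated[angle]


# Illustrative integrated sequence: only the 1.7 peak at 0 degrees is from FIG. 8.
integrated = {0: 1.7, 60: 0.7, 120: 0.2, 180: 0.7, 240: 0.2, 300: 0.7}
peak_angle, peak_value = detect_maximum(integrated)
```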

한편, 도 8을 보면, 최대 GCC값 이외에 다른 값들에 비해 상당히 큰 값을 가지는 각도(60도, 180도, 300도)가 존재하는 것을 알 수 있다. 이 위치는 개별 마이크로폰 조합으로부터 생겨날 수 있는 앞뒤 혼동 위치에 해당한다. 도 8에서 확인할 수 있는 것과 같이, 합산 혹은 곱셈을 통한 GCC 통합 방식을 이용하면 통합된 GCC의 최대 값(실제 음원의 위치, 0도)은 앞뒤 혼동이 일어날 수 있는 음원의 위치들(60도, 180도, 300도) 보다 크게 나옴을 확인 할 수 있다. 이는 마이크로폰 조합에서 실제 위치에 해당하는 GCC 값의 개별 피크가 더해지거나 곱해짐으로써 그 크기가 앞뒤 혼동 위치에서의 통합된 GCC 값보다 월등히 커졌기 때문에다. 결과적으로 GCC 통합 방식을 이용하면 앞뒤 혼동에 강인한 위치추정 성능을 얻을 수 있다.On the other hand, in Figure 8, it can be seen that there exists an angle (60 degrees, 180 degrees, 300 degrees) having a significantly larger value than other values in addition to the maximum GCC value. This position corresponds to the front and rear confusion positions that can arise from individual microphone combinations. As can be seen in Figure 8, using the GCC integration method through the summation or multiplication, the maximum value of the integrated GCC (the actual position of the sound source, 0 degrees) is the position of the sound source where the front and rear confusion can occur (60 degrees, 180 degrees, 300 degrees) can be seen that the greater than. This is because, in the microphone combination, the individual peaks of the GCC values corresponding to the actual positions are added or multiplied so that the magnitude is significantly larger than the integrated GCC values at the front and back confusion positions. As a result, the GCC integration method can achieve robust location estimation performance.

도 9a 및 9b는 앞뒤 혼동을 설명하기 위한 도면이다. 9A and 9B are diagrams for explaining back and forth confusion.

For ease of understanding, assume that the sound source lies only on a two-dimensional plane. If, for example, only two microphones are used to estimate the position of the sound source, a single TDOA yields two candidate positions in the plane, one in front and one behind. This is called front-back confusion. As shown in FIG. 9A, the front and rear sound sources are at the same distances from the two microphones, so a TDOA measured with only one pair of two microphones cannot distinguish the two positions. In other words, with only two microphones the TDOA-based method cannot estimate the position of the sound source unambiguously.
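The ambiguity can be checked numerically. The sketch below assumes a free-field, two-dimensional geometry with hypothetical microphone coordinates and a speed of sound of 343 m/s; none of these values come from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def tdoa(source, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Time difference of arrival (seconds) between mic_a and mic_b."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / c

# Hypothetical layout: mic1 and mic2 lie on the x-axis, mic3 is off-axis.
mic1, mic2, mic3 = (-0.1, 0.0), (0.1, 0.0), (0.0, 0.17)

front = (1.0, 2.0)   # source in front of the mic1-mic2 axis
back = (1.0, -2.0)   # its mirror image behind that axis

# The pair (mic1, mic2) sees the same TDOA for both positions ...
same_pair = (tdoa(front, mic1, mic2), tdoa(back, mic1, mic2))
# ... while a pair with a different axis tells them apart.
other_pair = (tdoa(front, mic1, mic3), tdoa(back, mic1, mic3))
```

Here `same_pair` holds two identical delays, whereas the two entries of `other_pair` differ, which is why adding a third microphone resolves the ambiguity in the plane.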

FIG. 9B shows a method of estimating the position of a sound source on the horizontal plane without such front-back confusion. When at least three microphones are used, as in FIG. 9B, three GCC sequences can be calculated from the three microphone pairs, and when the three GCC sequences are integrated into one, the spurious source positions caused by front-back confusion can be eliminated. For three-dimensional position estimation, at least four microphones are required to eliminate front-back confusion.

Because the sound source size estimation apparatus 100 according to the disclosed technology obtains several GCC sequences and integrates them, for example by adding or multiplying them, it has the advantage of being robust against front-back confusion.

The angle detected by the maximum value detector 720 is the position of a point sound source or the center position of a distributed sound source.

The sound source size detector 730 estimates, as the size of the sound source, the range of angles around the maximum GCC value over which the GCC value exceeds a threshold.

According to one embodiment, the threshold may be set to a value between the maximum GCC value and the largest of the GCC values at the front-back confusion positions. Taking FIG. 8 as an example, the maximum GCC value is 1.7 and the largest GCC value at the confusion positions (60 degrees, 180 degrees, 300 degrees) is approximately 1, so the threshold may be a value between 1 and 1.7. Which value between 1 and 1.7 is chosen may be designed differently depending on the purpose, application, and operating environment of the sound source size estimation apparatus.
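A minimal sketch of the peak detection and threshold-based size estimation follows. The 10-degree grid, the toy GCC values (chosen to resemble the numbers quoted for FIG. 8), the midpoint threshold, and the convention of reporting the span between the outermost above-threshold grid points are all assumptions made for illustration.

```python
def estimate_position_and_size(gcc, step_deg, threshold):
    """Return (peak angle, (lower, upper) bounds, angular span) in degrees.

    gcc is an integrated GCC sequence on a circular grid covering 360 degrees;
    the search walks outward from the peak while values stay above threshold.
    """
    n = len(gcc)
    peak = max(range(n), key=gcc.__getitem__)
    left = right = 0
    while left + right + 1 < n and gcc[(peak + right + 1) % n] > threshold:
        right += 1
    while left + right + 1 < n and gcc[(peak - left - 1) % n] > threshold:
        left += 1
    lower = ((peak - left) % n) * step_deg
    upper = ((peak + right) % n) * step_deg
    return peak * step_deg, (lower, upper), (left + right) * step_deg

# Toy integrated GCC on a 10-degree grid (index k corresponds to k * 10 degrees).
gcc = [0.2] * 36
gcc[0] = 1.7                               # true position, 0 degrees
gcc[1] = gcc[35] = 1.5                     # source extends to +/- 10 degrees
gcc[2] = gcc[34] = 1.2
gcc[6], gcc[18], gcc[30] = 1.0, 0.9, 1.0   # confusion positions

confusion_max = max(gcc[6], gcc[18], gcc[30])
threshold = (max(gcc) + confusion_max) / 2  # midpoint: one possible design choice
position, bounds, size = estimate_position_and_size(gcc, 10, threshold)
```

With these toy numbers the peak lies at 0 degrees and the above-threshold region runs from 350 to 10 degrees, a span of 20 degrees. A threshold closer to 1 would widen the estimate and one closer to 1.7 would narrow it, which is the design freedom the paragraph above describes.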

FIG. 10 is a diagram for explaining the result of analysis by the sound source size estimation apparatus of FIG. 1 when an outer wall was excited three times in order to estimate the position and size of an impact sound. In the lower plot of FIG. 10, the x-axis represents time (sec) and the y-axis represents voltage (V); the plot shows, as a function of time, the voltage data corresponding to the sound pressure measured at one microphone. The upper plot of FIG. 10 is the result obtained when the sound source size estimation apparatus 100 of FIG. 1 calculates and analyzes GCC sequences from the voltage data acquired from each microphone. In the upper plot, the x-axis represents time (sec) and the y-axis represents the angle (degrees) expressing the position of the sound source. The color of the plot indicates the magnitude of the GCC value: the closer to yellow, the larger the value, and the closer to blue, the smaller the value. The size of the sound source can be estimated from a user-defined threshold (for example, from the yellow maximum out to the boundary where yellow disappears), and the position can be estimated from the maximum GCC value.

The sound source size estimation apparatus 100 according to another embodiment may estimate the position and size of the sound source according to the frequency characteristics of the sound source.

For example, the cross-correlation value calculation unit 110 may calculate GCC values by filtering only a specific frequency band selected according to the frequency characteristics of the sound source, and provide the calculated GCC sequence to the mapping function application unit 120. In this case, the sound source size estimation apparatus 100 can estimate the position and size of a sound source in the desired frequency band more efficiently. Depending on the embodiment, the frequency filter may be implemented as a band-pass filter, a low-pass filter, a high-pass filter, a band-stop filter, or the like.

For example, if it is known in advance that the sound of outer-wall excitation propagates only in the 1 kHz to 4 kHz band, the sound source size estimation apparatus 100 can calculate the GCC sequence only in the frequency band from 1 kHz to 4 kHz and thereby estimate the position and size of the outer-wall excitation sound more efficiently.
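One way to restrict the GCC calculation to a band is to zero the cross-spectrum outside that band before transforming back to the lag domain. The sketch below does this with a naive DFT and PHAT weighting; the PHAT weighting, the sampling rate, the frame length, and the impulse test signal are assumptions for illustration, and the text above does not prescribe a particular weighting or filter implementation.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def band_limited_gcc_phat(x1, x2, fs, f_lo, f_hi):
    """GCC over circular lags 0..n-1 using only bins with f_lo <= |f| <= f_hi.

    A peak at lag D means x1 is delayed by D samples relative to x2;
    lags above n/2 correspond to negative delays.
    """
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = []
    for k in range(n):
        freq = min(k, n - k) * fs / n          # absolute frequency of bin k
        g = X1[k] * X2[k].conjugate()
        if f_lo <= freq <= f_hi and abs(g) > 1e-12:
            cross.append(g / abs(g))           # PHAT: keep phase only
        else:
            cross.append(0)                    # discard out-of-band bins
    return [sum(cross[k] * cmath.exp(2j * math.pi * k * lag / n)
                for k in range(n)).real / n
            for lag in range(n)]

# Toy impact: an impulse at microphone 2, arriving 3 samples later at microphone 1.
n, fs, true_delay = 64, 16000, 3
x2 = [1.0] + [0.0] * (n - 1)
x1 = x2[-true_delay:] + x2[:-true_delay]       # circular shift by 3 samples

gcc_band = band_limited_gcc_phat(x1, x2, fs, f_lo=1000, f_hi=4000)
delay = gcc_band.index(max(gcc_band))
```

Even though only the 1 kHz to 4 kHz bins contribute, the peak of `gcc_band` still falls at the true delay of 3 samples in this toy case.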

According to yet another embodiment, the cross-correlation value calculation unit 110 may calculate GCC values for each frequency band so that the size of the sound source is estimated for each frequency band.

For example, the position and size of a sound source may differ from one frequency band to another, and this information can help analyze the characteristics of the sound source more accurately. For instance, if the cross-correlation value calculation unit 110 calculates a GCC sequence for each octave band and provides it to the mapping function application unit 120, and the mapping function application unit 120 and the estimation unit 130 likewise operate for each octave band, the position and size of the sound source can be estimated for each band.
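The per-band processing can be organized as a loop over octave bands. The sketch below uses the common convention that an octave band with center frequency fc spans fc/√2 to fc·√2; the list of center frequencies and the `estimate_in_band` callback are hypothetical placeholders for whatever band-limited estimation chain is actually used.

```python
import math

def octave_bands(centers):
    """Return (lower, upper) band edges in Hz for each center frequency."""
    return [(fc / math.sqrt(2), fc * math.sqrt(2)) for fc in centers]

def estimate_per_band(centers, estimate_in_band):
    """Run a band-limited estimator once per octave band.

    estimate_in_band(lo, hi) is a hypothetical callback standing in for the
    GCC calculation, mapping, and estimation steps restricted to [lo, hi] Hz.
    """
    return {fc: estimate_in_band(lo, hi)
            for fc, (lo, hi) in zip(centers, octave_bands(centers))}

centers = [250.0, 500.0, 1000.0, 2000.0, 4000.0]   # assumed band centers
bands = octave_bands(centers)

# Stub estimator that only reports the band it was asked to process.
results = estimate_per_band(
    centers, lambda lo, hi: {"band_hz": (round(lo), round(hi))})
```

Adjacent octave bands share an edge under this convention, so the bands tile the analyzed frequency range without gaps.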

FIG. 11 is a flowchart illustrating a method of estimating the size of a sound source according to an embodiment of the disclosed technology.

The sound source size estimation method of FIG. 11 is described below with reference to FIG. 1. Because a time-series implementation of the embodiment of FIG. 1 also corresponds to the present embodiment, the description given for the sound source size estimation apparatus 100 of FIG. 1 applies equally to the present embodiment.

For n microphone pairs, each consisting of two different microphones among a plurality of microphones, the sound source size estimation apparatus 100 calculates the m-th (m = 1 to n) GCC sequence for the m-th microphone pair (S1110).

The sound source size estimation apparatus 100 maps the m-th GCC sequence from the time coordinate system to the reference spatial coordinate system using the m-th (m = 1 to n) mapping function, which is predetermined according to the platform on which the plurality of microphones are installed (S1120).
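As a rough illustration of step S1120, the sketch below maps a GCC sequence indexed by lag onto a grid of reference angles. It assumes a free-field, far-field model in which the delay for a pair with spacing d whose axis points at angle φ is τ = (d / c)·cos(θ − φ). The actual mapping function depends on the platform, so this model, the function name, and all numeric values are assumptions, not the patent's mapping.

```python
import math

def map_gcc_to_angles(gcc_by_lag, max_lag, fs, mic_distance, axis_deg,
                      step_deg=10, c=343.0):
    """Map GCC values from the lag axis to reference angles 0, step, ..., <360.

    gcc_by_lag[i] holds the GCC value at a lag of (i - max_lag) samples.
    axis_deg is the direction of the pair's axis in the reference frame.
    """
    mapped = []
    for k in range(360 // step_deg):
        theta = math.radians(k * step_deg - axis_deg)
        tau = mic_distance * math.cos(theta) / c     # expected delay (seconds)
        lag = round(tau * fs)                        # nearest lag in samples
        lag = max(-max_lag, min(max_lag, lag))       # stay inside the sequence
        mapped.append(gcc_by_lag[lag + max_lag])
    return mapped

# Hypothetical pair: 0.2 m spacing sampled at 48 kHz, so |lag| <= 28 samples.
fs, d, max_lag = 48000, 0.2, 28
gcc_by_lag = [0.0] * (2 * max_lag + 1)
gcc_by_lag[14 + max_lag] = 1.0                       # single peak at +14 samples

mapped = map_gcc_to_angles(gcc_by_lag, max_lag, fs, d, axis_deg=0)
```

Under this model the single lag-domain peak appears at two angles, 60 and 300 degrees, because cos θ takes the same value at both. This is the per-pair ambiguity that integrating several pairs is meant to suppress.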

The sound source size estimation apparatus 100 estimates the size of the sound source based on the mapping results of the first to n-th GCC sequences obtained in step S1120 (S1130).

FIG. 12 is a flowchart for describing step S1130 of FIG. 11 in detail. Referring to FIG. 12, according to one embodiment, the sound source size estimation apparatus 100 integrates the first to n-th GCC sequences mapped to the reference spatial coordinate system in step S1120 into a single GCC sequence (S1210). Once the GCC sequences are integrated, the sound source size estimation apparatus 100 detects the angle at which the integrated GCC sequence has its maximum value (S1220). The detected angle is taken as the estimated position of the sound source. The sound source size estimation apparatus 100 then estimates, as the size of the sound source, the range of angles around the detected angle over which the GCC value exceeds a threshold (S1230). The threshold may be set to a value between the maximum GCC value of the integrated GCC sequence and the largest of the GCC values at the front-back confusion positions.

FIG. 13 is a block diagram illustrating an apparatus for estimating the size of a sound source when the microphone array of FIG. 2 is used, according to one embodiment.

Referring to FIG. 13, the sound source size estimation apparatus 1300 includes an amplifier 1310, an A/D converter (analog-to-digital converter) 1320, cross-correlation value calculation units 110, mapping function application units 120, and an estimation unit 130.

A case in which the sound source size estimation apparatus 1300 is implemented using the sound source size estimation apparatus 100 of FIG. 1 also corresponds to the present embodiment, so the description given for the sound source size estimation apparatus 100 of FIG. 1 applies equally to the present embodiment.

Because the signals provided by the microphones (m1, m2, m3) are generally weak, they are first amplified by the amplifier 1310. The amplified signals are converted into digital samples by the A/D converter 1320. The first to third cross-correlation value calculation units 110 then calculate the first to third cross-correlation value sequences for the first to third microphone pairs using any of the known estimation methods, as described above.

The calculated first to third cross-correlation value sequences are provided to the first to third mapping function application units 120 and mapped to the reference spatial coordinate system as described above, and the estimation unit 130 receives the mapping results from the first to third mapping function application units 120 and estimates the position and size of the sound source.

The disclosed technology can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of the computer-readable recording medium include ROM (Read Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc-Read Only Memory), magnetic tape, floppy disks, and optical data storage devices, and also include implementation in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention belongs.

The disclosed method and apparatus have been described with reference to the embodiments shown in the drawings to aid understanding, but these are merely examples, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the disclosed technology should be defined by the appended claims.

FIG. 2 is a diagram illustrating a case in which three microphones are used, according to one embodiment.

FIG. 3 is a block diagram illustrating the mapping function application unit of FIG. 1 in detail, according to one embodiment.

FIGS. 4A and 4B are graphs for explaining a mapping function used in one embodiment of the present invention.

FIG. 6 is a diagram illustrating a result produced by the mapping function application unit.

FIG. 7 is a diagram for describing the estimation unit in detail, according to one embodiment.

FIG. 8 is a diagram for describing a single GCC sequence integrated by the estimation unit, according to one embodiment.

FIGS. 9A and 9B are diagrams for explaining front-back confusion.

FIG. 10 is a diagram for explaining the result of analysis by the sound source size estimation apparatus of FIG. 1 when an outer wall was excited three times in order to estimate the position and size of an impact sound.

FIG. 11 is a flowchart illustrating a method of estimating the size of a sound source according to an embodiment of the disclosed technology.

FIG. 12 is a flowchart for describing step S1130 of FIG. 11 in detail.

FIG. 13 is a block diagram illustrating an apparatus for estimating the size of a sound source when the microphone array of FIG. 2 is used, according to one embodiment.

Claims

An apparatus for estimating the size of a sound source, comprising:

a cross-correlation value calculation unit that, for n microphone pairs each consisting of two different microphones among a plurality of microphones, calculates an m-th (m = 1 to n) generalized cross-correlation (GCC) sequence for the m-th microphone pair;

a mapping function application unit that maps the m-th GCC sequence from a time coordinate system to a reference spatial coordinate system using an m-th (m = 1 to n) mapping function predetermined according to the platform on which the plurality of microphones are installed; and

an estimation unit that estimates the size of the sound source based on the mapping results of the first to n-th GCC sequences.

The apparatus of claim 1,

wherein the m-th mapping function is a function representing a correspondence between an arrival time delay in the time coordinate system and an angle in an m-th (m = 1 to n) spatial coordinate system defined with respect to the m-th microphone pair, and

wherein the mapping function application unit comprises:

an individual mapping unit that maps the m-th GCC sequence from the time coordinate system to the m-th spatial coordinate system using the m-th mapping function; and

a coordinate conversion unit that converts the coordinates of the mapping result in the m-th spatial coordinate system into coordinates of the reference spatial coordinate system.

The apparatus of claim 1,

wherein the m-th mapping function is a function representing a correspondence, for the m-th microphone pair, between an arrival time delay in the time coordinate system and an angle in the reference spatial coordinate system.

The apparatus of claim 1, wherein the cross-correlation value calculation unit calculates GCC values by filtering only a specific frequency band selected according to a frequency characteristic of the sound source.

The apparatus of claim 1, wherein the cross-correlation value calculation unit calculates GCC values for each frequency band so that the size of the sound source is estimated for each frequency band.

The apparatus of claim 1, wherein the estimation unit comprises:

a cross-correlation value integration unit that integrates the first to n-th GCC sequences mapped to the reference spatial coordinate system into a single GCC sequence;

a maximum value detector that detects the angle at which the integrated GCC sequence has its maximum value; and

a sound source size detector that estimates, as the size of the sound source, the range of angles around the detected angle over which the GCC value exceeds a threshold.

The apparatus of claim 6, wherein the cross-correlation value integration unit

integrates the first to n-th GCC sequences mapped to the reference spatial coordinate system into a single GCC sequence by adding them together.

The apparatus of claim 6, wherein the cross-correlation value integration unit

integrates the first to n-th GCC sequences mapped to the reference spatial coordinate system into a single GCC sequence by multiplying them together.

The apparatus of claim 6, wherein the threshold is set to a value between the maximum value of the integrated GCC sequence and the largest of the GCC values at one or more front-back confusion positions.

A method for estimating the size of a sound source, comprising:

(a) for n microphone pairs each consisting of two different microphones among a plurality of microphones, calculating an m-th (m = 1 to n) GCC sequence for the m-th microphone pair;

(b) mapping the m-th GCC sequence from a time coordinate system to a reference spatial coordinate system using an m-th (m = 1 to n) mapping function predetermined according to the platform on which the plurality of microphones are installed; and

(c) estimating the size of the sound source based on the mapping results of the first to n-th GCC sequences.

The method of claim 10,

wherein step (c) comprises:

integrating the first to n-th GCC sequences mapped to the reference spatial coordinate system into a single GCC sequence;

detecting the angle at which the integrated GCC sequence has its maximum value; and

estimating, as the size of the sound source, the range of angles around the detected angle over which the GCC value exceeds a threshold.

A computer-readable recording medium containing a program for executing the method of claim 10 on a computer.